The horrific factory collapse at Rana Plaza in Dhaka has brought the business practices of global garment brands, as well as their thousands of suppliers, into the spotlight.
At School of Data we noted that corrupt and missing data were part of the story. Data on building permits in Bangladesh is largely unavailable due to a lack of state inspections. However, after years of pressure on global apparel brands from labor activists, the publishing of garment factory supplier lists is becoming increasingly standardized. We’re asking you to join us in mapping the data on garment factories.
Data Expedition: Mapping the garment factories
When: Saturday May 25, 12:00 BST to May 26, 18:00 BST (link to your timezone)
We’ll be looking for projects such as:
Mapping garment factories locally and globally
Exploring the global supply chain of garment exports and imports
Please note that limited space is available. For more information about the Data Expedition format, we encourage you to read this article.
Before the Data Expedition – Help us build an open garment factory supply list
Before heading out on this important expedition, we’ll need to gather as much data as possible on garment factories. Labor activists and campaigners typically articulate the data in terms of “supplier lists.” Some brands, such as Nike, provide a list of all factories in their supplier network via Excel and JSON downloads, while others, such as Levi-Strauss, only offer lists in PDF format. In order to prepare a solid dataset for the Data Expedition, we’re asking you to help locate, clean, and merge the supplier lists from across garment brands into one comprehensive Open Garment Factory List.
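To give a flavour of the merge step, here is a minimal Python sketch using pandas. The file names and column mappings are hypothetical stand-ins for whatever each brand actually publishes; a real pipeline would start from each brand’s own download format.

```python
# A minimal sketch of merging two brand supplier lists into one dataset.
# The file names and column mappings below are hypothetical stand-ins for
# whatever each brand actually publishes (Nike's Excel/JSON download, a
# table extracted from a Levi-Strauss PDF, etc.).
import pandas as pd

# Map each source's columns onto a common schema.
nike = pd.read_csv("nike_suppliers.csv").rename(
    columns={"Factory Name": "factory", "Country": "country", "City": "city"})
levis = pd.read_csv("levis_suppliers.csv").rename(
    columns={"Supplier": "factory", "Location Country": "country", "Town": "city"})

nike["brand"] = "Nike"
levis["brand"] = "Levi-Strauss"

combined = pd.concat([nike, levis], ignore_index=True)

# Normalise factory names so trivial variations don't create duplicates.
combined["factory"] = combined["factory"].str.strip().str.title()

# One row per factory, listing every brand known to source from it.
factories = (combined
             .groupby(["factory", "country"], as_index=False)
             .agg({"city": "first", "brand": lambda b: ", ".join(sorted(set(b)))}))
factories.to_csv("open_garment_factory_list.csv", index=False)
```

A real pipeline would also need fuzzy matching on factory names and addresses, since the same factory often appears under slightly different spellings across brands’ lists.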
A well-targeted Freedom of Information request to the UK Department for Education, and the consequent report, hit the news here recently. It turns out that a claim by the Minister for Education, Michael Gove, that “[s]urvey after survey has revealed disturbing historical ignorance” was largely based on a series of PR-related, media-sponsored polls, either directly or, as a cynic might imagine, as the result of a feverish, opportunistic retrofitting exercise (i.e. a post hoc rationale).
A quick search around the phrase “data laundering” turns up several senses of the term:
various definitions relating to the role of data cleansing;
the replacement of metadata records or fields (in library catalogues in particular) that are tainted with commercial license restrictions by data of equivalent or higher quality, known provenance and open license terms;
a process in which the low-quality origins of a dataset are masked by the provenance, authority or veneer of quality associated with another trusted agent, such that the data becomes accepted “at face value” with the imprimatur of that trusted party. The “wash and rinse” process can be repeated to give the dataset or claim ever more weight with each restatement. (For an example, see Sleight of Hand and Data Laundering in Evidence Based Policy Making.)
It is in this final sense, of giving a weak claim a weightier basis, that Gove in particular appears to have acted: whilst the “fact” that UK schoolchildren demonstrate woeful historical ignorance is now claimed to have been sourced from dubious PR-commissioned polls, the fact that could have been remembered was that a Minister had stated it was true, with the full implied backing of his Department. And it is this “fact” that might then end up being repeated in ever more formal policy-setting situations, actually helping drive the adoption of a particular policy. (A famous example of how one source of information is assumed to have a different provenance to the actual one is the “dodgy dossier”, in which “large parts of [a] British government … dossier on Iraq – allegedly based on “intelligence material” – were taken from published academic articles, some of them several years old.” On the question of polls, it’s worth remembering that they are often commissioned, and reported, in the context of different lobbying aims, and may even be framed to make one preferred outcome more likely than an unfavoured one: Two can play at that game: When polls collide.)
Note that this form of data misuse is different to the recent case of the Reinhart-Rogoff academic paper, which a student replication project showed to contain errors, to frame the data in a particular way, and to make a hardline point around an arbitrary threshold value. In the “R & R” case, academic evidence was used to support a particular policy decision, but the evidence was then found to contain errors that arguably affected the validity of its claims, the very claims that supported the adoption of one particular policy over another.
As data finds its way into ever more news reports and official reports, it may at times be worth treating it as “hearsay” rather than “demonstrated fact” if you can’t get clear information about how, when and by whom the data was originally collected and analysed. In other words, you may at times need to follow the data (h/t Paul Bradshaw).
PS I am also reminded of the phrase zombie statistic to describe those numbers that get quoted in report after report, that never seem to die no matter how often they are contested, and whose provenance is obscured by the mists of time. The full extent of the relationship between zombie statistics and data laundering is left as a question for further research… or the comments below…;-)
Each year, the Federal Government spends over $100 billion on research. This investment is used, in part, to gather new data. But all too often the new data gathered isn’t made publicly available, and thus can’t generate maximum return on investment through later re-use by other researchers, policy-makers, clinicians and everyday taxpaying citizens.
A shining example of the value and legacy of research data is the Human Genome Project.
This project and its associated public research data are estimated to have generated $796 billion in economic impact, created 310,000 jobs, and launched a scientific revolution. All from an investment of just $3.8 billion.
With the budget sequestration of 2013 and onwards, it’s vitally important to get maximum value for money from research spending. Ensuring public access to most Federally funded research data will help researchers do more with less: if researchers have greater access to data that’s already been gathered, they can focus more acutely on gathering just the new data they need, and nothing more. It’s not uncommon for Federally funded researchers to perform duplicate research and gather duplicate data. The competitive and often secretive nature of research means that duplicative research and data hoarding are probably rife, but hard to evidence. Enforcing a public data policy on researchers would thus help make the overall system more efficient. This tallies with the conclusions of the JISC report (2011) on data centres:
“The most widely-agreed benefit of data centres is research efficiency. Data centres make research quicker, easier and cheaper, and ensure that work is not repeated unnecessarily.”
Another, more subtle benefit of making Federally funded data more public is that it would increase the overall importance and profile of US research in the world. Recent research by Piwowar & Vision (2013) robustly demonstrates that research that releases public data gets cited more than research that does not publicly release its underlying data.
The as-yet untapped value of research data
I believe most research data has immense untapped re-use value. We’re only just beginning to realise the value of data mining techniques on ‘Big Data’ and small data alike. In the 21st century, now more than ever, we have immensely powerful tools and techniques to make sense of the data deluge. The potential scientific and economic benefits of such text and data mining analyses are consistently rated very highly. The McKinsey Global Institute report on ‘Big Data’ (2011) estimated a $300 billion value on data mining US health care data alone.
I would finish by imploring you to read and implement the recommendations of the ‘Science as an Open Enterprise’ report from the Royal Society (2012):
Scientists need to be more open among themselves and with the public and media
Greater recognition needs to be given to the value of data gathering, analysis and communication
Common standards for sharing information are required to make it widely usable
Publishing data in a reusable form to support findings must be mandatory
More experts in managing and supporting the use of digital data are required
New software tools need to be developed to analyse the growing amount of data being gathered
Ross Mounce, Community Coordinator for Open Science, Open Knowledge Foundation
More than 170 years before Jean-François Champollion had the first real success in translating Egyptian hieroglyphs, the 17th-century Jesuit scholar Athanasius Kircher was convinced he had cracked it. He was very wrong. Daniel Stolzenberg looks at Kircher’s Egyptian Oedipus, a book that has been called “one of the most learned monstrosities of all times.”

In 1655, after more than two decades of toil, Athanasius Kircher published Egyptian Oedipus. With his title, the Jesuit scholar characteristically paid honor to himself. Like Oedipus answering the riddle of the Sphinx, Kircher believed he had solved the enigma of the hieroglyphs. Together with its companion volume, Pamphilian Obelisk, Kircher’s magnum opus presented Latin translations of hieroglyphic inscriptions — utterly mistaken, as post–Rosetta-Stone Egyptology would reveal — preceded by treatises on ancient Egyptian history, the origins of idolatry, allegorical and symbolic wisdom, and numerous non-Egyptian textual traditions that supposedly preserved elements of the “hieroglyphic doctrine.” In addition to ancient Greek and Latin authors, Kircher’s vast array of sources included texts in Oriental languages, including Hebrew, Arabic, Aramaic, Coptic, Samaritan, and Ethiopian, as well as archeological evidence. The resulting amalgam is, without doubt, impressive. But it can also bewilder. Egyptian Oedipus promised a complete “restoration [...]
In part this is about the provision of tools, such as our world-renowned CKAN open data portal, but it’s also about bringing together people who are passionate about making a change and giving them a space, whether that’s online or face-to-face, to wrangle open data, write code and take action together.
At the recent Open Interests hack, participants developed a suite of apps that help us understand lobbying in the EU and how money is spent. A couple of weeks ago at the Open Data Maker Night in London, people wrangled data from local authority websites to find out which companies receive the lion’s share of the Greater London Authority’s resources. Across our various Working Group mailing lists, people from all over the world are debating, sharing data and experimenting with code in a huge variety of domains, from open science to open government data.
At bottom, this is about people with bright ideas coming together to collaborate around open content and open data, building things that have transformative potential.
The Open Humanities Hangout
Over the past few months a group of people interested in open culture, including myself, have been getting together on Google Hangout in order to build stuff with the vast amount of open cultural data and content that’s out there.
In the cultural sphere, much of the transformative potential of open lies in widening access to our treasured cultural heritage, whether that’s classic literary texts or the paintings of the great masters. But as ever, it’s not only about opening up huge amounts of data and content (there’s already a hell of a lot of that on the Internet Archive and Wikimedia Commons); it’s also about empowering people to actually use this material in ways that they deem valuable.
So on the Open Humanities Hangout we’ve tried to do things that address both of these challenges.
I want more people to join the Open Humanities Hangouts – more JavaScript coders, more designers, more literature students, more bloggers… anyone who loves the humanities and wants to see the great works of our past accessible and re-usable by everyone, regardless of their background or location.
I’m putting forward a challenge for our next set of monthly Hangouts, based on the great work some of the Open Humanities Working Group members have been doing around open correspondence data and open book scanning.
I’m challenging the Open Humanities Hangout crew to construct a workflow that will enable *anyone* to take a published set of letters and turn it into a visualisation of a network of correspondence.
One of the great success stories of the so-called Digital Humanities is the wonderful Mapping the Republic of Letters project, a collaboration between Stanford and Oxford Universities that visualises the networks of correspondence of early modern scholars. The beautiful and insightful visualisations created in the process have captured the imaginations of technologists and humanists worldwide.
I want to see a million Mapping the Republic of Letters projects. I want it to be as easy as possible to map the correspondence of historical figures, so that anyone can do this. This includes the first-year school student wanting some beautiful images for their coursework and the scholar who will use much richer data to give a more thorough, in-depth and academic visual story for a research paper.
I want the underlying tools to be open source and well documented, and, perhaps most importantly, I want the underlying data, that collection of metadata about who sent what to whom and when, to be open for everyone to use and add to.
This effort doesn’t require the existence of a huge repository of data about letters that we tap into (although one might emerge in the process). This is about small sets of open data, sourced and formatted in appropriate ways by passionate groups of people all around the world, that can be combined and connected easily using open source web-based components.
How do we begin?
To my eyes, this effort will involve the documentation of at least 4 steps:
Scan in a published collection of letters
Turn these scans into structured data containing relevant information on correspondent, date and location
Geo-code all those locations
Visualise the results on a map
We’ve already made some progress on steps 1–2, and there’s a wealth of information available on how to do your own scanning and OCRing, including manuals on how to build your own scanner. For steps 3–4 there’s already some brilliant information over on the School of Data (and a minimal code sketch of those two steps follows below). However, I want to see this information synthesised into a single point — so any student, teacher or researcher can get all the information on how to go from that collected volume of letters of so-and-so on their shelf to a beautiful visualisation.
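For a flavour of what this can look like, here is a minimal Python sketch of steps 3–4, assuming the letters have already been reduced (step 2) to a hypothetical letters.csv with correspondent, date and location columns; it leans on the geopy and folium libraries:

```python
# A minimal sketch of steps 3-4: geocode the locations from a structured
# letters dataset, then plot each letter's origin on a map.
# "letters.csv" and its columns are hypothetical placeholders.
import csv

import folium
from geopy.geocoders import Nominatim

geolocator = Nominatim(user_agent="open-humanities-letters")

with open("letters.csv", newline="") as f:
    letters = list(csv.DictReader(f))  # columns: correspondent, date, location

# Step 3: geocode each unique place name once.
coords = {}
for place in {row["location"] for row in letters}:
    result = geolocator.geocode(place)
    if result:
        coords[place] = (result.latitude, result.longitude)

# Step 4: drop a marker per letter on a world map.
m = folium.Map(location=[48.85, 2.35], zoom_start=4)
for row in letters:
    if row["location"] in coords:
        folium.Marker(
            location=coords[row["location"]],
            popup=f'{row["correspondent"]}, {row["date"]}',
        ).add_to(m)
m.save("correspondence_map.html")
```

Nominatim’s free geocoder is rate-limited, so a real workflow would cache results and throttle requests; the point is simply that once the data is structured, the geocoding and mapping steps are only a handful of lines.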
What might result if we’re successful?
Well, for one, I hope that a beautiful and insightful set of visualisations of the correspondence of a number of important figures might emerge all over the web. But perhaps a longer-term goal is to stimulate the creation of databases of correspondence that are open to everyone to use and add to. To begin with we’ll be constrained to the published volumes of correspondence in print, but if we get enough people contributing we can re-combine these published volumes in all sorts of interesting ways, filling in gaps and ultimately creating datasets that might enable us to map whole networks of correspondence for a given period.
So the challenge is on. The next Open Humanities Hangout will take place at 5pm BST on Tuesday May 28th. If you’re thinking of joining ping me a quick message on firstname.lastname@example.org!
The Public Domain Remix is a contest organized by the Open Knowledge Foundation and Wikimedia France, which aims to give a new life to the public domain by encouraging the creative remix of works that are no longer protected by copyright law. The objective is to promote the public domain by showing what can actually be done with these works.
The competition aims to encourage the use and reuse of public domain works while promoting transmediality: rather than staying within the same medium, the public is encouraged to move from one medium to another (e.g. remixing a literary work into music, or a photograph into sculpture). As such, the Public Domain Remix is divided into five categories: Arts, Literature, Music, Video and Hardware.
To celebrate the beginning of the contest, a special event was organized during the OuiShare Festival at the Cabaret Sauvage in Paris on Saturday, May 4th 2013.
Several artists were invited to present their work and explain their artistic approach to the notion of remix. These artists acted as mediators between the works and the public, who were invited to remix the public domain, either by working individually or by contributing to a collaborative work. Through dedicated workshops, each artist encouraged the public to remix these works in innovative and creative ways, sharing their own skills and ideas, presenting the tools that can be used to remix certain types of works, and explaining how to use them.
Literary workshop (Olivier Vilaspasa)
A collaborative workshop was organised to help people randomly generate a prediction about the future on a particular issue. Taking content from the book “A Treatise on Political Economy” (1841) by the economist Jean-Baptiste Say, the public was invited to cut sentences into pieces to create a pool of subjects, verbs, adjectives and adverbs.
The audience could then ask a question (which was hidden), and the answer was given to them by randomly drawing words from the pool. Each participant left with a cut-and-paste set of questions and answers arranged on a page specifically prepared for this prediction.
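For the technically curious, the drawing procedure itself is trivial to sketch in code; the word pools below are illustrative placeholders rather than the workshop’s actual cuttings:

```python
# A playful sketch of the cut-up "prediction" procedure described above:
# draw words at random from pools of sentence fragments. The word pools
# here are invented placeholders, not the workshop's actual material.
import random

pools = {
    "subject": ["the market", "labour", "capital", "the nation"],
    "verb": ["will produce", "consumes", "redistributes", "demands"],
    "object": ["great wealth", "new industry", "its own value", "scarcity"],
    "adverb": ["inevitably", "slowly", "everywhere", "in time"],
}

def predict():
    """Assemble one random 'answer' by drawing a word from each pool."""
    return " ".join(random.choice(pools[part])
                    for part in ("subject", "verb", "object", "adverb"))

print(predict())  # e.g. "labour will produce scarcity inevitably"
```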
Technical workshop (Primavera De Filippi)
Materials were provided to the public (such as books, paintings and illustrations in the public domain, cassettes or CDs of public domain songs, videos, etc.) as well as tools (glue, scissors, pliers, hammers, screws, bolts, drills, etc.) to allow the public to remix the works.
The purpose of the workshop was to encourage the public to create new works using public domain works as raw material (in the true sense of the term). Many collages were made, several sculptures were created, stories were illustrated with three-dimensional characters, and books were turned into pirate boats… all in a wonderful atmosphere of fantasy and chaos.
Poetic & musical workshop (David Christoffel)
As a response to a reading of A Discourse on Method by Descartes, the public was invited to read aloud and record, on the fly, excerpts from a selection of public domain texts related to the question of rhetoric in speech. The set of readings, words and thoughts collected and produced by the public was then remixed into music, giving rise to a sort of musical interchange with the public domain.
Musical workshop (JL’z Team Factory)
Starting with a soundtrack recorded in 1914 (Favorite Airs from The Mikado, by the Edison Light Opera Company), the public was invited to explore it and select fragments. These sound samples were then cut up and distorted using the effects offered by the open-source software Audacity, then duplicated, re-ordered, stacked together or looped throughout the song, creating a new melody, harmony and rhythm and giving the music new life.
VJ workshop
The VJ workshop invited the public to work around the notions of contribution, development and self-empowerment, blurring the lines between taking and giving in a collective process, to reach a consensus between collective autonomy and the individual self.
The goal of the workshop was to produce a series of audiovisual performances, giving new life to visual and sound archives through a process of collective sense-making and self-expression: an experimental process of immediate exchange and intersecting media (merging public contributions with public domain presentations) to create new performances in a single movement.
Arctic Gymnopédie by Les Dupont
If you were not able to join us at this event, you can still participate in the contest until December 31st 2013 by submitting pictures of your work via the following website: http://france.publicdomainremix.org. Prizes will be awarded for the best works in each of the five competition categories: visual arts, literature, music, video, and hardware.
The Open Knowledge Foundation will aim to organise more Public Domain Remix competitions in other countries and is looking for local partner organisations. Are you interested? Get in touch!
Last month, Paul David, professor of Economics at Stanford University, Senior Fellow of the Stanford Institute for Economic Policy Research (SIEPR), and a member of the Advisory Panel, delivered a keynote presentation at the International Seminar of the PROPICE in Paris.
Professor David expressed concern that the increased use of intellectual property rights (IPR) protections “has posed problems for open collaborative scientific research”, and that the IPR regime has been used by businesses, for example, to “raise commercial rivals’ costs”, where empirical evidence has shown that business innovation “is being inhibited by patent thickets”.
In describing the anti-commons issue, Professor David also pointed out that research databases are likely sites for such problems, and emphasised the importance of protecting future open access to critical data.
He also noted that producing high-quality data can be very costly, and that “…strengthening researchers’ incentives to create transparent, fully documented and dynamically annotated datasets to be used by others remains an insufficiently addressed problem”.
Selected plates from How I killed the tiger; being an account of my encounter with a royal Bengal tiger, with an appendix containing some general information about India (1902), a small book by Lieutenant Colonel Frank Sheffield detailing his close brush with death by tiger. As the author explains in his introduction:

“My main purpose in writing this little book, was to place in a permanent form a description of my wonderful preservation from death in a chance encounter with a Royal Bengal Tiger. My life had been adventurous up to that time. I had shot big game of various kinds. But this episode, so marvellous in itself, so important in its influence upon my after life and character, marks the close of my career as a hunter of big game.”

Read the book, including more illustrative plates, over in our post in the Texts collection. (All images taken from the book housed at the Internet Archive, contributed by the University of Toronto Libraries.)