
PersonalData.IO helps you get access to your personal data

- December 21, 2016 in Legal, Open Data, personal-data

PersonalData.IO is a free and open platform for citizens to track their personal data and understand how it is used by companies. It is part of the MyData movement, promoting a human-centric approach to personal data management. A lot of readers of this blog will be familiar with Freedom of Information laws, a legal mechanism that forces governments to be more open. Individuals, journalists, startups and other actors can use this “right-to-know” to understand what the government is doing and try to make it function better. There are even platforms that help facilitate the exercise of this right, like MuckRock, WhatDoTheyKnow or FragDenStaat. These platforms also have an education function around information rights. In Europe we enjoy a similar right with respect to personal data held by private companies, but it is often very hard to exercise it. We want to change that, with PersonalData.IO.

Image credit: Kevin O’Connor (CC BY)

What is personal data?

In European law, the definition of personal data is extremely broad: any information relating to an identified or identifiable natural person. Unlike in the U.S., the concept of identifiability is crucial in defining personal data, and it keeps expanding to match technical possibilities: if some intermediate identifier (license plate, cookie, dynamic IP address, phone number, etc.) can reasonably be traced back to you given the likely evolution of technology, all the data associated with that identifier becomes personal data.

Why should you care?

Holding personal data often translates into power over people, which in turn becomes economic might (at the extreme, think Facebook, Google, etc). This situation often creates uncomfortable issues of transborder transparency and accountability, but also hinders the appearance of other innovative uses for the data, for instance for research, art, business, education, advocacy, journalism, etc.


PersonalData.IO portal

Leveraging the same mechanisms as FOI portals, we are focused on making such requests easier to initiate, to follow through, to share and then to clone. Processing the requests in the open helps increase the pressure on companies to comply. In practice, we have taken the Froide software developed by Open Knowledge Germany, themed it to our needs and made some basic modifications in the workflow. Our platform is growing its user base slowly, but we benefit from many network effects: for any given company, you only need one person to go through the process of getting their hands on their data, and afterwards everyone benefits!



Getting to the data is only the first step. The bar to making it really useful is still pretty high. In May 2018, new regulations will come into force in Europe to help individuals leverage their personal data even more: individuals will enjoy a new right to data portability, i.e. the right to transfer data from one service to another. In anticipation, a whole movement focused on human-centric personal data management has arisen, called MyData. Open Knowledge Finland recently organised a conference with tons of people building new services giving you more control over all that data! I am looking forward to a tool helping individuals turn their personal data into Open Data (by stripping out direct identifiers, for instance). Many companies will also benefit from the Frictionless Data project, since there will be a requirement to transfer that data “in a structured, commonly used, machine-readable and interoperable format”.

Image credit: Salla Thure (Public Domain)

In anticipation of this exciting ecosystem, we want to use PersonalData.IO to build experience in expanding access to this data and to encourage companies to broaden their view of what constitutes personal data. The more data is considered personal data, the more you will be in control. Feel free to join us! You can sign up to our mailing list or directly to the portal itself and initiate new requests. You can also follow us on Twitter or contact us directly. We welcome individual feedback and ideas and are always looking for new partners, developers and contributors!

UK Crime Data: Feeling is Believing

- July 1, 2015 in crimedata, Featured, Legal, Open Data, Open Government Data, Open Knowledge

Latest crime data shows that the UK is getting significantly more ‘peaceful’. Last month, the Institute for Economics and Peace published the UK Peace Index, revealing UK crime figures have fallen the most of all EU countries in the past decade. Homicide rates, to take one indicator, have halved over the last decade.

Crime Scene by Alan Cleaver, Flickr, CC-BY

But the British public still feels that crime levels are rising. How can opening up crime data play a part in convincing us we are less likely to experience crime than ever before?

The ‘Perception Gap’

The discrepancy between crime data and perceptions of the likelihood of crime is particularly marked in the UK. Although it has been found that a majority of the public broadly trust official statistics, the figures are markedly lower for those relating to crime. In one study, 85% of people agreed that the Census accurately reflects changes in the UK, but only 63% said the same of crime statistics.

Credibility of Police Data

Police forces have been publishing crime statistics in the UK since 2008, using their own web-based crime mapping tools or via the national crime mapping facility. This has purportedly been for the purpose of improving engagement with local communities alongside other policy objectives, such as promoting transparency. But allegations of ‘figure fiddling’ on the part of the police have undermined the data’s credibility and in 2014, the UK Statistics Authority withdrew its gold-standard status from police figures, pointing to ‘accumulating evidence’ of unreliability.

The UK’s open data site for crime figures allows users to download street-level crime and outcome data in CSV format and explore the API containing detailed crime data and information about individual police forces and neighbourhood teams. It also provides Custom CSV download and JSON API helper interfaces so you can more easily access subsets of the data.

But the credibility of the data has been called into question. Just recently, data relating to stop-search incidents for children aged under 12 was proved ‘inaccurate’. The site itself details many issues which call the accuracy of the data into question: inconsistent geocoding policies in police forces; “Six police forces we suspect may be double-reporting certain types of incidents”; ‘siloed systems’ within police records; and differing IT systems from regional force to force. In summary, we cannot be sure the ‘data provided is fully accurate or consistent.’
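For readers who want to poke at a street-level CSV download themselves, here is a minimal sketch in Python of the kind of sanity check the double-reporting concern suggests: tally incidents by type and flag any crime ID that appears more than once. The sample rows are invented for illustration, and the column names ("Crime ID", "Month", "Crime type") are assumptions about the file layout rather than something stated in this post, so adjust them to match the file you actually download.

```python
import csv
import io
from collections import Counter

# Invented sample rows in the assumed shape of a street-level crime CSV.
# The column names are assumptions; check them against the real download.
SAMPLE_CSV = """Crime ID,Month,Crime type
a1,2015-05,Burglary
a2,2015-05,Violence and sexual offences
a2,2015-05,Violence and sexual offences
,2015-05,Anti-social behaviour
a3,2015-05,Burglary
"""


def summarise(csv_text):
    """Return (incident counts per crime type, sorted list of repeated crime IDs)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    by_type = Counter(row["Crime type"] for row in rows)
    # Rows with an empty ID cannot be checked for duplication, so skip them.
    id_counts = Counter(row["Crime ID"] for row in rows if row["Crime ID"])
    repeated = sorted(cid for cid, n in id_counts.items() if n > 1)
    return by_type, repeated


if __name__ == "__main__":
    by_type, repeated = summarise(SAMPLE_CSV)
    for crime_type, count in by_type.most_common():
        print(f"{crime_type}: {count}")
    print("Repeated crime IDs:", repeated)
```

A repeated ID is only a prompt for further questions, not proof of double-reporting: the same incident may legitimately appear in more than one record, and rows without an ID cannot be checked this way at all.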

The Role the Media Plays: If it Bleeds, it Leads

In response to persistent and widespread public disbelief, the policies of successive UK governments on crime have toughened: much tougher sentencing, more people in prison, more police on the streets. When the British public were asked why they think there is more crime now than in the past, more than half (57%) stated that it was because of what they see on television and almost half (48%) said it was because of what they read in newspapers [Ipsos MORI poll on Closing the Gaps]. One tabloid newspaper exclaimed just recently: “Rape still at record levels and violent crime rises” and “Crime shows biggest rise for a decade”. As the adage goes, If it Bleeds, it Leads.

Crime Data and Mistrust of the Police

Those engaged in making crime figures meaningful to the public face unique challenges. When Stephen Lawrence was murdered in 1993 and the subsequent public inquiry found institutional racism to be at the heart of the Met police, public trust in the police was shattered. Since then, the police have claimed to have rid their ranks of racism entirely.

Police by Luis Jou García, Flickr, CC BY-NC 2.0

But many remain less than convinced. According to official statistics, in 1999-2000, a black person was five times more likely than a white person to be stopped by police. A decade later, they were seven times more likely. One criminologist commented: “Claims that the Lawrence inquiry’s finding of institutional racism no longer apply have a hollow ring when we look at the evidence on police stops.” [Michael Shiner reported in the Guardian]. Equally, the police distrust the public too. The murder of two young, female police officers in Manchester in 2012 ignited the long-rumbling debate over whether the police should be armed. So the divide between the police and the public is a serious one.

A Different Tack?

In 2011, a review was undertaken by the UK Statistics Authority into Crime Data. Its recommendations included:
  • Improving the presentation of crime statistics to make them more authoritative
  • Reviewing the availability of local crime and criminal justice data on government websites to identify opportunities for consolidation
  • Sharing of best practice and improvements in metadata and providing reassurance on the quality of police crime records.
It’s clear that the UK police recognise the importance of improving their publication of data. But it seems that opening data alone won’t fix the shattered trust between the public and the police, even if the proof that Britons are safer than ever before is there in transparent, easily navigable data. We need to go further back in the chain of provenance and scrutinise, for instance, the reporting methods of the police. But this is about forgiveness too, and the British public might just not be ready for that yet.

Piracy at the Old Bailey

- October 1, 2014 in court, Culture & History, Legal, old bailey, piracy, pirates

Ben Merriman presents a selection of piracy cases from the proceedings of London's Old Bailey. Although a few live up to the swashbuckling heists of stereotype, many reveal the surprisingly everyday nature of the maritime crimes brought before the court, including cases involving an argument over chickens and the stealing of a captain's hats.

The Open Knowledge Foundation urges the UK Government to stop secret corporate lobbying

- December 13, 2013 in Business, Campaigning, Featured, Legal, Open Government Data, Policy

The Open Knowledge Foundation has joined the members of the UK OGP civil society network in signing an open letter which calls on the Government to put an end to secret corporate lobbying.

In its current form the government’s proposed lobbying bill (which is currently going through parliament) will let the vast majority of corporate lobbyists off the hook from being obliged to say who they’re meeting, what decisions they are seeking to influence and how much they are spending. Here are our five reasons why we think this needs to change. If you agree with us, then please sign and share the petition!

The letter urges Ministers to redraft the Transparency of Lobbying, Non-Party Campaigning and Trades Union Administration Bill in order to enable proper public scrutiny of lobbying activity in the UK. Please share this letter (copied below) widely and sign the petition to call on the Government to put a stop to secret lobbying.

The Rt Hon Francis Maude MP
The Rt Hon Andrew Lansley MP
Cabinet Office
70 Whitehall
SW1A 2AS

Cc: Deputy Prime Minister
12 December 2013

Dear Mr Maude and Mr Lansley,

Response to Mr Maude’s letter of 1 November 2013 to the UK OGP civil society network re the Government’s commitment to lobbying transparency

As campaigners for greater openness in decision making, we applauded the Coalition commitment in May 2010 to ‘regulate lobbying through introducing a statutory register of lobbyists and ensuring greater transparency’. However, we are extremely concerned that the current plans, in Part 1 of the Transparency of Lobbying, Non-Party Campaigning and Trades Union Administration Bill, will fail to deliver the transparency promised. The proposed register is not fit-for-purpose. In the short time the Government has allowed for debate on the bill, it has been heavily criticised by the Political and Constitutional Reform Select Committee and Members of Parliament, as well as representatives of the consultancy industry and a wide range of civil society groups. We urge you to redraft Part 1 of the Bill to:
  • broaden the definition of lobbyist to include all third party consultants and in-house lobbyists, whether corporate, union or charity;
  • extend the definition to include lobbying of mid-ranking civil servants and special advisors; and
  • introduce fuller disclosure requirements to include the target, topic and estimated cost of lobbying activity.
Central to our concerns is the narrow definition of lobbyist. As drafted, the Bill excludes at least eighty per cent of the industry, notably in-house lobbyists. It will also exclude most key consultant lobbyists through a significant loophole: those who in the course of their lobbying do not make contact with Ministers and Permanent Secretaries will not be required to register. This, as lobbyists and the lobbied well know, is the majority of lobbying activity.

The justification for such a narrow definition does not stand up to scrutiny. The Government has defined the problem as a lack of transparency about who an agency is representing when it meets with a Minister. Official meeting lists reveal that this would apply to only a handful of meetings. As many in Parliament have pointed out, if this is a genuine problem, it would be better solved with improved disclosure from Ministers.

Of equal concern to us is the lack of any meaningful information on lobbying activity to be included in the proposed register. It would require lobbyists merely to register their clients, but reveal nothing of their interaction with government (i.e. whom they are lobbying, and what they are seeking to influence). This information is essential if the government is to realise its laudable aim through the register of ‘increasing public accountability and public trust in the UK system of government and improving the efficiency of government policy outcomes’. Fuller disclosure would also bring the UK in line with international standards.

The fundamental purpose of introducing a register of lobbyists is to allow the public to examine and understand the activities of lobbyists, to improve government accountability and ultimately to rebuild public trust. It is imperative to have in mind the widely held public perception of how decisions are taken by government, a view summed up by David Cameron as ‘a cosy club at the top making decisions in its own interest’. This lack of trust must be of serious concern to Government. Proper disclosure rules for lobbyists would go a long way to dispel this perception. The reality of lobbying in the UK, which would be revealed in a robust register of lobbyists, would be far more mundane than is popularly imagined. A refusal to introduce genuine transparency, however, would only reinforce the perception that public scrutiny is something politicians would rather avoid.

The shortcomings of the current Bill are all the more surprising considering the leadership you have shown through the Open Government Partnership and your vocal support for greater transparency. The current proposals threaten to undermine not only your ambition to be ‘the most open and transparent government in the world’, but also detract from the OGP initiative. Civil society groups long ago identified a robust register as a key priority for the Partnership, yet we encountered a surprising reluctance from some Cabinet Office officials to engage with us during the development of the proposals. The result is a register that is wholly inadequate.

The Coalition rightly identified ‘secret’ lobbying as an issue of public concern, one which ‘goes to the heart of why people are so fed up with politics’. ‘We can’t go on like this,’ said David Cameron. We urge you to now fulfil your commitment with a proper register which will allow public scrutiny of lobbying activity in the UK.

Yours sincerely,
Alexandra Runswick, Director, Unlock Democracy
Dr Andy Williamson FRSA, Founder, FutureDigital & Chair,
Anne Thurston, Director, International Records Management Trust
Anthony Zacharzewski, democracy campaigner
Gavin Hayman, Director of Campaigns, Global Witness
Graham Gordon, Head of Public Policy, CAFOD
Jonathan Gray, Director of Policy, The Open Knowledge Foundation
Maurice Frankel, Director, Campaign for Freedom of Information
Miles Litvinoff, Coordinator, Publish What You Pay UK
Simon Burall, Director, Involve
Tamasin Cave, Director, Spinwatch
Thomas Hughes, Executive Director, ARTICLE 19

Creative Commons Version 4.0 Released

- November 28, 2013 in Legal

This is a guest blog post by Timothy Vollmer, Manager of Policy and Data at Creative Commons.

Creative Commons has finally released Version 4.0 of the license suite. It’s been two years since we began the license update process, but now it’s done. The 4.0 licenses are the most global, legally robust licenses produced by CC to date. You can find highlights of the changes on the website. Probably the most significant improvement is the expansion in license scope to include sui generis database rights: those copyright-like rights that exist in Europe and a few other countries and are granted to those who exert some effort in compiling a database. The 4.0 licenses (of course, those aligned with the Open Definition) are now better suited for use by governments and publishers of public sector information and open data. A few other changes:

A more global license

The new licenses have improved terminology that’s better understood worldwide. And there will be official translations of the CC licenses, so that users around the world can read and understand the complete licenses in their local languages.

Common-sense attribution

The licenses explicitly permit licensees to satisfy the attribution requirement with a link to a separate page for attribution information. This was already common practice on the internet and possible under earlier versions of the licenses, and Version 4.0 alleviates any uncertainty about its use.

30-day window to correct license violations

All CC licenses terminate when a licensee breaks their terms, but under 4.0, a licensee’s rights are reinstated automatically if she corrects a breach within 30 days of discovering it. This is common sense and how several other open licenses handle inadvertent violations.

Clarity about adaptations

The BY and BY-NC 4.0 licenses are clearer about how adaptations are to be licensed. These licenses now clarify that you can apply any license you want to your contributions so long as your license doesn’t prevent users of the remix from complying with the original license.

There is a more in-depth discussion of the various policy decisions and versioning considerations here. We extend our grateful thanks to everyone in the open licensing community who has helped bring 4.0 to life.

Image source: By Petey21 (Own work) [Public domain], via Wikimedia Commons

New Sources and Rights section on The Public Domain Review

- August 28, 2013 in copyright, Featured, Legal, licensing, Public Domain, public domain review, sources

Today sees the announcement of two exciting new developments on The Public Domain Review, changes which centre on better celebrating those institutions which have decided to open up their collections and helping users understand the different rights for reuse that apply to the content.

New sources section

The new sources page lists the major sources for material found on The Public Domain Review: both online content aggregators (websites which bring together into one place digital copies from disparate sources) and the content providers themselves (the institutions who will often hold the physical object from which the digital copy has been made). This list is intended to be at once a celebration of the sources we use in the creation of The Public Domain Review and also a mapping of the current landscape of openly licensed collections, a map which we hope will encourage users to explore these wonderful sources for themselves. We also hope that by highlighting the wealth of institutions that have already opened up their public domain collections, those institutions that have not yet opened up might be encouraged to do so. Each institution has its own dedicated page which lists their content featured on our site.

New attribution feature and accompanying rights and re-use section

Each collection post on The Public Domain Review now has an accompanying table clearly stating: 1) the source from which the material derives, 2) if relevant, a hat-tip to any person or website through which we found the material, 3) download links, and 4) information regarding rights and re-use of both the underlying work and the digital copy which we are presenting. To accompany the “rights and re-use” part of this new feature we have a dedicated page, “Rights labelling on our site”, which explains some of the terms encountered and, in general, gives a helpful overview of the complex world of rights and re-use relating to public domain works and their digital copies.

We hope that these changes will help give the recognition deserved to the institutions that have taken the bold step of openly licensing their collections, and also that those who appreciate the fruits of this labor will, with more transparency regarding rights, feel more empowered to share and re-use it.

Open and the “Next Great Copyright Act”

- March 20, 2013 in Legal, Open Content, Public Domain

Director of the U.S. Copyright Office Maria Pallante is expected to call today for updates to U.S. copyright law. Her brief written testimony is already available and a longer speech given two weeks ago (titled “The Next Great Copyright Act”) provides additional flavor. Substantial changes to copyright will take years to play out in the U.S., and similarly around the world. If Open is to impact how copyright and other knowledge regulation plays out over the next years, we must assert how and why, and develop our strategies for making it so. Statements like Pallante’s provide not-to-be-missed opportunities to contextualize and explain the importance of Open to the world. While Pallante’s calls are at best a mixed bag, two items offer glimmers of hope and are useful for illustrating both the value and strategy of Open:
Congress also may need to apply fresh eyes to the next great copyright act to ensure that the copyright law remains relevant and functional. This may require some bold adjustments to the general framework. You may want to consider alleviating some of the pressure and gridlock brought about by the long copyright term — for example, by reverting works to the public domain after a period of life plus fifty years unless heirs or successors register their interests with the Copyright Office.
50 years with an option for more is far from anything that might be considered optimal (OKF’s Rufus Pollock has estimated 15 years and others less, even before accounting for values achieved through openness such as freedom and equality) and is a dangerous place to start new debate, considering that Disney lobbyists have not yet weighed in. But any possibility of mitigating the heretofore relentless march of copyright term extension and by implication appreciation of the value of the public domain is welcome, and an opportunity. Some of the most compelling work by the Open community involves making public domain works accessible, and celebrating our bounty. Compelling for culture, and critical for policy. What better way to make the case for expanding and protecting the public domain than to demonstrate and increase the value of works that are free of copyright restriction even now? Well, we have to talk about our work in those terms, loudly!

Public Domain Review postcards

Pallante:
And in compelling circumstances, you may wish to reverse the general principle of copyright law that copyright owners should grant prior approval for the reproduction and dissemination of their works — for example, by requiring copyright owners to object or “opt out” in order to prevent certain uses, whether paid or unpaid, by educational institutions or libraries.
Openly licensed works — those that all are free to use, reuse, and redistribute subject only, at most, to the requirement to attribute and/or share-alike — unambiguously permit such uses, right now, and are increasingly becoming expected and even mandated where public funding is provided or public benefit is a primary goal. What better way to make the case for liberal policy where public funding or benefit is at stake than to promote and demonstrate the value of Open works now? Again, we have to talk about our usual pro-openness work’s relevance to policy, loudly! But open licensing is opt-in (even when mandated, it is as if a group opted-in, still leaving default policy for everyone else), ultimately limiting its impact. We shouldn’t shy away from that reality — indeed it is a key reason open licensing can be, if we make it so, a harbinger of better default policy, but not at all a substitute for better default policy. When positioning Open in the context of broader copyright and other information regulation debates, we shouldn’t be content to merely address points made in those debates, but from an Open perspective. We must also raise additional issues that arise from the experience of Open movements: a knowledge commons requires protection and promotion. Private enclosure of public domain and Open works, eg through “copyfraud”, might be addressed through policy. Ensuring the public’s right to audit, understand, replicate, and modify data and tools such as software and designs for research and hardware, might be addressed through policy. Actually we know these can be addressed through policy, as demonstrated for decades on an opt-in basis through copyleft, one of the signal innovations of our movements. 
Although over 25 years old (starting with free software), open licenses and the amazing projects that use them (that run the Internet, and are making governments more transparent, bit by bit, and so much more) have played almost no explicit role in debates about default copyright policy. Hopefully you’re beginning to think that we can change that — with little or no alteration of our existing Open activities, as we mainly need to appreciate just how provocative and potent those are, and tell the public, especially the policy world. Ultimately, we can shift the centrality of “copyright policy” to that of “open policy” — what information regulation is best for the knowledge commons — for all humanity’s yearning for freedom, equality, and well governed institutions.

Content Mining in Europe: Further Licensing is Not The Only Way

- February 28, 2013 in Access to Information, Legal, Open Science

A significant number of groups who support knowledge policies for the public good, including ourselves, have signed and published a letter of concern arising from one of the working groups of the Licences for Europe – A Stakeholder Dialogue meetings in Brussels. This particular working group was Working Group 4, which was set to discuss ways and means of enabling Text and Data Mining (TDM) for research. I was present both as a user of mining techniques in my academic research and as official representative of the Open Knowledge Foundation, as a participant in the discussions. The letter expresses concerns that in this TDM meeting we were presented “not with a stakeholder dialogue, but a process with an already predetermined outcome –namely that additional licensing is the only solution to the problems being faced by those wishing to undertake TDM”. We believe that this dialogue should fairly include discussion of copyright limitations and exceptions for such TDM activity. The Vice-President of the European Commission responsible for the Digital Agenda, Neelie Kroes (pictured above), made a speech shortly before the working group meeting which indicated this would be an option on the table for discussion:
But keep your minds open: maybe in some cases licensing won’t be the solution
It was also in the notes published in advance of the working group meeting that discussion would explore:
the potential and possible limits of standard licensing models
(emphasis mine)

Yet when we started discussions, all our attempts to discuss copyright exemptions for TDM, as successfully practised in the US, Japan, Israel, Taiwan and South Korea, were quickly shut down by the dialogue moderators. It was made crystal clear to us that any further attempts to discuss this as a solution to the problems of TDM access would not be entertained. Many of us left the meeting feeling extremely frustrated that we were prevented from discussing what we thought was a reasonable and optimal solution practised elsewhere, and were only allowed to discuss sub-optimal, cumbersome options involving re-licensing of content or collective licensing. Thus the letter of concern finishes with 3 simple requests:
  1. All evidence, opinions and solutions to facilitate the widest adoption of TDM are given equal weighting, and no solution is ruled to be out of scope from the outset;
  2. All the proceedings and discussions are documented and are made publicly available;
  3. DG Research and Innovation becomes an equal partner in Working Group 4, alongside DGs Connect, Education and Culture, and MARKT – reflecting the importance of the needs of research and the strong overlap with Horizon 2020.
The more than 50 participants and signatories of the letter include a Nobel Prize winner (Sir John Sulston), and top representatives of most European research funders, libraries and even smart tech companies with an interest in this area like Mendeley. We sincerely hope the European Commission takes action on this matter.

Did Gale Cengage just liberate all of their public domain content? Sadly not…

- January 9, 2013 in Featured, Free Culture, Legal, Open Access, Open/Closed, Public Domain, WG Public Domain

Earlier today we received a strange and intriguing press release from a certain ‘Marmaduke Robida’ claiming to be ‘Director for Public Domain Content’ at Gale Cengage’s UK premises in Andover. Said the press release:
Gale, part of Cengage Learning, is thrilled to announce that all its public domain content will be freely accessible on the open web. “On this Public Domain Day, we are proud to have taken such a ground-breaking decision. As a common good, the Public Domain content we have digitized has to be accessible to everyone” said Marmaduke Robida, Director for Public Domain Content, Gale. Hundreds of thousands of digitized books coming from some of the world’s most prestigious libraries and belonging to top-rated products highly appreciated by the academic community such as “Nineteenth Century Collection Online”, “Eighteenth Century Collection Online”, “Sabin America”, “Making of the Modern World” and two major digitized historical newspaper collections (The Times and the Illustrated London news) are now accessible from a dedicated websit. The other Gale digital collections will be progressively added to this web site throughout 2013 so all Public Domain content will be freely accessible by 2014. All the images are or will be available under the Public Domain Mark 1.0 license and can be reused for any purpose. Gale’s global strategy is inspired by the recommandations issued by the European reflection group “Comite des sages” and the Public Domain manifesto. For Public Domain content, Gale decided to move to a freemium business model : all the content is freely accessible through basic tools (Public Domain Downloader, URL lists, …), but additional services are charged for. “We are confident that there still is a market for our products. Our state-of-art research platforms offer high quality services and added value which universities or research libraries are ready to pay for” said Robida. A specific campaign targeted to national and academic libraries for promoting the usage of Public Domain Mark for digitized content will be launched in 2013. 
“We are ready to help the libraries that have a digitization programme fulfill their initial mission : make the knowledge accessible to everyone. We also hope that our competitors will follow the same way in the near future. Public Domain should not be enclosed by paywalls or dubious licensing terms” said Robida.
The press release linked to a website which proudly proclaimed:
All Public Domain content to be freely available online. Gale Digital Collections has changed the nature of research forever by providing a wealth of rare, formerly inaccessible historical content from the world’s most prestigious libraries. In january 2013, Gale has taken a ground-breaking decision and chosen to offer this content to all the academic community, and beyond to mankind, to which it belongs
This was met with astonishment by members of our public domain discussion list, many of whom suspected that the news might well be too good to be true. The somewhat mysterious, yet ever-helpful Marmaduke attempted to allay these concerns on the list, commenting:
I acknowledge this decision might seem a bit disorientating. As you may know, Gale is already familiar to give access freely to some of its content [...], but for Public Domain content we have decided to move to the next degree by putting the content under the Public Domain Mark.
Several brave people had a go at testing out the so-called ‘Public Domain Downloader’ and said that it did indeed appear to provide access to digitised images of public domain texts – in spite of concerns in the Twittersphere that the software might well be malware (in case of any ambiguity, we certainly do not suggest that you try this at home!). I quickly fired off an email to Cengage’s Director of Media and Public Relations to see if they had any comments. A few hours later a reply came back:
This is NOT an authorized Cengage Learning press release or website – our website appears to have been illegally cloned in violation of U.S. copyright and trademark laws. Our Legal department is in the process of trying to have the site taken down as a result. We saw that you made this information available via your listserv and realize that you may not have been aware of the validity of the site at the time, but ask that you now remove the post and/or alert the listserv subscribers to the fact that this is an illegal site and that any downloads would be in violation of copyright laws.
Sadly the reformed Gale Cengage – the Gale Cengage opposed to paywalls, restrictive licensing and clickwrap agreements on public domain material from public collections, the Gale Cengage supportive of the Public Domain Manifesto and dedicated to the liberation of public domain content for everyone to enjoy – was just a hoax, a phantasm. At least this imaginary, illicit doppelgänger Gale gives a fleeting glimpse of a parallel world in which one of the biggest gatekeepers turned into one of the biggest liberators overnight. One can only hope that Gale Cengage and their staff might – in the midst of their legal wrangling – be inspired by this uncanny vision of the good public domain stewards that they could one day become. If only for a moment.

World’s first REAL commercial open data curation project!

- October 3, 2012 in Featured, Legal, Open Data, Policy

The following post is by Francis Irving, CEO of ScraperWiki.

Our laws are still published on calf skin (vellum)

Can you think of an open data curation project where the people who work on it come from multiple commercial companies? In the mid 1990s, as open source code began to boom, the equivalent was commonplace. Geeks working at ISPs would together patch the Apache webserver into shape. Startups like RedHat would pay for staff to work on lots of projects in order to produce a whole operating system. For years I’ve asked, where are the equivalent projects in open data? Nada. Not one.
Until today. I finally found one. It’s the UK’s Statute Law database, which is maintained by the National Archives. I explained back in 2006 how it used to be proprietary data, and how it was finally opened up in an incomplete form.
Briefly, Parliament doesn’t release a usable set of laws. They release Acts, which are changes to laws (patch files, if you’re a geek). These need to be “consolidated” with existing laws into the actual rules we have to obey. Two commercial companies (LexisNexis and Westlaw, so called after centuries of takeovers) do this consolidation themselves. They charge a handsome price. Nobody can compete with them, as they don’t have the current laws to start from, even if they had the money to keep up with new changes.
I spent a chunk of yesterday afternoon talking to John Sheridan (right) from the National Archives. He runs the Government’s Statute Law project. Jeni Tennison (left) is his technical mastermind. Last time I spoke to her a year or two ago she was worried that they would never finish the work. The sheer volume of new laws and difficulty of consolidations seemed insurmountable. Would they ever have a complete image of current law?
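For the geeks, the “patch file” comparison can be made concrete. What follows is only a minimal sketch in Python, not how the National Archives’ editorial system actually works: the `Effect` class, the `consolidate` function, the three kinds of action and the widget law are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Effect:
    """One change an amending Act makes to an existing law (hypothetical model)."""

    action: str     # "substitute", "insert" or "repeal"
    section: str    # label of the section being changed
    text: str = ""  # new wording; ignored for "repeal"


def consolidate(law, effects):
    """Return a copy of `law` (section label -> wording) with each effect applied in order."""
    current = dict(law)  # copy, so the text as enacted is left untouched
    for effect in effects:
        exists = effect.section in current
        if effect.action == "insert":
            if exists:
                raise ValueError(f"section {effect.section} already exists")
            current[effect.section] = effect.text
        elif effect.action == "substitute":
            if not exists:
                raise ValueError(f"section {effect.section} does not exist")
            current[effect.section] = effect.text
        elif effect.action == "repeal":
            if not exists:
                raise ValueError(f"section {effect.section} does not exist")
            del current[effect.section]
        else:
            raise ValueError(f"unknown action: {effect.action}")
    return current


# An invented law as originally enacted, and an invented amending Act.
as_enacted = {
    "1": "Every widget must be registered.",
    "2": "Registration costs 5 pounds.",
    "3": "Widgets may be blue.",
}
amending_act = [
    Effect("substitute", "2", "Registration costs 10 pounds."),
    Effect("repeal", "3"),
    Effect("insert", "2A", "Registration lasts one year."),
]

consolidated = consolidate(as_enacted, amending_act)
for section in sorted(consolidated):
    print(section, consolidated[section])
```

Real effects are far messier than this (they can amend a few words inside a section, come into force on different dates, or apply only in some jurisdictions), which is part of why consolidation needs trained people rather than a script.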
Now they’ve cracked it. By forming the world’s first real open data curation project.
I’ll start with a quote from one of the red-in-tooth-and-claw companies who are contributing to this.
I represent the Practical Law Company, one of the private sector organisations involved in the Expert Participation Programme. We’re really excited by these developments and salute John Sheridan and his team for their groundbreaking and elegant work on the API and legislation database. [The site] is the official publishing place for UK legislation and so it is really important work.
The programme is now starting to make a real and visible difference to the status of legislation on the website. By employing people to work with National Archives and as a first step, we’ve been able to ensure that the Companies Act 2006 is now fully consolidated on [the site]. This is a particularly important piece of legislation for many of our customers but we intend to carry on the consolidation work on other legislation.
Well done, National Archives. (Source: comment by Elizabeth Woodman)

Truly collaborative

The astonishing process goes roughly like this:
  1. John and Jeni and their team build an amazing web admin interface for skilled users to easily piece together the consolidated law jigsaw from the unconsolidated acts and statutory instruments.

  2. Various organisations, such as the Practical Law Company, the Welsh Government (they want to sort out Welsh language law, nobody commercial can be bothered), the Department for Work and Pensions (they make legal guides for tens of thousands of their staff, and so can’t afford the commercial providers) and a couple of other commercial providers (I’ll let John name names, as some that he mentioned to me aren’t fully announced yet) decided they want to contribute.

  3. They pay for some staff to work on it full time. The staff are trained initially by the National Archives, and work for the contributing organisation. There are currently about 30 in total. For example, Practical Law employ 14 people to do this stuff. There’s a queue: they can’t train new ones fast enough to meet demand.

  4. The staff fix up the open data. It appears on the site, as well as in XML files and as a SPARQL endpoint.

  5. Profit. No really, this is a better business model than stealing underpants. For example, Practical Law release new products based on top of the now lovely clean, free data (such as the Companies Act they mention above).

The National Archives team were marking up 10,000 effects (i.e. patches of one bit of law over another) per year all by themselves. With 15,000 new effects being passed by Parliament each year, they were rapidly getting deeper into debt. Now that they’ve improved the process and have the growing help of industry and other parts of Government, the basic metadata has been done for it all in just one year. They aim to have fully caught up by 2015, including secondary legislation. Come the next Parliament, all laws should appear consolidated on the site – and anywhere else that wants it – in real time.

Saves money and improves lives

It’s win win win win. Well, unless you’re one of the two companies with a proprietary version of the database. Although they don’t seem too unhappy about it – for example, WestLaw has contributed electronic versions of pre-war Statutory Instruments that the Government had lost.
In the future there will be even more cost savings. For example, tens of millions are spent each year by the Court Service buying back proprietary copies of the laws they have to enforce. That could end when the open statute law database is fully finished in 2015.
However, as ever with public interest activity on the Internet, the real benefit is hidden and subtle. John explained to me that every month about 2 million people land on the site after searching for things like “allotments act 1950” in search engines. Most of them are non-lawyer professionals – HR, company secretaries, police officers. Better open legal data will help them do their job more effectively and in less time.
The next large user base is concerned citizens, defending their own rights. For example, a mother fighting with her local authority over statementing of her child. Giving them clear access to the law boosts their credibility with the authorities, and helps to make an otherwise messy dispute rules based and easier to resolve.

The lesson for open data projects

As well as being just brilliant, this story has torn the blindfold off something that once baffled me. Why why why are there no collaborative open data curation projects? Zarino Zappia, who works for my company ScraperWiki, did a whole thesis at the Oxford Internet Institute hunting for such projects. He couldn’t find any. I now think the problem with the other nascent projects was that they didn’t include the upstream source (i.e. the National Archives in this case). Upstream helps in two ways:

  1. Act as a strong power to set up the project. It was both hard and expensive. In theory the Practical Law Company could have done this, but in practice the economic gain for just them wouldn’t have been enough.
  2. The original source is being fixed. It’s hard to state how much better that is than tidying up a downstream copy (I know, from making things like TheyWorkForYou and ScraperWiki). It’s technically and procedurally much less complicated. It gives a strong provenance and trust that simply cannot be earnt any other way.
Open source projects have different needs to get going. Open data curation is truly unique. You need both the data provider, and commercial contributors, for a sustainable project.

What data next?

I would like to see the same model applied to other open data sets. How about…
  1. Fine grained inflation data. Apparently somebody external offered to help the ONS improve the way they publish them, but was turned down. Perhaps now, with a successful example elsewhere in Government, this can happen.
  2. Department for Transport data, such as public transport timetables. There’s some collaboration round this already, but I would love to see the Government crowdsourcing accurate fixes so that the data becomes perfect (with Google, Apple, and FixMyTransport all contributing!).
  3. Parliamentary debates. I know several organisations (some commercial, some charitable) who curate that data, which is increasingly a commodity. Parliament itself wants to publish it better. A project run between them all would be very powerful.
I’m sure you can think of many more. And here’s the kicker. Jeni has just been appointed Technical Director of The Open Data Institute, where she is going to work out how to kickstart a flurry of such successful open data projects. Today our law. Tomorrow the world. You can read more about this project here: