Semantic Web views from EKAW 2014

- December 4, 2014 in Bibliographic, Events, linked data, tools

Last week (24th to 28th of November 2014) I attended the 19th International Conference on Knowledge Engineering and Knowledge Management (EKAW2014). Set in Linköping, a charming (and chilly) city in Sweden, the conference had a strong focus on Semantic Web-related areas, such as Description Logics, OWL and Linked Data. For tweets about the …

What We Hope the Digital Public Library of America Will Become

- April 17, 2013 in Bibliographic, Featured, Free Culture, Open Content, Open GLAM, Open Humanities, Policy, Public Domain

Tomorrow is the official launch date for the Digital Public Library of America (DPLA). If you’ve been following it, you’ll know that it has the long-term aim of realising “a large-scale digital public library that will make the cultural and scientific record available to all”. More specifically, Robert Darnton, Director of the Harvard University Library and one of the DPLA’s leading advocates to date, recently wrote in the New York Review of Books that the DPLA aims to:
make the holdings of America’s research libraries, archives, and museums available to all Americans—and eventually to everyone in the world—online and free of charge
What will this practically mean? How will the DPLA translate this broad mission into action? And to what extent will they be aligned with other initiatives to encourage cultural heritage institutions to open up their holdings, like our own OpenGLAM or Wikimedia’s GLAM-WIKI? Here are a few of our thoughts on what we hope the DPLA will become.

A force for open metadata

The DPLA is initially focusing its efforts on making existing digital collections from across the US searchable and browsable from a single website. Much like Europe’s digital library, Europeana, this will involve collecting information about works from a variety of institutions and linking to digital copies of these works that are spread across the web. A super-catalogue, if you will, that includes information about and links to copies of all the things in all the other catalogues. Happily, we’ve already heard that the DPLA is releasing all of this data about cultural works that they will be collecting using the CC0 legal tool – meaning that anyone can use, share or build on this information without restriction. We hope they continue to proactively encourage institutions to explicitly open up metadata about their works, and to release this as machine-readable raw data. Back in 2007, we – along with the late Aaron Swartz – urged the Library of Congress to play a leading role in opening up information about cultural works. So we’re pleased that it looks like DPLA could take on the mantle. But what about the digital copies themselves?

A force for an open digital public domain

The DPLA has spoken about using fair use provisions to increase access to copyrighted materials, and has even intimated that they might want to try to change or challenge the state of the law to grant further exceptions or limitations to copyright for educational or noncommercial purposes (trying to succeed where Google Books failed). All of this is highly laudable. But what about works which have fallen out of copyright and entered the public domain? Just as they are doing with metadata about works, we hope that the DPLA takes a principled approach to digital copies of works which have entered the public domain, encouraging institutions to publish these without legal or technical restrictions. We hope they become proactive evangelists for a digital public domain which is open as in the Open Definition, meaning that digital copies of books, paintings, recordings, films and other artefacts are free for anyone to use and share – without restrictive clickwrap agreements, digital rights management technologies or digital watermarks to impose ownership and inhibit further use or sharing. The Europeana Public Domain Charter, in part based on and inspired by the Public Domain Manifesto, might serve as a model here. In particular, the DPLA might take inspiration from the following sections:
What is in the Public Domain needs to remain in the Public Domain. Exclusive control over Public Domain works cannot be re-established by claiming exclusive rights in technical reproductions of the works, or by using technical and/or contractual measures to limit access to technical reproductions of such works. Works that are in the Public Domain in analogue form continue to be in the Public Domain once they have been digitised. The lawful user of a digital copy of a Public Domain work should be free to (re-) use, copy and modify the work. Public Domain status of a work guarantees the right to re-use, modify and make reproductions and this must not be limited through technical and/or contractual measures. When a work has entered the Public Domain there is no longer a legal basis to impose restrictions on the use of that work.
The DPLA could create their own principles or recommendations for the digital publication of public domain works (perhaps recommending legal tools like the Creative Commons Public Domain Mark) as well as ensuring that new content that they digitise is explicitly marked as open. Speaking at our OpenGLAM US launch last month, Emily Gore, the DPLA’s Director for Content, said that this is definitely something that they’d be thinking about over the coming months. We hope they adopt a strong and principled position in favour of openness, and help to raise awareness amongst institutions and the general public about the importance of a digital public domain which is open for everyone.

A force for collaboration around the cultural commons

Open knowledge isn’t just about stuff being able to freely move around on networks of computers and devices. It is also about people. We think there is a significant opportunity to involve students, scholars, artists, developers, designers and the general public in the curation and re-presentation of our cultural and historical past. Rather than just having vast pools of information about works from US collections – wouldn’t it be great if there were hand-picked anthologies of works by Emerson or Dickinson curated by leading scholars? Or collections of songs or paintings relating to a specific region, chosen by knowledgeable local historians who know about allusions and references that others might miss? An ‘open by default’ approach would enable use and engagement with digital content that breathes a life into it that it might not otherwise have – from new useful and interesting websites, mobile applications or digital humanities projects, to creative remixing or screenings of out-of-copyright films with new live soundtracks (like Air’s magical reworking of Georges Méliès’s 1902 film Le Voyage Dans La Lune). We hope that the DPLA takes a proactive approach to encouraging the use of the digital material that it federates, to ensure that it is as impactful and valuable to as many people as possible.

Communia condemns the privatisation of the Public Domain by the BnF

- January 21, 2013 in Bibliographic, COMMUNIA, OKF France, Public Domain

Last week the Bibliothèque nationale de France (BnF) concluded two new agreements with private companies to digitize over 70,000 old books, 200,000 sound recordings and other documents belonging (either partially or as a whole) to the public domain. While these public-private partnerships enable the digitization of these works, they also contain 10-year exclusive agreements allowing the private companies carrying out the digitization to commercialize the digitized documents. During this period only a limited number of these works may be offered online by the BnF. Together with La Quadrature du Net, Framasoft, SavoirsCom1 and the Open Knowledge Foundation France, COMMUNIA has issued a statement (in French) to express our profound disagreement with the terms of these partnerships that restrict digital access to an important part of Europe’s cultural heritage. The agreements that the BnF has entered into effectively take the works being digitized out of the public domain for the next 10 years. The value of the public domain lies in the free dissemination of knowledge and the ability for everyone to access and create new works based on previous works. Yet, instead of taking advantage of the opportunities offered by digitization, the exclusivity of these agreements will force public bodies, such as research institutions or university libraries, to purchase digital content that belongs to the common cultural heritage. As such, these partnerships constitute a commodification of the public domain by contractual means. COMMUNIA, of which the OKFN is a partner, has been critical of such arrangements from the start (see their Public Domain Manifesto and Policy Recommendations 4 & 5). More interestingly, these agreements are also in direct contradiction with the Public Domain Charter published by the Europeana Foundation in 2011.
In this context it is interesting to note that the director of Bibliothèque nationale de France currently serves as the chairman of the Europeana Foundation’s Executive Board.

Goodbye Aaron Swartz – and Long Live Your Legacy

- January 14, 2013 in Access to Information, Bibliographic, Campaigning, Featured, News, Open Access, Open Data, Open Government Data, Policy

Aaron Swartz, coder, writer, archivist and activist, took his own life in New York on Friday. Aaron worked tirelessly to open up and maximise the societal impact of information in three areas which are central to our work at the Foundation: public domain cultural works, public sector information, and open access to publicly funded research. He was one of the original architects behind the Internet Archive’s Open Library project, which aims to create ‘one web page for every book’. While he was there we compared notes about trying to automatically estimate which works are in the public domain in different countries around the world. This was part of a broader vision to enable public access to the public domain, and to ensure that digitisation initiatives result in open digital copies of public domain works that everyone is free to use and enjoy, not just copies owned and protected by large corporations who might sell or restrict access to the world’s heritage. Around this time Aaron and I met in San Francisco to co-draft a petition to the Library of Congress to encourage them to take a leading role in opening up data from the world’s libraries and memory institutions. This was several years before a wave of institutions started explicitly opening up data about their holdings. We remained in contact regarding his work on open government data in the US. Aaron was involved in drafting the highly influential 8 principles for open government data. We wanted to try to better coordinate developments on either side of the Atlantic. Later he was in the papers for downloading around a fifth of the US government’s huge Public Access to Court Electronic Records (PACER) system, around 780 gigabytes, and releasing it for free to the public (access was usually charged by the page) – which earned him an FBI file.
In his 2008 Guerilla Open Access Manifesto Aaron argued that “the world’s entire scientific and cultural heritage, published over centuries in books and journals, is increasingly being digitized and locked up by a handful of private corporations” and, “in the grand tradition of civil disobedience”, urged internet users to “fight back”:
We need to take information, wherever it is stored, make our copies and share them with the world. We need to take stuff that’s out of copyright and add it to the archive. We need to buy secret databases and put them on the Web. We need to download scientific journals and upload them to file sharing networks. We need to fight for Guerilla Open Access.
In 2010 he founded Demand Progress, which helped to mobilise over a million people in response to proposed legislation like the Combating Online Infringement and Counterfeits Act (COICA). In 2011 he again hit the headlines when he was arrested for downloading roughly 4 million subscription-only academic articles from JSTOR by placing a laptop in a computer cupboard at MIT and using this to gain unauthorised access to the JSTOR service. The prosecution alleged that he intended to make these articles freely available on the web. Last September the US Federal Government raised the felony count from four to thirteen, which meant that Aaron was potentially facing a total of 50+ years and a fine in the area of $4 million for his actions. His family suggested that the case was a factor in his death – and blamed the Massachusetts U.S. Attorney’s office for “intimidation and prosecutorial overreach” and MIT for “refus[ing] to stand up for Aaron and its own community’s most cherished principles”. The president of MIT has just announced that he has ordered an investigation into their role in Aaron’s prosecution. As Peter Eckersley from the Electronic Frontier Foundation commented on Saturday:
While his methods were provocative, the goal that Aaron died fighting for — freeing the publicly-funded scientific literature from a publishing system that makes it inaccessible to most of those who paid for it — is one that we should all support.
While Aaron was deeply involved in all kinds of technical, scholarly and organising activities to promote an open digital commons and an open internet – from helping to develop RSS 1.0 and Markdown, to early sketches of the semantic web with some of its pioneers and work on the first technical implementations of the Creative Commons licenses – he also never lost sight of the bigger picture, of what it was all for. He was a talented coder and knew how to take a principled stance, but he was never one to get lost in detail or dogma. From his writings about how data-driven transparency initiatives are not enough to effect change in themselves, to his guide to developing software that addresses real needs, he was always aware of the fact that using information, technology and the internet to change the world is not easy, and requires graft, skill, scrutiny, critical reflection and taking risks. Aaron’s passing is a tremendously sad and significant loss. Long live his legacy.
To find out more about Aaron’s life and works, you can look at his writings and the memorial site set up by his family. You can also read tributes from Tim Berners-Lee, Cory Doctorow, Brewster Kahle, Lawrence Lessig, and Erik Moeller, and read obituaries and news articles on the BBC, the Economist, Forbes, Gigaom, the Guardian, the Huffington Post, the New York Times, Techdirt, the Telegraph and Wired. In tribute, hundreds of academics have started tweeting links to their research papers using the hashtag #pdftribute. The Internet Archive has started an Aaron Swartz Collection.

The Digital Public Library of America moving forward

- November 6, 2012 in Bibliographic, External, Open Content, Open Data, Open GLAM

A fuller version of this post is available on the Open GLAM blog.
The Digital Public Library of America (DPLA) is an ambitious project to build a national digital library platform for the United States that will make the cultural and scientific record available, free to all Americans. Hosted by the Berkman Center for Internet & Society at Harvard University, the DPLA is an international community of over 1,200 volunteers and participants from public and research libraries, academia, all levels of government, publishing, cultural organizations, the creative community, and private industry devoted to building a free, open, and growing national resource. Here’s an outline of some of the key developments in the DPLA planning initiative. For more information on the Digital Public Library of America, including ways in which you can participate, please visit


In the fall of 2012, the DPLA received funding from the National Endowment for the Humanities, the Institute of Museum and Library Services, and the Knight Foundation to support our Digital Hubs Pilot Project. This funding enabled us to develop the DPLA’s content infrastructure, including implementation of state and regional digital service pilot projects. Under the Hubs Pilot, the DPLA plans to connect existing state infrastructure to create a national system of state (or in some cases, regional) service hubs. The service hubs identified for the pilot are:
  • Mountain West Digital Library (Utah, Nevada and Arizona)
  • Digital Commonwealth (Massachusetts)
  • Digital Library of Georgia
  • Kentucky Digital Library
  • Minnesota Digital Library
  • South Carolina Digital Library
In addition to these service hubs, organizations with large digital collections that are going to make their collections available via the DPLA will become content hubs. We have identified the National Archives and Records Administration, the Smithsonian Institution, and Harvard University as some of the first potential content hubs in the Digital Hubs Pilot Project. Here’s our director for content, Emily Gore, to give you a full overview:

Technical Development

The technical development of the Digital Public Library of America is being conducted in a series of stages. The first stage (December 2011-April 2012) involved the initial development of a back-end metadata platform. The platform provides information and services openly and to all without restriction by way of open source code. We’re now on stage two: integrating continued development of the back-end platform, complete with open APIs, with new work on a prototype front end. It’s important to note that this front-end will serve as a gesture toward the possibilities of a fully built-out DPLA, providing but one interface for users to interact with the millions of records contained in the DPLA platform. Development of the back-end platform — conducted publicly, with all code published on GitHub under a GNU Affero General Public License — continues so that others can develop additional user interfaces and means of using the data and metadata in the DPLA over time, which continues to be a key design principle for the project overall.
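To make the design principle above a little more concrete (one openly accessible back end, with any number of front ends built on top of it), here is a minimal sketch in Python. Everything in it is hypothetical: the record fields, the `search` function and the two renderers are invented for illustration and are not the DPLA platform's actual code or API.

```python
import json

# Hypothetical metadata records: each one describes a work and links to the
# digital copy held by the contributing institution.
RECORDS = [
    {"title": "Poems", "creator": "Emily Dickinson",
     "provider": "Example Library A", "url": "http://example.org/a/1"},
    {"title": "Nature", "creator": "Ralph Waldo Emerson",
     "provider": "Example Library B", "url": "http://example.org/b/7"},
]


def search(query, records=RECORDS):
    """Back end: return records whose title or creator contains the query."""
    q = query.lower()
    return [r for r in records
            if q in r["title"].lower() or q in r["creator"].lower()]


def as_json(results):
    """One possible front end: a machine-readable response for other apps."""
    return json.dumps({"count": len(results), "docs": results}, indent=2)


def as_text(results):
    """Another possible front end: a plain-text listing for human readers."""
    return "\n".join(
        f"{r['title']} by {r['creator']} ({r['provider']}): {r['url']}"
        for r in results
    )


if __name__ == "__main__":
    hits = search("emerson")
    print(as_json(hits))
    print(as_text(hits))
```

Because both renderers consume the same `search` results, a third interface (a mobile app, a visualization) could be added without touching the back end, which is the point of keeping the platform and its interfaces separate.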


We’ve been hosting a whole load of events, from our large public events like the DPLA Midwest last month in Chicago, to smaller, more intimate hackathons. These events have brought together a wide range of stakeholders (librarians, technologists, creators, students, government leaders, and others) and have proved exciting and fruitful moments in driving the project forward. On November 8-9, 2012, the DPLA will convene its first “Appfest” Hackathon at the Chattanooga Public Library in Chattanooga, TN. The Appfest is an informal, open call for both ideas and functional examples of creative and engaging ways to use the content and metadata in the DPLA back-end platform. We’re looking for web and mobile apps, data visualization hacks, dashboard widgets that might spice up an end-user’s homepage, or a medley of all of these. There are no strict boundaries on the types of submissions accepted, except that they be open source. You can check out some of the apps that might be built at the upcoming hackathon on the Appfest wiki page. The DPLA remains an extremely ambitious project, and we encourage anyone with an interest in open knowledge and the democratization of information to participate in one form or another. If you have any questions about the project or ways to get involved, please feel free to email me at kwhitebloom[at]

#OpenDataEDB 3

- September 14, 2012 in Bibliographic, Events, Join us, linked-open-data, Meetups, OKF, OKScotland, Open Data, Open GLAM, Open Government Data

Amidst the kerfuffle and cacophony of the Fringe Festival packing up for another year, the Edinburgh contingent came together again to meet, greet, present and argue all aspects of Open Data and Knowledge. OKFN Meet-ups are friendly and informal evenings for people to get together to share and debate all areas of openness. Depending on the number of people on a given evening, we have presentations and/or round-table discussions about Open Knowledge and Open Data – from politics and philosophy to the practicalities of theory and practice. We have had two previous events (see here for the ‘launch’ write-up and here for the invitation to the second instalment); this time we were kindly hosted by the Informatics Forum, and the weather stayed fine enough to explore the roof terrace (complete with vegetable garden, gizmos to record wind-speed and weather, a view across the city to Arthur’s Seat and even a blue moon). Around 20 of us gathered together and presentations were given by the following people:
  • James Baster – Open Tech Calendar: an introduction to this early-stage project to bring tech meet-ups together, talk about the different ways we are trying to be open and ask for feedback and help;
  • Ewan Klein – a short overview of business models for Open Data, including for government bodies;
  • Gordon Dunsire – library standards and linked data;
  • Gill Hamilton – National Library of Scotland’s perspective on library standards and open data;
  • Bob Kerr – State of the Map Scotland (see here for Bob’s featured OKFN blog post);
  • Naomi Lillie – OKFN as part of the Scottish Open effort.
What struck me overall was that everybody already knows each other… As well as cross-over in the talks, I kept trying to introduce people who would exclaim, “Ah yes! How was the holiday / conference / wedding?” or similar. This was quite useful, though, as it emphasised the point I made in my talk: OKFN doesn’t need to start anything in Scotland, as efforts towards Open are already ongoing and to great effect; we just want to provide support and possibly a brand under which these activities can be coordinated and promoted. With this in mind, we are going to look into a Scotland OKFN group as soon as things settle down again after OKFest – keep your eyes open for updates to follow! To keep up to date with #OpenDataEDB and similar events, with the above and other interesting folks, and with the emerging Scotland OKFN group: