
JISC Open Biblio 2 project – final report

- August 23, 2012 in Bibliographic, jiscopenbiblio2, OKF Projects, Open GLAM, openbiblio, WG Open Bibliographic Data, Working Groups

This is cross-posted from
Following on from the success of the first JISC Open Bibliography project, we have now completed a further year of development and advocacy as part of the JISC Discovery programme. At the start of this second year, our stated aims were to show our community (all those interested in furthering the cause of openness through bibliographic data, including coders, academics, and those interested in supporting Galleries, Libraries, Archives and Museums) what we are missing if we do not commit to Open Bibliography, and to show that Open Bibliography is a fundamental requirement of a community committed to the discovery and dissemination of ideas. We intended to do this by demonstrating the value of carefully managed metadata collections of particular interest to individuals and small groups, thus realising the potential of the open access to large metadata collections that we now enjoy. Overall we have achieved these aims, and we present here a summary of our output to date (it may be useful to refer to this guide to terms).


BibServer and FacetView

The BibServer open source software package enables individuals and small groups to present their bibliographic collections online easily. BibServer uses Elasticsearch in the background to index supplied records, which are then presented in the front end via the FacetView JavaScript library. Because the front end is JavaScript, result displays can easily be embedded in any web page.
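To make the faceted-browsing idea concrete, here is a minimal plain-Python sketch of what a terms facet computes: a count of each field value across the current result set, plus narrowing that set when a value is selected. It is not BibServer or FacetView code (they delegate this work to Elasticsearch); the record fields `year` and `keyword`, the sample records and the helper names are all hypothetical.

```python
from collections import Counter


def _values(record: dict, field: str) -> list:
    """Return a field's values as a list, so single and multi-valued fields are treated alike."""
    value = record.get(field)
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def facet_counts(records: list, field: str) -> Counter:
    """Count how many records carry each value of `field` (what a terms facet reports)."""
    counts = Counter()
    for record in records:
        counts.update(_values(record, field))
    return counts


def apply_filters(records: list, filters: dict) -> list:
    """Keep only the records that match every selected facet value."""
    return [
        r for r in records
        if all(wanted in _values(r, field) for field, wanted in filters.items())
    ]


# Hypothetical sample records, for illustration only.
records = [
    {"title": "Paper A", "year": "2011", "keyword": ["malaria", "open access"]},
    {"title": "Paper B", "year": "2012", "keyword": ["malaria"]},
    {"title": "Paper C", "year": "2012", "keyword": ["metadata"]},
]

print(facet_counts(records, "year"))  # Counter({'2012': 2, '2011': 1})
selected = apply_filters(records, {"year": "2012"})
print(facet_counts(selected, "keyword"))
```

Selecting a value re-runs the counts over the narrowed set, which is the interaction a faceted browser exposes to the user.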

BibSoup and more demonstrations

Our own instance of BibServer, BibSoup, is up and running; there we have seen over 100 users sharing more than 14,000 records across more than 60 collections. Some particularly interesting example collections include:

Additionally, we have created some niche instances of BibServer to solve specific problems: for example, we have used BibServer to analyse and display collections specific to malaria researchers, as a demonstration of the extent of open access materials in the field. Further analysis allowed us to show where best to look for relevant materials that could be expected to be openly available, and to begin work on the concept of an Open Access Index for research. Another example, still in progress, is the German National Bibliography, as provided by the German National Library (as explained by Adrian Pohl and Etienne Posthumus here). We have built, and are building, similar collections for all other national bibliographies that we receive.
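The kind of analysis used for the malaria collection (where could openly available material be expected?) can be sketched as a per-source tally. This is a hypothetical illustration, not the code we used: it assumes each record carries a `journal` name and a boolean `is_oa` flag, and it ranks sources by the share of their records that are openly available. The journal names below are placeholders, not real data.

```python
from collections import defaultdict


def oa_share_by_source(records: list) -> dict:
    """Return {source: (open_count, total, share)}, highest share first."""
    totals = defaultdict(int)
    open_counts = defaultdict(int)
    for record in records:
        source = record.get("journal", "unknown")
        totals[source] += 1
        if record.get("is_oa"):
            open_counts[source] += 1
    rows = {
        source: (open_counts[source], total, open_counts[source] / total)
        for source, total in totals.items()
    }
    # Sort by share, descending; sorted() is stable, so ties keep first-seen order.
    return dict(sorted(rows.items(), key=lambda item: item[1][2], reverse=True))


# Hypothetical records: two placeholder journals with different open access rates.
records = [
    {"journal": "Journal B", "is_oa": False},
    {"journal": "Journal A", "is_oa": True},
    {"journal": "Journal B", "is_oa": True},
    {"journal": "Journal A", "is_oa": True},
]
result = oa_share_by_source(records)
print(result)  # {'Journal A': (2, 2, 1.0), 'Journal B': (1, 2, 0.5)}
```

Ranking by share rather than raw count points readers toward the sources where a given article is most likely to be openly available.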


With BibJSON we have produced a simple convention for presenting bibliographic records in JSON. This has seen good uptake so far, with additional use in the JISC TEXTUS project and in Total Impact, amongst others.
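As an illustration, the sketch below builds a record in a BibJSON-like shape and round-trips it through Python's standard `json` module. The field layout (`title`, a list of `author` objects each with a `name`, `year`, a `journal` object, and an `identifier` list of `type`/`id` pairs) is an approximation of the convention; check the BibJSON specification for the authoritative field list. `make_record` and the sample values, including the DOI, are hypothetical.

```python
import json


def make_record(title, authors, year, journal=None, doi=None):
    """Build a bibliographic record as a plain dict in a BibJSON-like shape."""
    record = {
        "type": "article",
        "title": title,
        # Authors are a list of objects rather than bare strings.
        "author": [{"name": name} for name in authors],
        "year": str(year),
    }
    if journal:
        record["journal"] = {"name": journal}
    if doi:
        # Identifiers are typed, so DOIs, ISBNs etc. can sit side by side.
        record["identifier"] = [{"type": "doi", "id": doi}]
    return record


record = make_record(
    "An example article",
    ["Smith, Jane", "Doe, John"],
    2012,
    journal="Journal of Examples",
    doi="10.1234/example.5678",  # hypothetical DOI
)
text = json.dumps(record, indent=2)
assert json.loads(text) == record  # plain JSON round-trips without loss
print(text)
```

Keeping records as plain JSON objects is what lets the same data move between tools such as BibServer and FacetView without a bespoke serialisation layer.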


Pubcrawler collects bibliographic metadata via parsers written for particular sites, and we have used it to create collections of articles. The full post provides more information.
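A site-specific parser of the kind Pubcrawler relies on might look like the sketch below. It is hypothetical, not Pubcrawler's code: it assumes the target site exposes `citation_*` `<meta>` tags (a convention many publisher pages follow) and maps them into a BibJSON-like dict using only the standard-library `html.parser` module. A real parser would need per-site rules for pages without such tags.

```python
from html.parser import HTMLParser


class CitationMetaParser(HTMLParser):
    """Collect <meta name="citation_*" content="..."> tags from an article page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        name = attrs.get("name") or ""
        content = attrs.get("content")
        if name.startswith("citation_") and content:
            # Repeated tags (e.g. several citation_author entries) accumulate in a list.
            self.fields.setdefault(name, []).append(content)


def parse_article_page(html: str) -> dict:
    """Turn one article page into a BibJSON-like record (hypothetical mapping)."""
    parser = CitationMetaParser()
    parser.feed(html)
    f = parser.fields
    record = {
        "title": f.get("citation_title", [""])[0],
        "author": [{"name": n} for n in f.get("citation_author", [])],
    }
    if "citation_doi" in f:
        record["identifier"] = [{"type": "doi", "id": f["citation_doi"][0]}]
    return record


# Hypothetical page fragment, for illustration only.
page = """<html><head>
<meta name="citation_title" content="An example article">
<meta name="citation_author" content="Smith, Jane">
<meta name="citation_author" content="Doe, John">
<meta name="citation_doi" content="10.1234/example.5678" />
</head><body></body></html>"""

print(parse_article_page(page))
```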

datahub collections

We have continued to collect useful bibliographic collections throughout the year, and these along with all others discovered by the community can be found on the datahub in the bibliographic group.

Open Access / Bibliography advocacy videos and presentations

As part of a Sprint in January, we recorded videos of the work we were doing and the roles we play in this project and in wider biblio promotion; we also made a how-to for using BibServer, including feedback from a new user: “Setting up a Bibserver and Faceted Browsing” (Mark MacGillivray), from the Bibsoup Project on Vimeo. Peter and Tom Murray-Rust’s video, made into a Prezi, has proven useful in explaining the basic need for Open Bibliography and Open Access.

Community activities

The Open Biblio community has gathered for a number of different reasons over the course of this project: the project team met in Cambridge and Edinburgh to plan work in Sprints; Edinburgh also played host to a couple of Meet-ups for the wider open community, as did London; and London hosted BiblioHack, a hackathon / workshop for established enthusiasts as well as new faces, both with and without technical know-how. These events, particularly BiblioHack, attracted people from all over the UK and Europe, and we were pleased to see that our work is gaining attention from similar projects worldwide.

Further collaborations


Over the course of this project we have learnt that open source development provides great flexibility and power to do what we need to do, and that open access in general frees us from many difficult constraints. There is now a lot of useful information available online about how to do open source and open access. Whilst licensing remains an issue, it has become clear that making everything publicly and freely available to the fullest extent possible is the simplest solution, causing no further complications down the line. See the open definition as well as our principles for more information.

We discovered during the BibJSON spec development that it must be clear whether a specification is centrally controlled or more of a communal agreement on use. There are advantages and disadvantages to each method, and they are not compatible, although one may become the other. We took the communal agreement approach, as we found that in the early stages there was more value in exposing the spec to people as widely and openly as possible than in maintaining close control. Moving to a closely controlled format requires specific and ongoing commitment.

Community building remains tricky and somewhat serendipitous. Just as word-of-mouth can enhance reputation, the failure of certain communities can harm other parts of the project. Again, the best solution is to ensure everything is as open as possible from the outset, thereby reducing the impact of any one particular failure.

Opportunities and Possibilities

Over the two years, the concept of open bibliography has gone from requiring justification to being an expectation. The value of making this metadata openly available to the public is now obvious, and getting such access is no longer so difficult; where access is not yet available, many groups are now moving toward providing it. There are also now plenty of tools to make good use of the available metadata. Future opportunities lie in the more general field of Open Scholarship, where a default of Open Bibliography can be leveraged to great effect. For example, recent Open Access mandates by many UK funding councils (e.g. the Finch Report) could be backed up by investigative checks on the accessibility of research outputs, supporting the provision of an open access corpus of scholarly material. We now intend to continue work in this wider context and will soon publicise our more specific ideas; we would appreciate contact from other groups interested in working further in this area.

Further information

For the original project overview, see; also, a full chronological listing of all our project posts is available at. The work package descriptions are available at, and links to posts relevant to each work package over the course of the project follow:
  • WP1 Participation with Discovery programme
  • WP2 Collaborate with partners to develop social and technical interoperability
  • WP3 Open Bibliography advocacy
  • WP4 Community support
  • WP5 Data acquisition
  • WP6 Software development
  • WP7 Beta deployment
  • WP8 Disruptive innovation
  • WP9 Project management (NB all posts about the project are relevant to this WP)
  • WP10 Preparation for service delivery
All software developed during this project is available under an open source licence. All the data released during this project fell under OKD-compliant licences, such as the PDDL or CC0, depending on the licence chosen by the publisher. The content of our site is licensed under a Creative Commons Attribution 3.0 License (all jurisdictions).

The project team would like to thank supporting staff at the Open Knowledge Foundation and Cambridge University Library, the OKF Open Bibliography working group and Open Access working group, Neil Wilson and the team at the British Library, and Andy McGregor and the rest of the team at JISC.


- July 9, 2012 in Bibliographic, DM2E, Events, OKF Projects, Open GLAM, Our Work, Sprint / Hackday, TEXTUS, WG Open Bibliographic Data, Working Groups, Workshop

Last month we ran the Open Knowledge Foundation’s largest celebration of open bibliographic data to date. The main focus of the two-day event was to get some hacking done and use the tools the Open Knowledge Foundation has helped to build, or is currently building, for working with bibliographic data, such as BibServer, TEXTUS and BibSoup.

Open GLAM Workshop

The other component of the two-day event was a one-day workshop for those working in cultural heritage institutions. It included an introduction to some of the basic technical concepts of open data, such as APIs and Linked Data, as well as advice from experts in the field on how to prepare your data for a hackathon. The workshop also sought to start conversations with the institutions represented from around London about the challenges of opening up more of their collections online and how the Open Knowledge Foundation’s Open GLAM initiative could assist in the process. The write-up of the workshop can be found on and over on the Talis Systems website (thank you, Tim Hodson!). One highlight of the workshop was Harry Harrold’s brilliant talk on how to get your data ready for a hackathon:
Bibliohack: Preparing your data for a hackathon from UKOLN on Vimeo.

The Hacking

The hacking began with agreement on a single, unified problem: the need to create ‘A Bibliographic Toolkit’, bringing together the tools necessary to liberate bibliographic data, make it openly available on the net and interact with that data. The main components of this were:
  • Utilising BibServer – adding datasets and using PubCrawler
  • Creating an Open Access Index
  • Developing annotation tools
Groups looked at particular Open Knowledge Foundation projects, including TEXTUS and BibServer, to find out what they could offer as part of this Toolkit, and looked into other facilities available on the web. It was exciting to see people approaching common problems from different angles and finding new ways around them. One example was the TEXTUS group’s new approach to managing bibliographic references and how it can complement the approaches to semantic annotation currently being worked on by the DM2E team, who were present at the hack. Adrian Pohl and Etienne Posthumus’s attempt to load the whole of the German National Bibliography into a BibServer was another. For more detailed information on what occurred each day, check out the daily blog reports we wrote over on

Big Thanks

We’d like to thank all the groups involved who made the two days such a success, especially DevCSI, UK Discovery, DM2E, Open GLAM, Open Biblio and all of the participants. The OKFN frequently arranges workshops, hackdays and meet-ups, so do keep an eye on this blog and the meet-up channel for news of upcoming events.

The Right to Read Is the Right to Mine

- June 1, 2012 in Bibliographic, OKF Projects, Open Access, Open Content, Open Data, Open Science, texts, WG Open Bibliographic Data, Working Groups

The following is a draft content mining declaration developed by the Open Knowledge Foundation’s Working Group on Open Access. In brief: The Right to Read Is the Right to Mine.


Researchers can find and read papers online, rather than having to track down print copies manually. Machines (computers) can index the papers and extract details such as titles and keywords in order to alert scientists to relevant material. In addition, computers can extract factual data and meaning by “mining” the content, opening up the possibility that machines could be used to make connections (and even scientific discoveries) that might otherwise remain invisible to researchers. However, it is not generally possible today for computers to mine the content of papers, because of constraints imposed by publishers. While Open Access (OA) is improving researchers’ ability to read papers (by removing access barriers), still only around 20% of scholarly papers are OA; the remainder are locked behind paywalls. Under the vast majority of subscription contracts, subscribers may read paywalled papers, but they may not mine them.

Content mining is the way that modern technology locates digital information. Because digitized scientific information comes from hundreds of thousands of different sources in today’s globally connected scientific community [2], and because current data sets can be measured in terabytes [1], it is often no longer possible simply to read a scholarly summary in order to make scientifically significant use of such information [3]. A researcher must be able to copy information, recombine it with other data and otherwise “re-use” it so as to produce truly helpful results. Content mining is not only a deductive tool for analysing research data; it is also how search engines operate to allow discovery of content. To prevent mining is therefore to force scientists into blind alleys and silos where only limited knowledge is accessible. Science does not progress if it cannot incorporate the most recent findings and move forward from there.


‘Open  Content Mining’ means the unrestricted right of subscribers to extract,  process and republish content manually or by machine in whatever form  (text, diagrams, images, data, audio, video, etc.) without prior  specific permissions and subject only to community norms of responsible  behaviour in the electronic age.
  • Text
  • Numbers
  • Tables: numerical representations of a fact
  • Diagrams (line drawings, graphs, spectra, networks, etc.): graphical representations of relationships between variables. These are images and therefore, considered as a collective entity, may not be data; however, the individual data points underlying a graph, like those in tables, should be.
  • Images and video (mainly photographic): where they are the means of expressing a fact?
  • Audio: as for images, where it expresses the factual representation of the research?
  • XML: Extensible Markup Language (XML) defines rules for encoding documents in a format that is both human-readable and machine-readable.
  • Core bibliographic data: described as “data which is necessary to identify and / or discover a publication” and defined under the Open Bibliography Principles.
  • Resource Description Framework (RDF): information about content, such as authors, licensing information and the unique identifier for the article


Principle 1: Right of Legitimate Accessors to Mine

We assert that there is no legal, ethical or moral reason to refuse to allow legitimate accessors of research content (OA or otherwise) to use machines to analyse the published output of the research community. Researchers expect to access and process the full content of the research literature with their computer programs, and should be able to use their machines as they use their eyes. The right to read is the right to mine.

Principle 2: Lightweight Processing Terms and Conditions

Mining by legitimate subscribers should not be prohibited by contractual or other legal barriers. Publishers should add clarifying language to subscription agreements stating that content is available for information mining by download or by remote access. Where access is through researcher-provided tools, no further cost should be required. Users and providers should encourage machine processing.

Principle 3: Use

Researchers can and will publish facts and excerpts which they discover by reading and processing documents. They expect to disseminate and aggregate statistical results as facts, and context text as fair use excerpts, openly and with no restrictions other than attribution. Publisher efforts to claim rights in the results of mining further retard the advancement of science by making those results less available to the research community; such claims should be prohibited. Facts don’t belong to anyone.


We plan to assert the above rights by:
  • Educating researchers and librarians about the potential of content mining and the current impediments to doing so, including alerting librarians to the need not to cede any of the above rights when signing contracts with publishers
  • Compiling a list of publishers and indicating what rights they currently permit, in order to highlight the gap between the rights being asserted here and what is currently possible
  • Urging governments and funders to promote and aid the enjoyment of the above rights
[1] Panzer-Steindel, Bernd, Sizing and Costing of the CERN T0 center, CERN-LCG-PEB-2004-21, 09 June 2004, at
[2] The Value and Benefits of Text Mining, JISC, Report Doc #811, March 2012, Section 3.3.8 at, citing P.J. Herron, “Text Mining Adoption for Pharmacogenomics-based Drug Discovery in a Large Pharmaceutical Company: a Case Study,” Library, 2006, claiming that text mining tools evaluated 50,000 patents in 18 months, a task that would have taken 50 person-years to do manually.
[3] See MEDLINE® Citation Counts by Year of Publication, at, and National Science Foundation, Science and Engineering Indicators: 2010, Chapter 5, at, asserting that the annual growth in the volume of scientific journal articles published is on the order of 2.5%.

#OpenDataEDB 2: 16th May

- May 11, 2012 in Bibliographic, Events, Meetups, OKScotland, Talks, WG Open Bibliographic Data

Following the fun we had at March’s Meet-up ‘launch’, we will be having another gathering of people interested in open data next Wednesday 16th May. Hosted by the Wash Bar, Edinburgh, from 19.00, come and join us to discuss ideas, projects and plans in relation to openness. Lightning Talks will include Federico Sangati on crowdsourcing and education, ahead of his presentation at Dev8ed later this month, and a sneak preview of the hackathon that Open Biblio will be running 12-14th June in collaboration with OKFN’s Open GLAM and Cultural Heritage Working Group and DevCSI. If you would like to give a lightning talk (informal 2-3 minute presentations) about anything related to open data or knowledge, contact naomi.lillie [@] Sign up here and we’ll see you there!


For this and other events in Edinburgh and the rest of Scotland, sign up here.

Hackathon alert: BiblioHack!

- May 9, 2012 in Bibliographic, DM2E, Events, Featured, OKF Projects, Open GLAM, Sprint / Hackday, WG Cultural Heritage, WG Open Bibliographic Data, Working Groups, Workshop

The Open Knowledge Foundation’s Open Biblio group, and Working Group on Open Data in Cultural Heritage, along with DevCSI, present BiblioHack: an open Hackathon to kick-start the summer months. From Wednesday 13th – Thursday 14th June, we’ll be meeting at Queen Mary, University of London, East London, and any budding hackers are welcome, along with anyone interested in opening up metadata and the open cause – this free event aims to bring together software developers, project managers, librarians and experts in the area of Open Bibliographic Data. A workshop will run alongside the coding on the 13th, and a meet-up on the evening of the 12th is open to all whether you’re attending the Hackathon or not.

What is BiblioHack?

BiblioHack will be two days of hacking and sharing ideas about open bibliographic metadata. There will be opportunities to hack on open bibliographic datasets and experiment with new prototypes and tools. The focus will be on building things and improving existing systems that enable people and institutions to get the most out of bibliographic data. If you’re a non-coder, there are sessions for you too: we will be running a hands-on workshop on the technical aspects of opening up cultural heritage data, looking at best-of-breed open source tools for doing so, preparing your data for a hackathon, and the best standards for storing and exposing your data so that it can be re-used more easily.

When and where?

  • The main hackathon will take place over two days, 13th and 14th June, at Queen Mary University of London
  • On the morning of 13th June we’ll be running a workshop addressing the technical challenges of opening up metadata. So if you are unable to take part in the hack due to time constraints or lack of coding know-how, this is for you!
  • On the evening of Tuesday 12th June (details TBC, but it will be a pub in central / east London!) we’ll also be hosting a meet-up for anyone attending the hack, and for open data enthusiasts more generally. Whether it’s open bibliographic data, spending or government data that floats your boat, all tribes are welcome!

Who is organising the event?

Who else is involved?

We’ve already lined up a whole host of speakers and groups who’ll be attending both the hack and the workshop. The list so far includes UK Discovery, CKAN, Europeana, Total Impact, Neontribe, The British Library with many more to be added in the coming days…

You’re giving your time and expertise – what do you get if you attend the whole hack?

  • Accommodation at QMUL overnight on the 13th
  • Food and drink across the 3 days
  • The chance to work with experts in their fields
  • Admiration and respect from your peers
  • We could expound at length, but… go on, you know you want to (it’s free!)

How can I sign up?

  • Register here for the 2 day hack
  • Register here for workshop only
  • Register here for Meet-up only
Please note, if you wish to attend all 3 events you should sign up for each, and the Workshop will run in parallel with the hacking on the morning of the 13th.

More questions?

Contact Naomi Lillie on admin [@] See you there!

Announcing DM2E: Exploring the possibilities of Linked Open Data in cultural heritage

- March 19, 2012 in DM2E, Featured, Our Work, WG Cultural Heritage, WG Humanities, WG Open Bibliographic Data

The Open Knowledge Foundation is delighted to announce that it will be leading the community work for a three-year EU-funded project entitled Digitised Manuscripts to Europeana (DM2E). The project consortium, which includes academic institutions, NGOs and commercial partners, will be led by Professor Stefan Gradmann at the Humboldt University.


The project aims to enable as many of Europe’s memory institutions as possible to upload their digital content easily into Europeana.

Europeana is Europe’s largest cultural heritage portal, giving access to millions of digital artefacts contributed by over 2000 cultural heritage institutions across Europe. Founded in 2008, Europeana offers access to Europe’s history to all citizens with an internet connection. Not only does Europeana hold a huge amount of promise for researchers and scholars who benefit immensely from having access to huge aggregated datasets about cultural heritage objects, but through the use of APIs Europeana promises to stimulate the development of a swathe of apps and tools with applications in tourism and education.

Open GLAM (Galleries, Libraries, Archives, Museums)

As part of DM2E, the Open Knowledge Foundation will continue to work closely with cultural institutions from all over Europe, encouraging them to openly license their metadata. Metadata contributed to content aggregation platforms like Europeana is most valuable if it is openly licensed, which maximises the number of applications it can have. The Open Knowledge Foundation’s Open Bibliographical Principles are the expression of the ideas we seek to realise in this field. Last year, the team at Europeana announced their new Data Exchange Agreement, which stipulates that metadata must be provided to Europeana under the Creative Commons Public Domain Dedication (CC0). This is a significant step towards the goal of an open cultural heritage data ecosystem that extends access to all and encourages the reuse of cultural data in a whole variety of novel contexts, both commercial and non-commercial. The Open Knowledge Foundation’s Open GLAM work will be key in this respect. We will be teaming up with the likes of Wikimedia, Creative Commons and UK Discovery to run open licensing clinics and technical workshops for librarians and archivists all over Europe, in order to demystify some of the legal issues around open metadata and to showcase projects that build upon openly licensed content, showing just what is possible when you free your metadata! The next workshop in this strand will be held at the Staatsbibliothek zu Berlin on April 20th and will be co-hosted with Wikimedia Germany. Watch this space for more details!

Linked Open Data in cultural heritage

One of the core aspirations of DM2E is to leverage the tremendous potential offered by Linked Data technologies such as RDF to create a network of interconnected and linked cultural datasets. Having cultural heritage data in Linked Data formats will enable the automated enrichment of metadata provided to Europeana. For instance, metadata fields about the authors of books will be linked to the giant DBpedia dataset, supplying more information about the life of each author and ultimately enriching the original metadata record. The important task of building a tool that will translate “flat” (non-linked) data from cultural heritage institutions into RDF falls to the Freie Universität Berlin. They will develop technology that can take a diverse range of metadata types as its source and turn them into Linked Data that aligns with the Europeana Data Model (EDM). For any of you who want to brush up on just what Linked Data is and why it is relevant to cultural heritage, the folk at Europeana recently made a wonderful video explaining it all.
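The basic move from a “flat” record to Linked Data can be sketched in a few lines: each field becomes a triple about the item, and the author becomes a link to a DBpedia resource instead of a plain string. This is only an illustration under stated assumptions, not the Freie Universität Berlin tool or the Europeana Data Model: it uses Dublin Core terms as predicates, a hypothetical `http://example.org/item/` base URI, and a naive name-to-DBpedia-URI rule. Real enrichment needs proper reconciliation against DBpedia and full IRI escaping.

```python
DCT = "http://purl.org/dc/terms/"  # Dublin Core terms namespace (assumed predicate vocabulary)
BASE = "http://example.org/item/"  # hypothetical base URI for items


def literal(text: str) -> str:
    """Escape a string as an N-Triples literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def dbpedia_uri(name: str) -> str:
    # Naive assumption: DBpedia resource names replace spaces with underscores.
    # Real enrichment would reconcile names against DBpedia, not guess URIs.
    return "<http://dbpedia.org/resource/" + name.replace(" ", "_") + ">"


def record_to_ntriples(item_id: str, record: dict) -> list:
    """Turn one flat record into N-Triples lines, linking authors to DBpedia."""
    subject = f"<{BASE}{item_id}>"
    triples = [f"{subject} <{DCT}title> {literal(record['title'])} ."]
    for author in record.get("authors", []):
        triples.append(f"{subject} <{DCT}creator> {dbpedia_uri(author)} .")
    if "date" in record:
        triples.append(f"{subject} <{DCT}date> {literal(record['date'])} .")
    return triples


flat = {"title": "Faust", "authors": ["Johann Wolfgang von Goethe"], "date": "1808"}
for line in record_to_ntriples("faust", flat):
    print(line)
```

The `creator` triple is where enrichment happens: once the author is a DBpedia URI rather than a string, any client can follow it to further information about that person.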

Engaging researchers

But DM2E is not only about enabling more archives and libraries to provide linked open metadata to Europeana; it is also about working with the research communities who will consume the aggregated Linked Data on Europeana. The Italian company Net7 will be leading work on tools that will help scholars from the humanities to work with this data. Tools for semantic annotation and for building collections of texts on which complex analysis can be performed will be key.

Key links

#OpenDataEDB: the results

- March 16, 2012 in Bibliographic, Events, Meetups, OKF, OKScotland, Open Data, Open GLAM, Open Knowledge, Open Science, Talks, WG Open Bibliographic Data

Last night was the first OKFN Meet-up in Scotland* at the Ghillie Dhu, Edinburgh, run in collaboration with DevCSI. 19 people attended from around the city and nearby, including Glasgow, and those visiting for the Open Biblio Sprint represented Cambridge, London, Wolverhampton and the Netherlands. The Auditorium was a beautiful venue, with good space for giving presentations, complete with seamless audio and visual equipment (a rare treat!). We kicked off with the first three Lightning Talks:
It was great to see people gravitating towards those whose presentations had struck a chord… Mahendra had invited discussion around potential events and many people had plans or ideas which they wanted to run past him, while Rod’s points on taxonomy were pertinent to Mark’s work on BibServer as well as others’ research. Other discussions grew between the bar snacks, as people began with the standard ‘what do you do?’ and swiftly moved on to ‘oh that’s funny, I was talking to so-and-so about that just now…’ Our dedicated bartender was contributing too, as he specialised in nanotechnology! The next three talks followed: The hubbub of enthusiasm started up again, and it appeared there were good conversations and connections emerging around the room. Prompted by these conversations, or perhaps just by the courage of having seen others present (and me fumbling along as makeshift compère), two more people decided to give impromptu talks: Many thanks to all those who presented and to those who attended to discuss all things #OpenData. Hopefully everyone left with good ideas of topics and people to follow up with afterwards, and who knows where these will lead? As this was our first Scotland-based Meet-up we’d be glad to get feedback so we can improve; the next one is planned for May, so if you have anything you’d particularly like to see, hear or say, let us know (one suggestion was that talks are recorded, so people unable to attend can keep up-to-date). This and other events will be promoted via the OKFN Scotland List, so do sign up here, otherwise you might miss out!

* It turns out there was an event in Scotland in 2010, according to people who have been on the scene longer than I… see here for comments on the Open Biblio blog post which highlight previous activity, and many thanks to the people who kindly contributed this information. Here’s to the next one :-)

Scotland’s first Meet-up is next Tuesday!

- March 7, 2012 in Events, Meetups, Our Work, WG Open Bibliographic Data

Interested in Open Knowledge? Want to meet others who are? …Look no further! OKFN and DevCSI are arranging the first Meet-up here in Edinburgh, with the Open Biblio project team taking the helm. OKFN Meet-ups are friendly and informal evenings for people to get together and talk about open data. London and Cambridge have had huge successes, with like-minded people getting together to share, discuss and argue about all elements of openness, and now it’s time for Edinburgh to join the party. Top venue Ghillie Dhu is hosting us next Tuesday, 13th March, from 7pm, and we will be joined by representatives of other projects including TEXTUS, the School of Open Data and DevCSI. We already have guest speakers lined up to give lightning talks (informal 2-3 minute presentations) on the Public Domain Review, the Open Data Handbook and CKAN, as well as Open Biblio; if you would like to talk about a particular area of Open Data or related subjects, whether a project you’re already involved with or your aspirations for the future of openness, then get in touch! naomi.lillie [@] If you are interested in seeing what we’re up to, or talking open data / knowledge in general, do come along: all are welcome. This promises to be a great opportunity for Edinburgh-based folk (and anyone willing to travel!) to get together to discuss ideas and projects and generally set the world to rights over a brew. So, what do you need to do?
  • Sign up
  • Tweet #OpenDataEDB
  • Contact naomi.lillie [@] for more information
  • Come along!
This event is run by Open Biblio, OKFN and DevCSI.