
JISC Open Biblio 2 project – final report

- August 23, 2012 in Bibliographic, jiscopenbiblio2, OKF Projects, Open GLAM, openbiblio, WG Open Bibliographic Data, Working Groups

This is a cross-post. Following on from the success of the first JISC Open Bibliography project, we have now completed a further year of development and advocacy as part of the JISC Discovery programme. Our stated aims at the beginning of the second year were to show our community (all those interested in furthering the cause of Open via bibliographic data, including coders, academics, and those with an interest in supporting Galleries, Libraries, Archives and Museums) what we are missing if we do not commit to Open Bibliography, and to show that Open Bibliography is a fundamental requirement of a community committed to the discovery and dissemination of ideas. We intended to do this by demonstrating the value of carefully managed metadata collections of particular interest to individuals and small groups, thus realising the potential of the open access to large collections of metadata that we now enjoy. We have been successful overall in achieving our aims, and we present here a summary of our output to date (it may be useful to refer to this guide to terms).


BibServer and FacetView

The BibServer open source software package enables individuals and small groups to present their bibliographic collections easily online. BibServer utilises elasticsearch in the background to index supplied records, and these are presented via the frontend using the FacetView javascript library. This use of javascript at the front end allows easy embedding of result displays on any web page.

BibSoup and more demonstrations

Our own instance of BibServer, BibSoup, is up and running, and we have seen over 100 users sharing more than 14000 records across over 60 collections. Additionally, we have created some niche instances of BibServer for solving specific problems. For example, we have used BibServer to analyse and display collections specific to malaria researchers, as a demonstration of the extent of open access materials in the field. Further analysis allowed us to show where best to look for relevant materials that could be expected to be openly available, and to begin work on the concept of an Open Access Index for research. Another example is the German National Bibliography, as provided by the German National Library, which is in progress (as explained by Adrian Pohl and Etienne Posthumus). We have built, and are still building, similar collections for all other national bibliographies that we receive.


With BibJSON we have produced a simple convention for presenting bibliographic records in JSON. This has seen good uptake so far, with additional use in the JISC TEXTUS project and in Total Impact, amongst others.
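As a rough illustration of the kind of record such a convention describes, here is a minimal Python sketch. The keys used (title, author, year, identifier) are bibtex-style keys chosen for this example, and the `make_record` helper and the sample values are hypothetical; none of this is quoted from the specification.

```python
import json


def make_record(title, authors, year, identifiers=None):
    """Build a BibJSON-style record as a plain dict (hypothetical helper)."""
    record = {
        "title": title,
        # Assumption: each author is an object with a "name" key.
        "author": [{"name": name} for name in authors],
        "year": str(year),
    }
    if identifiers:
        # Assumption: identifiers are listed as type/id pairs.
        record["identifier"] = [
            {"type": id_type, "id": value} for id_type, value in identifiers.items()
        ]
    return record


record = make_record(
    "An example article",
    ["A. Author", "B. Author"],
    2012,
    {"doi": "10.0000/example"},  # made-up identifier for illustration
)

# The record is ordinary JSON, so any JSON library can serialise it.
serialised = json.dumps(record, indent=2)
print(serialised)
```

Because the record is just a dictionary of key/value pairs, it can be embedded in a web page or indexed without any special tooling.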


Pubcrawler collects bibliographic metadata, via parsers created for particular sites, and we have used it to create collections of articles. The full post provides more information.

datahub collections

We have continued to collect useful bibliographic collections throughout the year, and these along with all others discovered by the community can be found on the datahub in the bibliographic group.

Open Access / Bibliography advocacy videos and presentations

As part of a Sprint in January we recorded videos of the work we were doing and the roles we play in this project and in wider biblio promotion; we also made a how-to for using BibServer, including feedback from a new user: "Setting up a Bibserver and Faceted Browsing" (Mark MacGillivray), from the Bibsoup Project on Vimeo. Peter and Tom Murray-Rust’s video, made into a prezi, has proven useful in explaining the basics of the need for Open Bibliography and Open Access.

Community activities

The Open Biblio community have gathered for a number of different reasons over the duration of this project: the project team met in Cambridge and Edinburgh to plan work in Sprints; Edinburgh also played host to a couple of Meet-ups for the wider open community, as did London; and London hosted BiblioHack, a hackathon / workshop for established enthusiasts as well as new faces, both with and without technical know-how. These events, particularly BiblioHack, attracted people from all over the UK and Europe, and we were pleased that the work we are doing is gaining attention from similar projects world-wide.

Further collaborations


Over the course of this project we have learnt that open source development provides great flexibility and power to do what we need to do, and that open access in general frees us from many difficult constraints. There is now a lot of useful information available online on how to do open source and open access. Whilst licensing remains an issue, it has become clear that making everything publicly and freely available to the fullest extent possible is the simplest solution, causing no further complications down the line. See the open definition as well as our principles for more information.

We discovered during the BibJSON spec development that it must be clear whether a specification is centrally controlled or more of a communal agreement on use. There are advantages and disadvantages to each method; however, they are not compatible, although one may become the other. We took the communal agreement approach, as we found that in the early stages there was more value in exposing the spec to people as widely and openly as possible than in maintaining close control. Moving to a closely controlled format requires specific and ongoing commitment.

Community building remains tricky and somewhat serendipitous. Just as word-of-mouth can enhance reputation, the failure of certain communities can detrimentally impact other parts of the project. Again, the best solution is to ensure everything is as open as possible from the outset, thereby reducing the impact of any one particular failure.

Opportunities and Possibilities

Over the two years, the concept of open bibliography has gone from requiring justification to being an expectation. The value of making this metadata openly available to the public is now obvious, and getting such access is no longer so difficult; where access is not yet available, many groups are now moving toward making it available. And of course, there are now plenty of tools to make good use of available metadata. Future opportunities now lie in the more general field of Open Scholarship, where a default of Open Bibliography can be leveraged to great effect. For example, recent Open Access mandates by many UK funding councils (e.g. Finch Report) could be backed up by investigative checks on the accessibility of research outputs, supporting provision of an open access corpus of scholarly material. We intend now to continue work in this wider context, and we will soon publicise our more specific ideas; we would appreciate contact with other groups interested in working further in this area.

Further information

The original project overview, a full chronological listing of all our project posts, and the work package descriptions are all available online. Links to posts relevant to each work package over the course of the project follow:
  • WP1 Participation with Discovery programme
  • WP2 Collaborate with partners to develop social and technical interoperability
  • WP3 Open Bibliography advocacy
  • WP4 Community support
  • WP5 Data acquisition
  • WP6 Software development
  • WP7 Beta deployment
  • WP8 Disruptive innovation
  • WP9 Project management (NB all posts about the project are relevant to this WP)
  • WP10 Preparation for service delivery
All software developed during this project is available under an open source licence. All the data released during this project fell under OKD-compliant licences such as PDDL or CC0, depending on the one chosen by the publisher. The content of our site is licensed under a Creative Commons Attribution 3.0 License (all jurisdictions).

The project team would like to thank supporting staff at the Open Knowledge Foundation and Cambridge University Library, the OKF Open Bibliography working group and Open Access working group, Neil Wilson and the team at the British Library, and Andy McGregor and the rest of the team at JISC.


- February 21, 2012 in BibServer, communityBenefits, jisc, JISC OpenBib, jiscopenbib2, OKFN Openbiblio, openbiblio, progress, projectPlan, wp2, wp9

There have been requests on our mailing list recently to consider the various options for supporting validation of BibJSON and for supporting namespacing. These two options require some further consideration.


Efforts so far around BibJSON have focussed on building a useful JSON representation of bibliographic metadata, with some typical key/value pairs that are common in or extended from bibtex. This started off simply, but we have seen increasing complexity to accommodate further functionality requests. There was some work on a JSON schema to validate against, but given the aim of being as flexible as possible, and with very few required keys, validating a BibJSON document would have very little effect. Validating a document as properly formatted JSON is, of course, a good idea, but there are plenty of ways to do this already: just try to parse it with any of the JSON libraries for your programming language of choice. To reach the stage of actually supporting validation against a pre-defined schema, we must pre-define a schema, and that means becoming inflexible (or doing so little validation as to be essentially pointless). An alternative to validation against a schema would be adoption of namespaces.
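The "just try to parse it" check can be sketched in a few lines of Python using the standard `json` module. The function name `is_well_formed_json` and the sample strings are made up for this illustration.

```python
import json


def is_well_formed_json(text):
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


# A complete JSON object parses; a truncated one does not.
print(is_well_formed_json('{"title": "An example", "year": "2012"}'))  # True
print(is_well_formed_json('{"title": "An example", '))  # False
```

Note that this only checks syntax. It says nothing about which keys are present, which is exactly the part that would need a schema.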


We do already have a namespace concept in BibJSON – it is just a key in the metadata, under which can be listed namespaces and a suitable prefix for them. However, this model is not widely known (because we made it up). To overcome this, we should adopt the JSON-LD method of using @context parameters. This way, it would be possible to specify the namespace in which your record keys are defined, and to share namespace information with other people / machines.
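To make the idea concrete, here is a small Python sketch of a record that declares a prefix in an `@context` and uses it in its keys. The record, the choice of the Dublin Core terms namespace, and the `expand_keys` helper are assumptions for illustration. The helper is a deliberately simplified stand-in and not the JSON-LD expansion algorithm, which handles many more cases.

```python
import json

# Hypothetical record: "dc" is declared in @context and used as a key prefix.
record = {
    "@context": {
        "dc": "http://purl.org/dc/terms/",
    },
    "dc:title": "An example article",
    "dc:creator": "A. Author",
    "year": "2012",  # plain key, left in the default context
}


def expand_keys(rec):
    """Replace 'prefix:name' keys with full IRIs using the record's @context.

    Simplified illustration only: assumes @context is a flat dict of
    prefix -> namespace IRI.
    """
    context = rec.get("@context", {})
    expanded = {}
    for key, value in rec.items():
        if key == "@context":
            continue
        prefix, sep, name = key.partition(":")
        if sep and prefix in context:
            expanded[context[prefix] + name] = value
        else:
            expanded[key] = value
    return expanded


expanded = expand_keys(record)
print(json.dumps(expanded, indent=2))
```

Anyone receiving the record can see from the `@context` which vocabulary the prefixed keys belong to, while the plain keys remain usable as before.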

What is the point

Using namespaces or having a schema only becomes sensible when there is a concerted effort to share data with others. For internal use, they could be valuable for consistency, but the code we write internally adheres by definition to our own level of consistency anyway. Therefore, it is not a function of BibJSON to perform validation: BibJSON is just JSON. Rather, it is the function of a community to make agreements and to conform to those agreements as required. Where such a function must be supported, it should be done via mechanisms already available and maintained for that purpose. There is no point attempting to maintain our own; it is not our key strength or goal.


We propose changing the BibJSON use of namespaces to conform to the method specified in JSON-LD and, wherever consistency is required, reaching agreement to share data via JSON within a particular @context. The fundamental basic keys in BibJSON (the default context) should remain as they are, and should not require contextualisation. If contextualisation of the fundamental keys of BibJSON is required, then those keys should be contextualised into a schema by whoever has such a requirement.


  • drop the “namespace” key in BibJSON
  • continue using BibJSON as normal, but:
  • reference JSON-LD for use of @context and other more complex LD functions as required
  • wherever validation is required, perform it based on the use of namespaced keys (beyond scope of bibjson)
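One very light form of validation based on namespaced keys, of the kind a community might agree on outside BibJSON itself, is to check that every prefix used in a record is declared in its `@context`. The Python sketch below assumes the `@context` is a flat mapping of prefix to namespace; the function name `undeclared_prefixes` and both sample records are hypothetical.

```python
def undeclared_prefixes(record):
    """Return a sorted list of key prefixes that are missing from @context."""
    context = record.get("@context", {})
    missing = set()
    for key in record:
        if key.startswith("@"):
            continue  # skip JSON-LD keywords such as @context
        prefix, sep, _name = key.partition(":")
        if sep and prefix not in context:
            missing.add(prefix)
    return sorted(missing)


good = {
    "@context": {"dc": "http://purl.org/dc/terms/"},
    "dc:title": "An example",
    "title": "An example",  # plain key from the default context
}
bad = {
    "@context": {"dc": "http://purl.org/dc/terms/"},
    "foaf:name": "A. Author",  # "foaf" is never declared
}

print(undeclared_prefixes(good))  # []
print(undeclared_prefixes(bad))   # ['foaf']
```

Plain keys pass untouched, which matches the idea that the fundamental keys should not require contextualisation.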


Finnish Turku City Library and the Vaski consortium now Open Data with 1.8M MARC records

- October 13, 2011 in Bibliographic, Data, finland, Guest post, licensing, OKFN Openbiblio, openbiblio, value

Let's open up our metadata containers

I’m happy to announce that our Vaski consortium of public libraries, serving a total of 300 000 citizens in Turku and a dozen surrounding municipalities in western Finland, has recently published all of our 1.8 million bibliographical records in the open, as a big pile of data (see on The Data Hub). Each of the records describes a book, recording, movie, song or other publication in our library catalogue. Titles, authors, publishing details, library classifications, subject headings, identifiers and so on are systematically saved in MARC format, the international, structured library metadata standard since the late 1960s.

Unless I’ve missed something, ours is the third large-scale Open Data publication from the libraries of Finland. The first was the 670 000 bibliographical records of the HelMet consortium (see on The Data Hub), another consortium of public libraries around the capital, Helsinki. This first publication was organized and initiated in 2010 by Labs, a project seeking more agile, innovative library concepts. The second important Open Data publication was our national general thesaurus, Yleinen suomalainen asiasanasto (YSA), which is also available as a cool semantic ontology.

Joining this group of Open Data publications was natural for our Vaski consortium, because we are moving our data from one place to another anyway: we are in the middle of converting from our national FinMARC flavour to the international MARC21 flavour of MARC, swapping our library system from Axiell PallasPro to Axiell Aurora, and implementing a new, ambitious search and discovery interface for all the Finnish libraries, archives and museums (yes, these are busy times here and we love the taste of a little danger). All this means we are extracting, injecting, converting, mangling, breaking, fixing, disassembling and reassembling all of our data. So, we asked ourselves, why not publish all of our bibliographical data on the net while we are at it?

The process of going Open Data has been quite seamless for us. On my initiative, the core concept of Open Data was explained to the consortium’s board. As there were no objections or further questions, we contacted our vendor BTJ, who immediately supported the idea. From there on it was basically just about some formalities with BTJ, consulting international colleagues regarding licensing, writing a little press release, and organizing a few hundred megabytes of storage space on the internet. And trying to make sure the Open Data move didn’t get buried under other, more practical things during the summertime.

For our data license we have chosen the liberal Creative Commons CC0 license, because we try to have as few obstructions to our data as possible. However, we have agreed on a 6-month embargo with BTJ, the company that does most of the cataloguing for the Finnish public libraries. We believe it is a good compromise to publish data that is slightly outdated rather than to make the realm of immaterial property rights any more unclear than it already is.

Traditional library metadata at Turku main library

We seriously cannot anticipate what our Open Data publication will lead to. Perhaps it will lead to absolutely nothing at all. I believe most organizations opening up their data face this uncertainty. However, what we do know for sure is that all of the catalogue records we have carefully crafted, acquired and collected are seriously underutilized if they are only used for one particular purpose: finding and locating items in the library collections. For such a valuable asset as our bibliographical metadata, I feel this is not enough. By removing obstacles to accessing our raw data, we open up new possibilities for ourselves, for our colleagues (understood widely), and for anybody interested.

Mace Ojala, project designer, Turku City Library/Vaski consortium; National Digital Library of Finland, Cycling for libraries, etc., @xmacex, Facebook etc.

#openbiblio Italia – second instalment

- June 22, 2011 in openbiblio

On 1 June the second meeting of the working group on bibliographic data took place (the first meeting was also reported on this blog). We welcomed Antonella De Robbio, who, thanks to her extensive experience, steered our discussion towards more concrete matters.

Areas for action

Among the possible areas for action to promote #openbiblio in Italy, we discussed the following:
  • the world of scientific research
  • open government data and administrative transparency
  • open bibliographic data: catalogues and bibliographic databases consisting essentially of metadata
In Italy practically all of the latter are proprietary, and there is great confusion (or lack of interest) about who holds the rights to the bibliographic data in Italian OPACs. There are more than 1200 OPACs in Italy, and they contain bibliographic data. There is a clear need for the political will to share, since these databases are the fruit of decades of collective work. At the same time, a basic awareness is needed that the rights rest with the decision-makers, who need to know what to do with them: we must inform them.


The most important catalogue is certainly the Servizio Bibliotecario Nazionale (SBN), managed by the Istituto Centrale per il Catalogo Unico (ICCU) of the Ministero per i Beni e le Attività Culturali (the Italian Ministry of Cultural Heritage and Activities). SBN is the largest Italian OPAC and links thousands of libraries in hubs (poli) and groupings. In this specific case, it is possible that the open government data movement has created the conditions for opening up. This willingness needs to be verified, especially with a view to adopting a CC0 or PDDL licence, that is, the licences recommended by the international principles for open bibliographic data and adopted in every case of opening up so far.


The situation of university catalogues is very different: they are decentralised by nature, and the ownership of rights is generally unclear. 34 Italian universities use the proprietary software Aleph. While this may be seen as an interoperability problem, it is worth remembering that opening up data does not depend on any specific software (there are international standards for bibliographic data that every piece of software can handle and export). Moreover, Ex Libris, the maker of Aleph, has reiterated that its software is entirely separate from the data it manages: it is up to libraries to manage that data independently and to decide on its release. Unfortunately, there are documented cases of poor management in which, owing to a range of factors, databases are "prisoners" of the management system, and we believe the #openbiblio movement also has an important role in improving these situations and safeguarding bibliographic data for the community. There is no doubt, in any case, that adopting free software would make it easier to open up the data and to sustain that openness over the long term. In the next instalment we will talk in more detail about:
  • licences
  • linked open data
Anyone interested in taking part can sign up with their name and contact details.

Openbiblio at #elag2011 and #lodlam

- June 1, 2011 in elag2011, event, lodlam, openbiblio

I wrote a post over at the blog for the LOD-LAM (Linked Open Data in Libraries, Archives and Museums) summit. It’s mainly a summary of ELAG 2011 from an openbiblio viewpoint. Also, the German Zukunftswerkstatt published an interview podcast regarding Open Bibliographic Data. Julia Bergman interviewed Patrick Danowski, Kai Eckert and me at BibCamp, the German barcamp for librarians and other hackers. Hopefully, a text version of this interview will also be published on the web soon.

#openbiblio Italia – first instalment

- May 30, 2011 in openbiblio

Last week the first meeting took place of a small group of people interested in discussing #openbiblio in Italy. Because there is more to open data than Open Government Data.

Who was there?

There were 4 of us, a good mix of ages, backgrounds and occupations:
  • Andrea Zanni, secretary of Wikimedia Italia and employee at Alma DL, Università di Bologna
  • Francesca Di Donato, Università di Pisa and Associazione Linked Open Data
  • Karen Coyle, librarian at the University of Berkeley, member of the international OKFN working group on bibliographic data
  • Stefano Costa, archaeologist, PhD student at the Università di Siena and Italian coordinator of the Open Knowledge Foundation

What did we talk about?

Karen Coyle introduced the activities of the international group. Francesca Di Donato explained that in Italy the Open Access movement (OAI) is deeply rooted in the library world, so it makes sense to build a link with that movement. Fortunately, Antonella De Robbio (Università di Padova, AIB, E-LIS) has already agreed to take part in the next meeting. It is not easy to see what the obstacles to freeing bibliographic data might be, given that, for example, Ex Libris, the software house that sells the widely used Aleph software, has stated that libraries own their own data. At the same time, it cannot be taken for granted that this idea will be welcomed everywhere. Another particularly important issue is who holds the rights to bibliographic databases. In other words, whose door do we need to knock on to ask for this data to be freed? In many cases the existence of library hubs (poli) at provincial or regional level means the databases are of considerable size, not to mention the SBN hubs that feed into the national level.

When is the second instalment?

We will talk about this on Wednesday 1 June at 19:30. Anyone who wants to take part is welcome, and can sign up by adding their name.

Open Biblio Principles Announced

- January 24, 2011 in Bibliographic, News, Open Data, Open Definition, Open Knowledge, openbiblio, WG Open Bibliographic Data

The following post is by Mark MacGillivray, a member of the Open Knowledge Foundation Working Group on Open Bibliographic Data. Last week the Open Biblio Principles were launched by the Open Knowledge Foundation’s Working Group on Open Bibliographic Data. The principles are the product of six months of development and discussion within the working group and the wider bibliographic community:
Producers of bibliographic data such as libraries, publishers, universities, scholars or social reference management communities have an important role in supporting the advance of humanity’s knowledge. For society to reap the full benefits from bibliographic endeavours, it is imperative that bibliographic data be made open — that is available for anyone to use and re-use freely for any purpose.
As this makes clear, the principles have a simple message: make bibliographic data open data as defined by the Open Definition. Specifically, there are 4 core principles:
  1. When publishing bibliographic data make an explicit and robust license statement.
  2. Use a recognized waiver or license that is appropriate for data.
  3. If you want your data to be effectively used and added to by others it should be open as defined by the Open Definition – in particular non-commercial and other restrictive clauses should not be used.
  4. Where possible, explicitly place bibliographic data in the Public Domain via PDDL or CC0.
You can read the full version of the principles and, perhaps even more importantly, you can endorse them. Please help us spread the word, and the links, to individuals and organisations across the academic, library and publisher community. Lastly, we are also working on alternative language versions, so if you are interested in doing a translation please leave a comment or email mark [dot] macgillivrary [at] okfn [dot] org.

Launch of the Principles on Open Bibliographic Data

- January 18, 2011 in Bibliographic, Open Data, openbiblio, WG Open Bibliographic Data

The following post is from Adrian Pohl, coordinator of the OKFN Working Group on Open Bibliographic Data. Yesterday, the Principles on Open Bibliographic Data were launched at the Peter Murray-Rust symposium “Visions of a (Semantic) Molecular Future”. The principles’ main recommendations read as follows:
  1. When publishing bibliographic data make an explicit and robust license statement.

  2. Use a recognized waiver or license that is appropriate for data.

  3. If you want your data to be effectively used and added to by others it should be open as defined by the Open Definition – in particular non-commercial and other restrictive clauses should not be used.

  4. Where possible, we recommend explicitly placing bibliographic data in the Public Domain via the Public Domain Dedication and License (PDDL) or CC0.

The initial idea for something like the Principles on Open Bibliographic Data dates back to May 2010 and originated in the German OKFN chapter. Originally, they were directed at the library world. It was not until July 2010 that the OKFN Working Group on Open Bibliographic Data started work on the principles, taking ideas (and text) from the Panton Principles for Open Data in Science. Over time, and through Peter Murray-Rust’s and Jim Pitman’s initiative, the principles also addressed the broader spectrum of producers of bibliographic data, like scholars and publishers. In addition, a definition of bibliographic data was added in the first part of the document to clarify the principles’ scope. We are delighted that we have been able to create, in a relatively short period, this short and clear document, which will do much to promote the cause of open bibliographic data specifically and open knowledge more generally.
