
Minutes: 28th Virtual Meeting of the OKFN Working Group for Open Bibliographic Data

- February 6, 2013 in minutes, OKFN Openbiblio

Date: February, 5th 2013, 16:00 GMT Channels: Meeting was held via Skype and Etherpad


  • Adrian Pohl
  • Karen Coyle
  • Tom Johnson
  • Tom Morris
  • On the Etherpad:
    • Peter Murray-Rust
    • Mark MacGillivray


  • As two new participants attended the meeting (both had already engaged in discussions on the mailing list), everybody introduced themselves. The “new” participants were:
    • Tom Morris: “Tom Morris is the top external data contributor to Freebase and has contributed more than 1.6 million facts. He’s been a member of the Freebase community for several years. When not hacking on Freebase, Tom is an independent software engineering and product management consultant.” (taken from here, shortened and updated)
    • Tom Johnson: “Thomas Johnson is Digital Applications Librarian at Oregon State University Libraries, where he works on digital curation, scholarly publication, and related metadata and software issues.”

Bibframe and data licensing

  • Adrian started a discussion on the bibframe list, see here.
  • Karen: It isn’t clear to me how BIBFRAME will be documented, and whether that documentation will be sufficient to process data. Note that RDA (the cataloging rules) is not freely available, therefore if BIBFRAME does develop for RDA there may be conflicts relating to text such as term definitions.
    • This addresses licensing of the Bibframe spec, not the bibliographic data, but may be a problem in the future if Bibframe re-uses content from the RDA spec.
  • Tom Morris: Licensing policy seems to be orthogonal to modelling process
  • Conclusion: We’ll wait as a working group and not push the LoC further towards open data.
  • Tom Morris: We should think about lobbying for making the process more open.
  • Tom Morris: German National Library and other early experimenters with Bibframe should put their code up on GitHub to bring the development forward

Bibliographic Extension for schema.org (schemabibex)

  • See minutes of last meeting for background information.
  • The work is moving forward to create more properties for bibliographic data — but so far not including journal articles
  • Library view point predominates at schemabibex group, scientists’ view point isn’t represented
  • Karen: Somebody from the scientific community should join schemabibex or start a separate effort. <– Maybe people from scholarlyhtml?

NISO Bibliographic Meeting

  • NISO has a grant to hold a meeting of "interested parties" relating to bibliographic data.
  • Goes back to an effort by Karen Coyle and another person to include producers of bibliographic data other than libraries (publishers, scientists etc.) in the development of future standards for bibliographic data (like Bibframe).
  • See also the thread on the openbiblio list. tfmorris: As much of the information as possible should be published online.
  • Meeting will be held in March or April in Washington D.C.
  • Interested parties can participate in the initial meeting but there's little or no funding. (See this email for the proposed dates of the meeting.)
  • "We are planning to have a live-stream of the event, presuming there is sufficient bandwidth at the meeting site."


  • Peter Murray-Rust wrote before the meeting: "I'd like to run a hackfest (in AU) later this month and make Bib an important aspect. Can we pull together a "hacking kit" for such an event (e.g. examples of BibJSON, some converters, a simple BibSoup, etc.)"
    • Mark MacGillivray responded: "yes: I will write a blog post that explains bibsoup a bit more, and we could use a google spreadsheet for simple collection of records."


  • Tom Morris had two questions regarding BibJSON, and Mark provided some answers on the etherpad.
  • Q: What is being done to promote adoption?
    • Mark says: "I and others continue to use bibjson and promote it on our projects. It is now being used by the open citations project and there will be updates soon with further recommendations – mostly around how to specify provenance in a bibjson record. Also we have agreed with crossref for them to output bibjson – it needs some fixes to be correct, but is just about there."
  • Q: What tool support is available? (Mendeley, Zotero, converters, etc)
    • Mark says: "The translators are currently unavailable – they will soon be put up at a separate url for translating files to bibjson which can then be used in bibsoup. Mendeley, Zotero etc can all output bib collections in formats that we can already convert, so there is support in that sense. Separating out the translators will also make it easier for people to implement their own."
  • Tom Morris: There's PR value in having BibJSON listed on the
  • Ways of promoting BibJSON:
    • Articles: Tom Johnson published an article on BibJSON application in code4lib journal:
    • Talks: e.g. at code4lib (Tom Johnson will be there and might give a lightning talk mentioning BibJSON.),
    • Adoption: CrossRef would be a great addition. Need more services like Mendeley, Zotero, Open Library, BibSonomy etc. to support BibJSON (input/output)
  • Tom Johnson asks: What is the motivation to provide BibJSON output?
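
For readers new to the format, here is a rough idea of what a converter to BibJSON does. It is a minimal sketch, not one of the translators mentioned above. It assumes BibJSON conventions as we understand them: a flat JSON object in which `author` is a list of objects with a `name`, `journal` is an object with a `name`, and `identifier` is a list of `type`/`id` pairs. The function name, the input field names and the sample record are made up for illustration.

```python
import json


def bibtex_fields_to_bibjson(entry_type, fields):
    """Turn already-parsed BibTeX-style fields into a BibJSON-style dict."""
    record = {"type": entry_type}

    # Simple scalar fields are copied across unchanged.
    for key in ("title", "year"):
        if key in fields:
            record[key] = fields[key]

    if "author" in fields:
        # BibTeX separates multiple authors with the word "and".
        names = [name.strip() for name in fields["author"].split(" and ")]
        record["author"] = [{"name": name} for name in names if name]

    if "journal" in fields:
        record["journal"] = {"name": fields["journal"]}

    if "doi" in fields:
        record["identifier"] = [{"type": "doi", "id": fields["doi"]}]

    return record


if __name__ == "__main__":
    # Invented sample entry, not a real publication.
    sample = {
        "title": "An Example Article",
        "author": "A. Author and B. Writer",
        "year": "2012",
        "journal": "Journal of Examples",
        "doi": "10.1234/example",
    }
    print(json.dumps(bibtex_fields_to_bibjson("article", sample), indent=2))
```

Because the result is plain JSON, any service that can serialise a dictionary can emit it, which is part of the case for asking services such as those listed above to support it.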

Open Library

  • Speaking about BibJSON adoption we came to talking about what will happen to the Open Library. Karen gave a short summary of the future plans for Open Library:
    • Open Library currently has no assigned staff resources. Open Library is being integrated into the whole Internet Archive system and may cease using the current infogami platform. It isn't clear if the same UI will be available, nor if there will be any further development in terms of features such as APIs.
    • No batches of records (LC books records or Amazon records) have been loaded since mid-2012.
    • Tom Morris is primarily interested in the data and the process to reconcile it etc. but he also emphasizes the value of the brand and the community.
    • Karen: infogami is interesting as a flexible development platform that sits on a triple store:
    • Tom Johnson: What can we do regarding Open Library?
      • Karen: Set up a mirror?
      • Make records for free ebooks available as MARC so that libraries can integrate these into their catalogue. <– Tom Morris would help with that.

Public Domain Books/authors

Minutes: 27th Virtual Meeting of the OKFN Working Group for Open Bibliographic Data

- January 10, 2013 in minutes, OKFN Openbiblio

Date: January, 8th 2013, 16:00 GMT Channels: Meeting was held via Skype and Etherpad


  • Adrian Pohl
  • Peter Murray-Rust
  • Richard Wallis


Schemabib Extension group update

  • Links:
  • W3C community and business group, started by Richard Wallis (OCLC) in September 2012
  • Conference meeting once a month
  • Idea: Get consensus across the bibliographic community about how to extend schema.org
  • Lightweight approach, should not compete with MARC
  • Most people interested in bibliodata come from the library community. Richard tried to extend the group to other people (publishers, scholars etc.).
  • Background: OCLC is publishing Linked Data using the schema.org vocabulary, which was missing some properties
  • In the end: Publish extension proposal to the public-vocabs list
  • Peter comments that schema.org is going to work because it's built by people who know how the web works
  • Currently there is discussion about the concept of work and instances; FRBR comes up but such a model wouldn’t make it into schema.org
  • Richard: It makes sense to publish alongside BibFrame or RDA.
  • Peter: Talking to Mark MacGillivray might make sense to find out how bibdata can relate to BibJSON and the accompanying tools.

Bibframe draft data model

GOKb (Global Open Knowledgebase)

Adrian had heard about this project but could find only a little information about it on the web:
“Kuali OLE, one of the largest academic library software collaborations in the United States, and JISC, the UK’s expert on digital technologies for education and research, announce a collaboration that will make data about e-resources – such as publication and licensing information – more easily available. Together, Kuali OLE and JISC will develop an international open data repository that will give academic libraries a broader view of subscribed resources.
The effort, known as the Global Open Knowledgebase (GOKb) project, is funded in part by a $499,000 grant from The Andrew W. Mellon Foundation. North Carolina State University will serve as lead institution for the project.
GOKb will be an open, community-based, international data repository that will provide libraries with publication information about electronic resources. This information will support libraries in providing efficient and effective services to their users and ensure that critical electronic collections are available to their students and researchers.”
“[GOKb] is … focused on global-level metadata about e-resources with the goal of supporting management of those e-resources across the resource lifecycle. GOKb does not aspire to replace current vendor-provided KB products. But it does aspire to make good data available to everybody, including existing KBs, and to provide an open and low-barrier way for libraries to access this data. Our goal is that GOKb data permeates the KB ecosystem so that all library systems, whether ILS, ERM, KB or discovery, will have better quality data about electronic collections than they do today.”
  • The participants didn’t know much more about this initiative. Adrian will try to find out more for upcoming meetings.


  • Peter briefly informed about some interesting developments:
    • Open citations (David Shotton, Oxford, UK)
    • Hargreaves report: UK government says it’s legal to mine content. See Peter’s post.
    • Pubcrawler
    • Crossref biblio/citation data

Minutes: 26th Virtual Meeting of the OKFN Working Group for Open Bibliographic Data

- November 7, 2012 in minutes, OKFN Openbiblio

Date: November, 6th 2012, 16:00 GMT Channels: Meeting was held via Skype and Etherpad


  • Adrian Pohl
  • Karen Coyle
  • Joris Pekel
  • Jim Pitman


ORCID launched

“ORCID makes its code available under an open source license, and will post an annual public data file under a CC0 waiver for free download.” (Source: Open Data)
  • ORCID provides annual CC0 dump.
Open API
  • To try the open API point your queries to ! (Documentation says something else)
  • Query biographies example:
    • curl -H 'Accept: application/orcid+xml'
    • Retrieve bio example: curl -H "Accept: application/orcid+json" ""
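
The curl calls above boil down to an HTTP GET whose `Accept` header selects the response format. A minimal Python sketch follows. It assumes the caller supplies the endpoint, because the URLs are not preserved in these minutes. The helper names and the `api.example.org` address are placeholders, not real ORCID endpoints.

```python
from urllib.request import Request, urlopen

# Media types quoted in the curl examples in the minutes.
ORCID_XML = "application/orcid+xml"
ORCID_JSON = "application/orcid+json"


def build_orcid_request(url, media_type=ORCID_JSON):
    """Prepare a GET request asking for the given ORCID media type.

    `url` is whatever endpoint the ORCID documentation specifies; it is
    deliberately not hard-coded because the minutes do not preserve it.
    """
    return Request(url, headers={"Accept": media_type}, method="GET")


def fetch(url, media_type=ORCID_JSON, timeout=10.0):
    """Send the request and return the raw response body as bytes."""
    with urlopen(build_orcid_request(url, media_type), timeout=timeout) as resp:
        return resp.read()


if __name__ == "__main__":
    # Hypothetical placeholder URL; building the request does not touch the network.
    req = build_orcid_request("https://api.example.org/placeholder-id/orcid-bio")
    print(req.get_method(), req.full_url, req.get_header("Accept"))
```

Keeping request construction separate from sending makes it easy to check the headers without network access, which helps when the documented and the working endpoints differ.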
Open source Linked Open Data (Much information was taken from this twitter conversation.)
  • Karen: How can this be integrated with BibServer?
  • Jim: Could OKF pick up and post periodic dumps of ORCID data? And support a BibServer over those dumps?

HathiTrust Lawsuit

See Karen’s blog post on the topic:
  • Judge supports digitization for indexing as a fair use.
  • No decision on orphan works
  • Support for “just in case” digitization to serve sight impaired users
  • Support for digitization for preservation

OKFN labs for cultural activities

  • Background: Restructuring of OKF
  • Projects and tools are now pulled into OKFN labs, which will mainly focus on government and financial data:
  • Rather than “orphan” the other projects, there is now another lab in development for those, including Bibserver.
  • Example projects/code and blog posts that would find their place at this “open culture lab”:
  • Joris, Sam and Etienne Posthumus working on this. Please propose projects to Joris and Sam and they can help.
  • Suggest: organize “code days” for bibliographic data

W3C working group on biblio extension to schema.org

Journal Article Tag Suite (JATS) Standard


  • May merge some developer lists into one, which are now scattered. openbiblio-dev could be included in this.
  • We talked for a short time about the ResourceSync effort to provide a standard for syncing web resources:

To Dos

  • Adrian will try to find time for a separate post on ORCID

Minutes: 25th Virtual Meeting of the OKFN Working Group for Open Bibliographic Data

- September 5, 2012 in BibServer, event, Events, minutes, OKFN Openbiblio

Date: September, 4th 2012, 15:00 GMT Channels: Meeting was held via Skype and Etherpad


  • Peter Murray-Rust
  • Naomi Lillie
NB: Karen Coyle sent apologies due to attendance at the Dublin Core conference


As there were just PeterMR and me attending this call, we abandoned any formal agenda and had a very pleasant chat discussing PeterMR’s engagements and the upcoming OKFestival. PeterMR has been presenting various Bibliographic tools (including BibSoup) at a number of events lately, including VIVO12, and will do so at the upcoming Digital Science 2012 in Oxford. We discussed support for the existing tools we have in the Open Knowledge Foundation, in terms of person-resource and funding, and the importance of BibServer as an underlying tool for much of the work to be done in and around Open Bibliography and Access.

OKFest is less than 2 weeks away now and there is so much potential here for collaboration and idea generation… We agreed we are very excited and looking forward to meeting the pillars of Open society as well as those brand-new to this world which will only grow in influence and importance. Now is the time to embrace Open!

There were no particular actions, but it was helpful to consider how we can make a difference in the world of bibliography, for OKFN and GLAM institutions in general (ie galleries, libraries, archives and museums). To join the Open Bibliography community sign up here – you may also be interested in the Open Access Working Group which is closely aligned in its outlook and aims.

Importing Spanish National Library to BibServer

- August 7, 2012 in BibServer, Data, JISC OpenBib, national library, OKFN Openbiblio, wp5, wp6

The Spanish National Library (Biblioteca Nacional de España or BNE) has released their library catalogue as Linked Open Data on the Datahub. Initially this entry only contained the SPARQL endpoints and not downloads of the full datasets. After some enquiries from Naomi Lillie the entry was updated with links to some more information and bulk downloads.

This library dataset is particularly interesting as it is not a ‘straightforward’ dump of bibliographic records. This is best explained by Karen Coyle in her blogpost. For a BibServer import, the implications are that we have to distinguish the types of record that are read by the importing script and take the relevant action before building the BibJSON entry. Fortunately the datadump was made as N-Triples already, so we did not have to pre-process the large datafile (4.9GB) in the same manner as we did with the German National Library dataset. The Python script to perform the reading of the datafile is available online.

A complicating matter from a data wrangler’s point of view is that the field names are based on IFLA Standards, which are numeric codes and not ‘guessable’ English terms like DublinCore fields for example. This is more correct from an international and data quality point of view, but does make the initial mapping more time consuming. So when mapping a data item we need to dereference each field name and map it to the relevant BibJSON entry.

As we identify more Linked Open Data National Bibliographies, these experiments will be continued under the BibServer instance.
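
The mapping step described above can be sketched roughly as follows. This is not the project's import script. The predicate URIs, the `FIELD_MAP` table, the function names and the sample data are invented for illustration. The regular expression only handles simple N-Triples lines (URI subject and predicate, URI or literal object) and does not decode escape sequences.

```python
import re
from collections import defaultdict

# Hypothetical mapping from numeric, IFLA-style property URIs to BibJSON keys.
FIELD_MAP = {
    "http://example.org/ifla/P0001": "title",
    "http://example.org/ifla/P0002": "author",
    "http://example.org/ifla/P0003": "year",
}

# Simplified N-Triples line: <subject> <predicate> (<uri> | "literal" with optional @lang or ^^<type>) .
TRIPLE_RE = re.compile(
    r'^<([^>]*)>\s+<([^>]*)>\s+'
    r'(?:<([^>]*)>|"((?:[^"\\]|\\.)*)"(?:@[\w-]+|\^\^<[^>]*>)?)\s*\.\s*$'
)


def parse_triples(lines):
    """Yield (subject, predicate, object) for each line the simple regex understands."""
    for line in lines:
        match = TRIPLE_RE.match(line.strip())
        if match:
            subject, predicate, uri_obj, literal_obj = match.groups()
            yield subject, predicate, (uri_obj if uri_obj is not None else literal_obj)


def triples_to_bibjson(lines, field_map=FIELD_MAP):
    """Group triples by subject and translate known predicates into BibJSON-style keys."""
    records = defaultdict(dict)
    for subject, predicate, obj in parse_triples(lines):
        key = field_map.get(predicate)
        if key is None:
            continue  # unmapped field: skipped here, a real import would want to log it
        record = records[subject]
        record["id"] = subject
        if key == "author":
            # BibJSON-style author list of {"name": ...} objects.
            record.setdefault("author", []).append({"name": obj})
        else:
            record[key] = obj
    return list(records.values())


if __name__ == "__main__":
    sample = [
        '<http://example.org/r/1> <http://example.org/ifla/P0001> "Example title"@es .',
        '<http://example.org/r/1> <http://example.org/ifla/P0002> "Example Author" .',
    ]
    print(triples_to_bibjson(sample))
```

This sketch keeps every record in memory, which would not suit a 4.9GB dump; a real import would more likely stream the file and write out each record as soon as its triples are complete.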

Minutes: 24th Virtual Meeting of the OKFN Working Group for Open Bibliographic Data

- August 7, 2012 in JISC OpenBib, minutes, OKFN Openbiblio

Date: August, 7th 2012, 15:00 GMT Channels: Meeting was held via Skype and Etherpad


  • Jim Pitman
  • Karen Coyle
  • Naomi Lillie


JISC Open Biblio 2 project coming to close

  • Blog-post write-up of project being finished this week, Mark MacGillivray reporting back to JISC in late September
  • Further funding being explored mainly in terms of related work


  • Similar to BibJSON
  • Uses other sources, has no explicit license / restrictions
  • API will give 500 returns a day
  • Jim’s example:
    • author identity is not working very well – this example contains a book that isn’t Jim’s
  • There is no record without an ISBN – seems to be no information from pre-1970
  • Claims to have 7 million books but only 2 million authors – FAQs state that records are gleaned from different libraries so duplication is likely
  • Open Library is possibly a better source

Karen’s most recent blog:

  • “The argument that Google has made from the beginning of its book scanning project is that copying for the purpose of providing keyword access to full texts is fair use”
    • HathiTrust has been in court to defend the storing and searching of metadata


Community Discussions 3

- July 13, 2012 in BibServer, Data, event, Events, JISC OpenBib, jiscopenbib2, licensing, News, OKFN Openbiblio, wp3, wp4, wp5

It has been a couple of months since the round-up on Community Discussions 2 and we have been busy! BiblioHack was a highlight for me, and last week included a meeting of many OKFN types – here’s a picture taken by Lucy Chambers for @OKFN of some team members:
[Image: IMG_0351]
The Discussion List has been busy too:
  • Further to David Weinberger’s pointer that Harvard released 12 million bibliographic records with a CC0 licence, Rufus Pollock created a collection on the DataHub and added it to the Biblio section for ease of reference

  • Rufus also noticed that OCLC had issued their major release of VIAF, meaning that millions of author records are now available as Open Data (under Open Data Commons Attribution license), and updated the DataHub dataset to reflect this

  • Peter Murray-Rust noted that Nature has made its metadata Open CC0

  • David Shotton promoted the International Workshop on Contributorship and Scholarly Attribution at Harvard, and prepared a handy guide for attribution of submissions

  • Adrian Pohl circulated a call for participation for the SWIB12 “Semantic Web in Bibliotheken” (Semantic Web in Libraries) Conference in Cologne, 26-28 November this year, and hosted the monthly Working Group call

  • Lars Aronsson looked at multivolume works, asking whether the OpenLibrary can create and connect records for each volume. HathiTrust and Gallica were suggested as potential tools in collating volumes, and the barcode (containing information populated by the source library) was noted as being invaluable in processing these

  • Sam Leon explained that TEXTUS would be integrating BibServer facet view and encouraged people to have a look at the work so far; Tom Oinn highlighted the collaboration between Enriched BibJSON and TEXTUS, and explained that he would be adding a ‘TEXTUS’ field to BibJSON for this purpose

  • Sam also circulated two tools for people to test, Pundit and Korbo, which have been developed out of Digitised Manuscripts to Europeana (DM2E)

  • Jenny Molloy promoted the Open Science Hackday which took place last week – see below for a snap-shot courtesy of @OKFN:

[Image: IMG_1964]
In related news, Peter Murray-Rust is continuing to advocate the cause of open data – do have a read of the latest posts on his blog to see how he’s getting on. The Open Biblio community continues to be invaluable to the Open GLAM, Heritage, Access and other groups too and I would encourage those interested in such discussions to join up at the OKFN Lists page.

BiblioHack: Day 2, part 2

- June 14, 2012 in BibServer, Data, event, Events, JISC OpenBib, jiscopenbib2, minutes, News, OKFN Openbiblio, Talks, wp1, wp2, wp3, wp4, wp5, wp6, wp7, wp8, wp9

Pens down! Or, rather, key-strokes cease! BiblioHack has drawn to a close and the results of two days’ hard labour are in:

A Bibliographic Toolkit

Utilising BibServer

Peter Murray-Rust reported back on what was planned, what was done, and the overlap between the two! The priority was cleaning up the process for setting up BibServers and getting them running on different architectures. (PubCrawler was going to be run on BibServer but currently it’s not working.) Yesterday’s big news was that Nature has released 30 million references or thereabouts – this furthers the cause of scholarly literature whereby we, in principle, can index records rather than just corporate organisations being able / permitted to do so. National Bibliographies have been put on BibSoup – UK (‘BL’), Germany, Spain and Sweden – with the technical problem of character encodings raising its head (UTF8 solves this where used). Also, BibSoup is useful for TEXTUS so the overall ‘toolkit’ approach is reinforced!

Open Access Index

Emanuil Tolev presented on ACat – Academic Catalogue. The first part of an index is having things to access – so gathering about 55,000 journals was a good start! Using Elastic Search within these journals will give a list of contents which will then provide lists of articles (via facet view), then other services will determine licensing / open access information (URL checks assisted in this process). The ongoing plan is to use this tool to ascertain licensing information for every single record in the world. (Link to ACat to follow).

Annotation Tools

Tom Oinn talked about the ideas that have come out of discussions and hacking around annotators and TEXTUS. Reading lists and citation management is a key part of what TEXTUS is intended to assist with, so the plan is for any annotation to be allowed to carry a citation – whether personal opinion or related record. Personalised lists will come out of this and TEXTUS should become a reference management tool in its own right. Keep your eye on TEXTUS for the practical applications of these ideas!

Note: more detailed write-ups will appear courtesy of others, do watch the OKFN blog for this and all things open…

Postscript: OKFN blog post here

Huge thanks to all those who participated in the event – your ideas and enthusiasm have made this so much fun to be involved with. Also thanks to those who helped run the event, visible or behind-the-scenes, particularly Sam Leon. Here’s to the next one :-)

BiblioHack: Day 2, part 1

- June 14, 2012 in BibServer, Data, event, Events, JISC OpenBib, jiscopenbib2, minutes, News, OKFN Openbiblio, Talks, wp1, wp2, wp3, wp4, wp5, wp6, wp7, wp8, wp9

After easing into the day with breakfast and coffee, each of the 3 sub-groups gave an overview of the mini-project’s aim and fed back on the evening’s progress:
  • Peter Murray-Rust revisited the overarching theme of ‘A Bibliographic Toolkit’ and the BibServer sub-group’s specific work on adding datasets and easily deploying BibServer; Adrian Pohl followed up to explain that he would be developing a National Libraries BibServer.
  • Tom Oinn explained the Annotation Tools sub-group’s work on developing annotation tools – ie TEXTUS – looking at adding fragments of text, with your own comments and metadata linked to it, which then forms BibSoup collections. Collating personalised references is enhanced with existing search functionality, and reading lists with annotations can refer to other texts within TEXTUS.
  • Mark MacGillivray presented the 3rd group’s work on an Open Access Index. This began with listing all the journals that can be found in the whole world, with the aim of identifying the licence of each article. They have been scraping collections (eg PubMed) and gathering journals – at the time of speaking they had around 50,000+! The aim is to enable a crowd-sourced list of every journal in the world which, using PubCrawler, should provide every single article in the world.
With just 5 hours left before stopping to gather thoughts, write-up and feedback to the rest of the group, it will be very interesting to see the result…

BiblioHack: Day 1

- June 14, 2012 in BibServer, Data, event, Events, JISC OpenBib, jiscopenbib2, licensing, lod-lam, minutes, OKFN Openbiblio, Talks, wp1, wp2, wp3, wp4, wp5, wp6, wp7, wp8, wp9

The first day of BiblioHack was a day of combinations and sub-divisions! The event attendees started the day all together, both hackers and workshop / seminar attendees, and Sam introduced the purpose of the day as follows: coders – to build tools and share ideas about things that will make our shared cultural heritage and knowledge commons more accessible and useful; non-coders – to get a crash course in what openness means for galleries, libraries, archives and museums, why it’s important and how you can begin opening up your data; everyone – to get a better idea about what other people working in your domain do and engender a better understanding between librarians, academics, curators, artists and technologists, in order to foster the creation of better, cooler tools that respond to the needs of our communities. The hackers began the day with an overview of what a hackathon is for and how it can be run, as presented by Mahendra Mahey, and followed with lightning talks as follows:
  • Talk 1 Peter Murray Rust & Ross Mounce – Content and Data Mining and a PDF extractor
  • Talk 2 Mike Jones – the m-biblio project
  • Talk 4 Ian Stuart – ORI/RJB (formerly OA-RJ)
  • Talk 5 Etienne Posthumus – Making a BibServer Parser
  • Talk 6 Emanuil Tolev – IDFind – identifying identifiers (“Feedback and real user needs won’t gather themselves”)
  • Talk 7 Mark MacGillivray – BibServer – what the project has been doing recently, how that ties into the open access index idea.
  • Talk 8 Tom Oinn – TEXTUS
  • Talk 9 Simone Fonda – Pundit – collaborative semantic annotations of texts (Semantic Web-related tool)
  • Talk 10 Ian Stuart – The basics of Linked Data
We decided we wanted to work as a community, using our different skills towards one overarching goal, rather than breaking into smaller groups with separate agendas. We formed the central idea of an ‘open bibliographic tool-kit’ and people identified three main areas to hack around, playing to their skills and interests:
  • Utilising BibServer – adding datasets and using PubCrawler
  • Creating an Open Access Index
  • Developing annotation tools
At this point we all broke for lunch, and the workshoppers and hackers mingled together. As hoped, conversations sprang up between people from the two different groups and it was great to see suggestions arising from shared ideas and applications of one group being explained to the theories of the other. We re-grouped and the workshop continued until 16.00 – see here for Tim Hodson’s excellent write-up of the event and talks given – when the hackers were joined by some who attended the workshop. Each group gave a quick update on status, to try to persuade the new additions to the group to join their particular work-flow, and each group grew in number. After more hushed discussions and typing, the day finished with a talk from Tara Taubman about her background in the legalities of online security and IP, and we went for dinner. Hacking continued afterwards and we celebrated a hard day’s work down the pub, looking forward to what was to come. Day 2 to follow…