
BiblioHack: Day 2, part 2

- June 14, 2012 in BibServer, Data, event, Events, JISC OpenBib, jiscopenbib2, minutes, News, OKFN Openbiblio, Talks, wp1, wp2, wp3, wp4, wp5, wp6, wp7, wp8, wp9

Pens down! Or, rather, key-strokes cease! BiblioHack has drawn to a close and the results of two days’ hard labour are in:

A Bibliographic Toolkit

Utilising BibServer

Peter Murray-Rust reported back on what was planned, what was done, and the overlap between the two! The priority was cleaning up the process for setting up BibServers and getting them running on different architectures. (PubCrawler was going to be run on BibServer but currently it’s not working.) Yesterday’s big news was that Nature has released 30 million references or thereabouts – this furthers the cause of open scholarly literature, since in principle anyone can now index these records, rather than only corporate organisations being able / permitted to do so. National Bibliographies have been put on BibSoup – UK (‘BL’), Germany, Spain and Sweden – with the technical problem of character encodings raising its head (UTF8 solves this where used). Also, BibSoup is useful for TEXTUS so the overall ‘toolkit’ approach is reinforced!

Open Access Index

Emanuil Tolev presented on ACat – Academic Catalogue. The first part of an index is having things to access – so gathering about 55,000 journals was a good start! Using Elastic Search within these journals will give a list of contents, which will then provide lists of articles (via facet view); other services will then determine licensing / open access information (URL checks assisted in this process). The ongoing plan is to use this tool to ascertain licensing information for every single record in the world. (Link to ACat to follow.)

Annotation Tools

Tom Oinn talked about the ideas that have come out of discussions and hacking around annotators and TEXTUS. Reading lists and citation management are a key part of what TEXTUS is intended to assist with, so the plan is for any annotation to be allowed to carry a citation – whether personal opinion or related record. Personalised lists will come out of this and TEXTUS should become a reference management tool in its own right. Keep your eye on TEXTUS for the practical applications of these ideas!

Note: more detailed write-ups will appear courtesy of others, do watch the OKFN blog for this and all things open… Postscript: OKFN blog post here Huge thanks to all those who participated in the event – your ideas and enthusiasm have made this so much fun to be involved with. Also thanks to those who helped run the event, visible or behind-the-scenes, particularly Sam Leon. Here’s to the next one :-)

BiblioHack: Day 2, part 1

- June 14, 2012 in BibServer, Data, event, Events, JISC OpenBib, jiscopenbib2, minutes, News, OKFN Openbiblio, Talks, wp1, wp2, wp3, wp4, wp5, wp6, wp7, wp8, wp9

After easing into the day with breakfast and coffee, each of the 3 sub-groups gave an overview of the mini-project’s aim and fed back on the evening’s progress:
  • Peter Murray-Rust revisited the overarching theme of ‘A Bibliographic Toolkit’ and the BibServer sub-group’s specific work on adding datasets and easily deploying BibServer; Adrian Pohl followed up to explain that he would be developing a National Libraries BibServer.
  • Tom Oinn explained the Annotation Tools sub-group’s work on developing annotation tools – ie TEXTUS – looking at adding fragments of text, with your own comments and metadata linked to them, which then form BibSoup collections. Collating personalised references is enhanced with existing search functionality, and reading lists with annotations can refer to other texts within TEXTUS.
  • Mark MacGillivray presented the 3rd group’s work on an Open Access Index. This began with listing all the journals that can be found in the whole world, with the aim of identifying the licence of each article. They have been scraping collections (eg PubMed) and gathering journals – at the time of speaking they had around 50,000+! The aim is to enable a crowd-sourced list of every journal in the world which, using PubCrawler, should provide every single article in the world.
With just 5 hours left before stopping to gather thoughts, write up and feed back to the rest of the group, it will be very interesting to see the result…

BiblioHack: Day 1

- June 14, 2012 in BibServer, Data, event, Events, JISC OpenBib, jiscopenbib2, licensing, lod-lam, minutes, OKFN Openbiblio, Talks, wp1, wp2, wp3, wp4, wp5, wp6, wp7, wp8, wp9

The first day of BiblioHack was a day of combinations and sub-divisions! The event attendees started the day all together, both hackers and workshop / seminar attendees, and Sam introduced the purpose of the day as follows:
  • coders – to build tools and share ideas about things that will make our shared cultural heritage and knowledge commons more accessible and useful;
  • non-coders – to get a crash course in what openness means for galleries, libraries, archives and museums, why it’s important and how you can begin opening up your data;
  • everyone – to get a better idea about what other people working in your domain do and engender a better understanding between librarians, academics, curators, artists and technologists, in order to foster the creation of better, cooler tools that respond to the needs of our communities.
The hackers began the day with an overview of what a hackathon is for and how it can be run, as presented by Mahendra Mahey, and followed with lightning talks as follows:
  • Talk 1 Peter Murray Rust & Ross Mounce – Content and Data Mining and a PDF extractor
  • Talk 2 Mike Jones – the m-biblio project
  • Talk 4 Ian Stuart – ORI/RJB (formerly OA-RJ)
  • Talk 5 Etienne Posthumus – Making a BibServer Parser
  • Talk 6 Emanuil Tolev – IDFind – identifying identifiers (“Feedback and real user needs won’t gather themselves”)
  • Talk 7 Mark MacGillivray – BibServer – what the project has been doing recently, how that ties into the open access index idea.
  • Talk 8 Tom Oinn – TEXTUS
  • Talk 9 Simone Fonda – Pundit – collaborative semantic annotations of texts (Semantic Web-related tool)
  • Talk 10 Ian Stuart – The basics of Linked Data
We decided we wanted to work as a community, using our different skills towards one overarching goal, rather than breaking into smaller groups with separate agendas. We formed the central idea of an ‘open bibliographic tool-kit’ and people identified three main areas to hack around, playing to their skills and interests:
  • Utilising BibServer – adding datasets and using PubCrawler
  • Creating an Open Access Index
  • Developing annotation tools
At this point we all broke for lunch, and the workshoppers and hackers mingled together. As hoped, conversations sprang up between people from the two different groups, and it was great to see suggestions arising from shared ideas, with the applications of one group being explained in terms of the theories of the other. We re-grouped and the workshop continued until 16.00 – see here for Tim Hodson’s excellent write-up of the event and talks given – when the hackers were joined by some who attended the workshop. Each group gave a quick update on status, to try to persuade the new additions to join their particular work-flow, and each group grew in number. After more hushed discussions and typing, the day finished with a talk from Tara Taubman about her background in the legalities of online security and IP, and we went for dinner. Hacking continued afterwards and we celebrated a hard day’s work down the pub, looking forward to what was to come. Day 2 to follow…

Recent BibServer technical development

- May 8, 2012 in BibServer, Data, JISC OpenBib, jiscopenbib2, News, OKFN Openbiblio, wp2, wp3, wp5, wp6, wp7, wp8

Along with the recent push of new front-end functionality to BibServer, as demonstrated on BibSoup, we have also applied some changes to the back-end. The new scheduled collection uploader is now runnable as a stand-alone tool, to which source URLs can be provided for retrieval, conversion, and upload. Retrieved sources are stored and available from a folder on disk, as are the conversions.

Parsers can now be written in any language and plugged into the ingest functionality – for example, we now have a MARC parser that runs in Perl and is usable via and available on an instance of BibServer – thanks very much to Ed for that. In addition, parsers need no longer be ‘parsers’ – we have introduced the concept of scrapers as well. Check out our new Wikipedia parser / scraper, for example; it functions by taking in a search value rather than a URL, then using that to search Wikipedia for relevant references which it downloads, bundles, and converts to a BibJSON collection. This is a really great example that Etienne put together, and it demonstrates a great deal of potential for further parser / scraper development. See the examples on the BibServer repo for more insight – they are in the parserscrapers_plugins folder, and they are managed by bibserver/

We know documentation is currently lacking – we have set up an online docs resource but are still in the process of writing up to populate it – please check back soon. As usual, development work is scheduled via the tickets and milestones on our repo. Current efforts are on documentation and adding as many feature requests as possible before our hackathon on June 12th – 14th.
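
To give a feel for what a small parser might do, here is a minimal sketch in Python that turns CSV text into a BibJSON-style collection. It is not taken from the BibServer repo: the function name csv_to_collection, the "metadata" / "records" layout of the output and the sample data are all assumptions made for illustration, so check the examples in the parserscrapers_plugins folder for the real plugin conventions.

```python
import csv
import io
import json


def csv_to_collection(text, source_name):
    """Convert CSV text (one row per reference) into a BibJSON-style collection.

    Hypothetical helper: the output layout is an assumption, not the
    documented BibServer plugin contract.
    """
    reader = csv.DictReader(io.StringIO(text))
    records = []
    for row in reader:
        # Keep only named, non-empty cells so each record stays sparse.
        record = {key: value for key, value in row.items() if key and value}
        records.append(record)
    return {
        "metadata": {"source": source_name, "records": len(records)},
        "records": records,
    }


# Made-up sample input: the second row has no year, so that key is omitted.
sample = "title,year\nOpen Bibliography,2012\nLinked Data Basics,\n"
collection = csv_to_collection(sample, "sample.csv")
print(json.dumps(collection, indent=2))
```

A scraper would differ only in where the input comes from: instead of being handed a source to convert, it would take a search value, fetch the matching references itself, and then emit the same kind of collection.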

BibJSON updates

- May 8, 2012 in BibServer, Data, JISC OpenBib, jiscopenbib2, lod-lam, News, OKFN Openbiblio, wp2, wp3, wp5, wp6, wp7, wp8

Following recent discussion on our mailing list, BibJSON has been updated to adopt JSON-LD for all your linked data needs. This enables us to keep the core of BibJSON pretty simple whilst also opening up potential for more complex usage where that is required. Due to this, we no longer use the “namespace” key in BibJSON. Other changes include usage of “_” prefix on internal keys – so wherever our own database writes info into a record, we prefix it, such as “_id”. Because of this, uploaded BibJSON records can have an “id” key that will work, as well as an “_id” uuid applied by the BibServer system. For more information, check out and JSON-LD
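
As a rough illustration of the “_” prefix convention, the Python sketch below shows an uploaded record keeping its own “id” while a system-assigned “_id” is added alongside it. The function name prepare_for_storage, the sample record and the choice of a hex uuid4 string are assumptions for the example, not BibServer’s actual code.

```python
import uuid


def prepare_for_storage(record):
    """Return a copy of an uploaded record with an internal "_id" added.

    Hypothetical helper; the exact uuid format used by BibServer is assumed.
    """
    stored = dict(record)
    # Internal keys carry the "_" prefix, so the uploader's own "id" is untouched.
    stored["_id"] = uuid.uuid4().hex
    return stored


# Made-up uploaded record with a user-supplied "id".
uploaded = {
    "id": "mybook2012",
    "title": "An Example Book",
    "author": [{"name": "A. Author"}],
}
stored = prepare_for_storage(uploaded)
print(stored["id"], stored["_id"])
```

Because the two keys never collide, a record can be re-uploaded with the same “id” without disturbing whatever the system has written under its prefixed keys.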

New BibServer features available on BibSoup

- May 8, 2012 in BibServer, Data, JISC OpenBib, jiscopenbib2, News, OKFN Openbiblio, wp2, wp3, wp5, wp6, wp7, wp8

A couple of months ago the development team had a Sprint and came up with some cool ideas of how to improve the user experience for BibServer and, subsequently, BibSoup. Have a play with the new features and see below for the details:

Main pages

  • Collections visualisation – a smart new graphic on the landing page showing information from new collections

  • Improved FAQ section with links to videos (coming soon: links to our new online docs)

Creating collections

  • New Wikipedia parser – create a collection based on the references retrievable from Wikipedia for your chosen search value

  • Improved collection upload – specify collection information, then view upload tickets to see progress and errors

  • ‘Retry’ and other options on particular collection creation attempts are also now available from the tickets page

Search results

  • Filter search results by a value range as well as specific values

  • Visualise any filter as a bubble chart and select the values you want to search with

  • Add / remove available filters and rename filter display names

  • Improved layout of record info in search results, including auto-display of the first image referenced in a record – e.g. if there is a link to an image in your record, it is displayed in the search result
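
For readers curious about the difference between filtering on a specific value and on a value range, here is a small Python sketch that builds the two kinds of Elasticsearch query body. The helper name build_filter_query and the "year" field are assumptions for illustration, and this is not necessarily how facetview constructs its own queries.

```python
import json


def build_filter_query(field, exact=None, lower=None, upper=None):
    """Build an Elasticsearch query body for one filter (hypothetical helper)."""
    if exact is not None:
        # Specific value: match records whose field equals this term.
        return {"query": {"term": {field: exact}}}
    bounds = {}
    if lower is not None:
        bounds["gte"] = lower  # inclusive lower bound
    if upper is not None:
        bounds["lte"] = upper  # inclusive upper bound
    # Value range: match records whose field falls between the bounds.
    return {"query": {"range": {field: bounds}}}


exact_query = build_filter_query("year", exact=2012)
range_query = build_filter_query("year", lower=2000, upper=2010)
print(json.dumps(range_query))
```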

Managing and sharing collections

  • Collection admin available – save your current display settings as the default for your collection, allow other users to have admin rights on your own collection

  • Share any specific searches by providing the URL displayed under the ‘share’ option

  • Embed – as the whole front-end of search and collection visualisation is handled by facetview it is possible to embed your collection search in any web page you control; the share / embed option on collection pages provides the code you need to insert to enable this

  • Download as BibJSON – a nice new obvious button on each collection provides a link to download your collection as BibJSON

Viewing records

  • Improved display of individual records, including search options to discover relevant content online

  • EXPERIMENTAL record editing – this has been enabled although still in progress – you can edit the content of a record using a visual display of the keys and values in the record, although functionality for adding new keys does not yet work. However, you can also edit the JSON directly via the options, and try saving that. Be aware – this could damage your records, and of course changes the details from whatever they were in the source content.

Still in development

These ones are not yet available on BibSoup but watch this space:

  • Creating new collections on-site – search and find particular records for inclusion in new collections or addition to pre-existing collections. This is not currently possible but we are working on making this an easy process
  • Merging collections
  • Better user creation and management, plus gravatars
  • Additional functionality on record pages – linking out directly to related sources such as PubMed, Total Impact, Service Core etc
We hope you like these changes, and find them useful – do let us know what you think and keep an eye out for the upcoming improvements.

Planning for the next three months

- March 20, 2012 in BibServer, JISC OpenBib, jiscopenbib2, minutes, OKFN Openbiblio, wp10, wp2, wp3, wp4, wp5, wp6, wp7, wp8, wp9

We have developed BibJSON
We’ve improved BibServer
We’ve made BibSoup

…But what’s next?

The nature of cutting-edge technology is that it is fast-paced and constantly adapting. We may think we’ve come up with a good idea, but if it turns out someone else has already had that idea and developed it – that’s great, and means we incorporate it and go on to the next exciting thing. We may think that this next thing is important, but if it turns out it doesn’t quite do the helpful thing needed to make our users delighted or promote open bibliographic data – we change tack and try something else. We know what we want to do, ie make useful and smart tools for the people doing wonderful things in the public domain, but as for what our end product looks like (if indeed there is one product to play with) – well, that all depends on the emerging requirements, other technologies that come to light and how successful our ideas are along the way.

Taking all that into account, at the Sprint last week we attempted to plan for the next three months. Our work will be more successful the more focused we are, and having an end-result in mind is useful for that. So, here’s a rough guide to how we think our project will shape up between now and June:

To-Do

Timeline

NB the images are a little fuzzy, but do click on them to follow the links to Flickr where these are stored and appear more clearly.

We have already published the CUL blog post and Mark has written about BibServer functionality that arose from ideas at the Sprint. We’ll develop these ideas into workable and worthwhile tools or processes, and before we know it we’ll be three months down the line and thinking ‘…but what’s next?’

BibServer new functionality

- March 19, 2012 in announcement, JISC OpenBib, jiscopenbib2, progressPosts, software, wp6, wp7, wp8

During the sprint last week we made a lot of progress with the new functionality for version 0.5.0 – however, Etienne and I got so excited by some new ideas that we did not finish on time; apologies for the delay. We will be making the new version available over the course of this week, and will have it up and running soon. Below is an overview of the new functionality you can expect to see over the course of the next week; we will write some blog posts about the various new capabilities, and this will tie in with the focus of the next sprint – doing docs, tests and issues (no new functionality).
  • editing of records and collections
  • merging collections from multiple sources
  • adding notes to records
  • much improved search UI
  • embed images in search results
  • better visualisation of collections
  • embeddable UI into other web pages via javascript
  • asynchronous parsing – you don’t have to hang on the page waiting for it to complete
  • feedback tickets from asynchronous parses
  • sharing collection admin rights with other users
  • new parser for NLM XML
  • new parser concept – search term gets pages from wikipedia, pulls citations from pages
  • capability to accept and run parsers written in different programming languages
  • browse site users

Installing BibServer from the repo on Mac OSX

- March 13, 2012 in BibServer, Guest post, JISC OpenBib, jiscopenbib2, wp2, wp3, wp5, wp6, wp7, wp8

The following guest post is by Edmund Chamberlain, who works at Cambridge University Library.

As part of my work on the Open Bibliography project, I wanted to test how easy it would be for an average Systems Librarian such as myself to get BibServer up and running. Turns out, it was pretty simple, for a development environment at least. The latest install docs contain pointers to all the required packages and dependencies; see here for install instructions.

Python and dependencies

I started almost from scratch with a new Macbook Air running OSX Lion. The first thing I needed was the latest binaries for Python, the language BibServer and most OKFN projects are coded in. Python is installed on OSX by default but for good measure, I installed XCode 4 for free from the Apple App store. Advice on getting Python onto your favoured *nix OS or even Windows can be found on the main Python site. According to the BibServer docs, a few additional dependencies were required, specifically PIP (one of several Python package manager options) and Virtual Env (a means to create multiple separate Python environments). Some great instructions on doing this can be found here. Finally, I needed GIT version control software. Instructions for getting GIT onto OSX can be found along with a dedicated installer. If you are not familiar with GIT, here is a great introduction.


Next up, I needed to install the indexing service underpinning BibServer, ElasticSearch. Having spent days grappling with various indexing solutions and document / graph based databases in the past, this was the part I was most hesitant about. Turns out, it really was as simple as the instructions stated.
1) Download the latest version into an appropriate place and extract the files.
2) Start ElasticSearch with:
$ sudo bin/elasticsearch
3) Elastic Search is built with and runs on top of Java. If you don’t have this installed, OSX Lion will prompt you to download and install the latest version.
4) The install instructions give some tips on setting it up as a service.


With GIT and VirtualEnv installed, BibServer can be pulled and set up relatively quickly.
1) Create and start a virtual environment, where {myenv} is the filepath of the environment:
virtualenv {myenv}
. {myenv}/bin/activate
2) Using GIT, clone the BibServer source code into that environment:
mkdir {myenv}/src
sudo git clone {myenv}/src/
3) Run a development install using Pip:
cd {myenv}/src/bibserver/bibserver
sudo pip install -e .

Running it!

1) Ensure ElasticSearch is running.
2) Start BibServer up:
sudo python {myenv}/src/bibserver/bibserver/
3) Point your favoured web browser at:
localhost:5000
4) Upload a sample CSV file.
BibServer can easily be run as a background process using screen or some other suitable tool.
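
If the browser shows nothing, a quick way to see which half of the setup is missing is to check whether anything is listening on the two ports involved. The short Python sketch below does that; it assumes ElasticSearch is on its default port 9200 and BibServer on port 5000 as above, and the helper name is_listening is made up for this example.

```python
import socket


def is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out or unreachable: treat all as "not listening".
        return False


if __name__ == "__main__":
    # Port numbers are assumptions: 9200 is the ElasticSearch default,
    # 5000 is the port used in the steps above.
    for name, port in (("ElasticSearch", 9200), ("BibServer", 5000)):
        state = "up" if is_listening("localhost", port) else "not reachable"
        print(name, "on port", port, "is", state)
```

This only confirms that something is accepting connections on each port, not that it is the right service, so treat it as a first check rather than a full health test.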

BibSoup beta: released

- February 9, 2012 in BibServer, Data, JISC OpenBib, jiscopenbib2, News, wp3, wp4, wp5, wp6, wp7

BibSoup is here! And it’s going to revolutionise how you work with bibliographic metadata.

Peter has been blogging for a while about BibSoup (see here for the basics and here for how to use it) and we’ve mentioned it in passing on this blog (for example this sprint post and explanation of Bib- terms)… But now it is time for the ‘official’ launch. Hurrah!

So, how to get involved?

Setting up a Bibserver and Faceted Browsing (Mark MacGillivray) from Bibsoup Project on Vimeo.

We already have parsers available to get your data directly via either BibTex or RIS (or from BibJSON…), which means you can get data in from most major bibliographic tools already; you can even use the parsers programmatically if you like (although that functionality is in the process of improvement). We are open to suggestions for further parsers, and would be happy to guide anyone through making one. (By the way, we are assuming you will have seen previous posts on this site and will therefore know what we’re talking about, but if not then please see this OKFN blog post for a fuller explanation of what BibSoup is for, why it’s great and what this overall project is all about.)

So, what do you think? Let us know. There will be bugs, or areas we could improve, so please pass suggestions our way. Feature requests can be submitted via our issue tracker, and we batch those up into milestones to work towards the next release. Our current focus is on improving parser functionality and also on enabling editing.

We hope you like it and find all this useful… do add your collections so we can share them with the rest of the world, too. If you would like your own BibServer, go ahead and download the code, or contact us for help / support options.