You are browsing the archive for Mark MacGillivray.

A revamp of bibserver and bibsoup

- January 11, 2013 in bibjson, BibServer, bibsoup

Since our work last year on the JISC Open Bibliography 2 project, I have been thinking about the approach we took to building a tool that people might use; some of that approach, I think, was wrong. So, I have recently been working on some changes and have pushed a new version of bibserver to the repository, in a branch called bibwiki. Also today, the service running at http://bibsoup.net has been rebooted to run the new branch. One of the downsides of this is that user accounts and data that existed on the old system are no longer available. The old system had some issues anyway (it was giving errors for a few of the more recent attempts to upload large datasets), so I decided to wipe the slate clean and start again from scratch. However, if you had any particular collections in there that you need, please get in touch via the openbiblio-dev mailing list and I will recover them for you. Now, on to the details of what has changed, and why. Let’s start with the why.

Why change it?

One of the original requirements of bibserver was that it would present a personally curated collection of bibliographic records; this extended not only to the curation of the collection, but to the curation of records within that collection. Unfortunately, this made every collection an island – a private island, with guards round the edges; not so good for building open knowledge or community. Also, we put too much emphasis on legacy data and formats; whilst there is of course value in old standards like bibtex, and in historical records, giving up the flexibility of the present for the sake of the past is the opposite of progress. Instead we should take the best bits of what we had and improve on them, then get our historical content into newer, more useful forms. Because of these issues, it seems sensible therefore to try a more connected, more open, more modern approach. So, what I have done is to remove the concept of “ownership” of a record and to remove the ties to legacy data formats or sources. Instead what we now have is a tool into which we can dump bibJSON data, and via which we can build personally curated collections of shared bibliographic records.

So what has changed?

You can only upload bibJSON

Whilst the conversion tools we wrote to process data from formats such as bibtex or RIS into bibJSON are useful and will be utilised elsewhere, they are not part of the core functionality of bibserver. They are a way to get from the past into the present, and once you are here, you should forget about the past and get on with the future. So your upload is one-off, and cares not from whence it came.
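To make the upload format a little more concrete, here is a minimal sketch of what a bibJSON collection might look like when built and serialised in Python. The top-level "metadata" and "records" keys and the field names are assumptions based on the convention described at bibjson.org; the sample record is made up, and `looks_like_bibjson` is a hypothetical helper, not the check that bibserver itself performs.

```python
import json

# A hypothetical collection; the record content is invented for illustration.
collection = {
    "metadata": {"collection": "my_collection", "label": "My collection"},
    "records": [
        {
            "title": "An example article",
            "author": [{"name": "A. N. Author"}],
            "year": "2012",
            "type": "article",
        }
    ],
}


def looks_like_bibjson(data):
    """Loose sanity check (an assumption, not the bibserver validator)."""
    records = data.get("records") if isinstance(data, dict) else None
    if not isinstance(records, list):
        return False
    # Every record should at least be a JSON object.
    return all(isinstance(record, dict) for record in records)


# Serialise to the text that would be saved to a file and uploaded.
payload = json.dumps(collection, indent=2)
```

Anything that survives a round trip through `json.loads` and passes a check like this is at least shaped like a collection; whether the individual keys are meaningful is a separate question.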

You can edit records, but so can anybody else

Does what it says on the tin. For now, editing is only via clunky edit of the JSON itself, but this can have a nice UI added later.

You can tag any record with anything, but so can anybody else

Anyone can tag a record with a useful term; anyone can remove a tag.

You can still build your own collection

You can still create your own collection and curate it as you see fit, and other people will not be able to change what records are in that collection; but the records themselves are still editable by anyone. Seems scary? Well, yes. But get used to it. It works for wikipedia. (Which is why I called the new branch bibwiki.)

You can’t visualise facets anymore

You used to be able to make a little bubble picture out of the facet filters down the left hand side. Now you can’t. It was a bit incongruously located, so this functionality is being hived off into a more specifically useful form.

You can search for any record and add it to your collection

Anything that is on the bibserver instance can be found by anyone using the search box, then you can add it to one of your collections. However, searching for everything has limited functionality and does not offer filters. This is because one of the constraints of scaling up to large datasets is that filtering is expensive; so now, you have simple search across everything, then nice complex filtered search on the things you care about. Best of both worlds with minimal compromise.

Simplistic record deduping

Where a record appears to have the same title-and-authors string on import as another record already in the database, the system will try to squish them together. The important point here, though, is that the functionality now exists to deduplicate things via various methods, and there is no longer a constraint to maintain unique copies of things, so we can get on and build those methods.
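As a rough illustration of the title-and-authors idea, the sketch below builds a normalised key from each record and merges records whose keys collide. It is not the code in the bibwiki branch: the function names, the normalisation rules and the "keep existing values, fill in the gaps" merge policy are all assumptions, and a plain dict stands in for the real index.

```python
import re


def dedupe_key(record):
    """Build a normalised title-and-authors string for a bibJSON-style record."""
    title = record.get("title", "")
    authors = " ".join(a.get("name", "") for a in record.get("author", []))
    # Lowercase and collapse anything that is not a-z or 0-9 into single spaces,
    # so punctuation and spacing differences do not matter. Note that this also
    # drops non-ASCII letters, which a real implementation would want to handle.
    return re.sub(r"[^a-z0-9]+", " ", (title + " " + authors).lower()).strip()


def merge_records(existing, incoming):
    """Squish two records together: keep existing values, fill in missing keys."""
    merged = dict(existing)
    for key, value in incoming.items():
        if key not in merged or merged[key] in ("", None, []):
            merged[key] = value
    return merged


def import_records(database, records):
    """Import records into `database`, a dict mapping dedupe key -> record."""
    for record in records:
        key = dedupe_key(record)
        if key in database:
            database[key] = merge_records(database[key], record)
        else:
            database[key] = record
    return database
```

Importing "Open Data!" by "J. Smith" and then "open data" by "J Smith" with an extra year would leave a single record that keeps the first title and gains the year. More careful methods (fuzzy matching, identifier comparison) could replace `dedupe_key` without changing the rest.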

Exciting. So, what next?

Rework the parsers into a stand-alone service

The parsers from bibtex, RIS, etc should be built out as a simple service that we can run where you hit the webpage, give it your file (or file URL), and it pings you when it has done the conversion with a link for you to get your bibJSON from. This should work with parser plugins sort-of functionality, so that we can run it with the parsers we have, and other people can run it with their own if they wish. Then we can boot up a translation service at http://translate.bibsoup.net. This is the most important next step, as without it not many people will be able to upload records.
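One way the "parser plugins" idea could be arranged is a small registry that maps a format name to a parser function, so that anyone running the service can register their own. The sketch below is only that, a sketch: `register_parser`, `convert` and the toy RIS parser (which handles just the TI, AU, PY and ER tags) are hypothetical and are not taken from the existing bibserver parsers.

```python
import json

PARSERS = {}  # maps a format name to a parser function


def register_parser(fmt):
    """Decorator that registers a parser function under a format name."""
    def decorator(func):
        PARSERS[fmt] = func
        return func
    return decorator


@register_parser("ris")
def parse_ris(text):
    """Toy RIS parser: TI (title), AU (author), PY (year), ER (end of record)."""
    records, current = [], {}
    for line in text.splitlines():
        # RIS lines look like "TI  - Some title": a two-letter tag, then the value.
        tag, value = line[:2], line[6:].strip()
        if tag == "TI":
            current["title"] = value
        elif tag == "AU":
            current.setdefault("author", []).append({"name": value})
        elif tag == "PY":
            current["year"] = value
        elif tag == "ER":
            if current:
                records.append(current)
            current = {}
    if current:  # tolerate a final record with no ER line
        records.append(current)
    return records


def convert(fmt, text):
    """Convert source text in the named format into a bibJSON string."""
    if fmt not in PARSERS:
        raise ValueError("no parser registered for format: %s" % fmt)
    return json.dumps({"records": PARSERS[fmt](text)})
```

A web front end would then only need to accept a file or URL, call something like `convert`, and notify the user with a link when the output is ready.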

Upload some bibliographic metadata

There are numerous sources of biblio metadata we have collected over the years, and some of these will be uploaded into bibsoup for people to use. Also, there is potential to run specific instances of bibsoup for people who need them – although, overall, it is probably more sensible to keep them all together and distinguish via collections.

Bugfix

This is basically a beta 2 implementation. Please go and use the new system at http://bibsoup.net, and get back to the mailing list with the usual issues.

Build up some deduplication maybe with pybossa

Now that we can edit records and find similar ones, we can also do interesting things like enable users to tag records that are about the same thing. We can also run queries to find similar records and expose that data perhaps through a tool like pybossa, to get crowd-sourced deduplication on the go.

Rewrite the tests

All the tests that were in the original branch have yet to be copied over. A lot of them will become redundant. So if you like tests (and we should have them), then get involved with porting them over / writing new ones.

Update the docs

The documentation needs to be updated; a lot of it still refers to the old branch, although a fair bit of it is still relevant.

Decide how to manage the code and bibsoup in the future

What I have done here is make some fairly large changes to our original aims; it is possible that not everybody will like this. However, the great thing about code repositories is that we have versioning, so anyone can use any version of the software. My changes are still in a branch, so we can either merge these into the main, or fork them off to a separate project if necessary. Unless reasons against merging into main are given, that will be the course taken once the parsers have been hived off.

Final report: JISC Open Bibliography 2

- August 23, 2012 in BibServer, JISC OpenBib, jiscopenbib2, wp10, wp9

Following on from the success of the first JISC Open Bibliography project we have now completed a further year of development and advocacy as part of the JISC Discovery programme. Our stated aims at the beginning of the second year of development were to show our community (namely all those interested in furthering the cause of Open via bibliographic data, including: coders; academics; those with interest in supporting Galleries, Libraries, Archives and Museums; etc) what we are missing if we do not commit to Open Bibliography, and to show that Open Bibliography is a fundamental requirement of a community committed to discovery and dissemination of ideas. We intended to do this by demonstrating the value of carefully managed metadata collections of particular interest to individuals and small groups, thus realising the potential of the open access to large collections of metadata we now enjoy. We have been successful overall in achieving our aims, and we present here a summary of our output to date (it may be useful to refer to this guide to terms).

Outputs

BibServer and FacetView

The BibServer open source software package enables individuals and small groups to present their bibliographic collections easily online. BibServer utilises elasticsearch in the background to index supplied records, and these are presented via the frontend using the FacetView javascript library. This use of javascript at the front end allows easy embedding of result displays on any web page.

BibSoup and more demonstrations

Our own version of BibServer is up and running at http://bibsoup.net, where we have seen over 100 users sharing more than 14000 records across over 60 collections. Some particularly interesting example collections include: Additionally, we have created some niche instances of BibServer for solving specific problems – for example, check out http://malaria.bibsoup.net; here we have used BibServer to analyse and display collections specific to malaria researchers, as a demonstration of the extent of open access materials in the field. Further analysis allowed us to show where best to look for relevant materials that could be expected to be openly available, and to begin work on the concept of an Open Access Index for research. Another example is the German National Bibliography, as provided by the German National Library, which is in progress (as explained by Adrian Pohl and Etienne Posthumus here). We have built, and are still building, similar collections for all other national bibliographies that we receive.

BibJSON

At http://bibjson.org we have produced a simple convention for presenting bibliographic records in JSON. This has seen good uptake so far, with additional use in the JISC TEXTUS project and in Total Impact, amongst others.

Pubcrawler

Pubcrawler collects bibliographic metadata, via parsers created for particular sites, and we have used it to create collections of articles. The full post provides more information.

datahub collections

We have continued to collect useful bibliographic collections throughout the year, and these along with all others discovered by the community can be found on the datahub in the bibliographic group.

Open Access / Bibliography advocacy videos and presentations

As part of a Sprint in January we recorded videos of the work we were doing and the roles we play in this project and wider biblio promotion; we also made a how-to for using BibServer, including feedback from a new user: Setting up a Bibserver and Faceted Browsing (Mark MacGillivray) from Bibsoup Project on Vimeo. Peter and Tom Murray-Rust’s video, made into a prezi, has proven useful in explaining the basics of the need for Open Bibliography and Open Access:

Community activities

The Open Biblio community has gathered for a number of different reasons over the duration of this project: the project team met in Cambridge and Edinburgh to plan work in Sprints; Edinburgh also played host to a couple of Meet-ups for the wider open community, as did London; and London hosted BiblioHack – a hackathon / workshop for established enthusiasts as well as new faces, both with and without technical know-how. These events – particularly BiblioHack – attracted people from all over the UK and Europe, and we were pleased that the work we are doing is gaining attention from similar projects world-wide.

Further collaborations

Lessons

Over the course of this project we have learnt that open source development provides great flexibility and power to do what we need to do, and open access in general frees us from many difficult constraints. There is now a lot of useful information available online for how to do open source and open access. Whilst licensing remains an issue, it becomes clear that making everything publicly and freely available to the fullest extent possible is the simplest solution, causing no further complications down the line. See the open definition as well as our principles for more information.

We discovered during the BibJSON spec development that it must be clear whether a specification is centrally controlled, or more of a communal agreement on use. There are advantages and disadvantages to each method; however, they are not compatible – although one may become the other. We took the communal agreement approach, as we found that in the early stages there was more value in exposing the spec to people as widely and openly as possible than in maintaining close control. Moving to a close control format requires specific and ongoing commitment.

Community building remains tricky and somewhat serendipitous. Just as word-of-mouth can enhance reputation, failure of certain communities can detrimentally impact other parts of the project. Again, the best solution is to ensure everything is as open as possible from the outset, thereby reducing the impact of any one particular failure.

Opportunities and Possibilities

Over the two years, the concept of open bibliography has gone from requiring justification to being an expectation; the value of making this metadata openly available to the public is now obvious, and getting such access is no longer so difficult; where access is not yet available, many groups are now moving toward making it available. And of course, there are now plenty of tools to make good use of available metadata. Future opportunities now lie in the more general field of Open Scholarship, where a default of Open Bibliography can be leveraged to great effect. For example, recent Open Access mandates by many UK funding councils (e.g. following the Finch Report) could be backed up by investigative checks on the accessibility of research outputs, supporting provision of an open access corpus of scholarly material. We intend now to continue work in this wider context, and we will soon publicise our more specific ideas; we would appreciate contact with other groups interested in working further in this area.

Further information

For the original project overview, see http://openbiblio.net/p/jiscopenbib2; also, a full chronological listing of all our project posts is available at http://openbiblio.net/tag/jiscopenbib2/. The work package descriptions are available at http://openbiblio.net/p/jiscopenbib2/work-packages/, and links to posts relevant to each work package over the course of the project follow:
  • WP1 Participation with Discovery programme
  • WP2 Collaborate with partners to develop social and technical interoperability
  • WP3 Open Bibliography advocacy
  • WP4 Community support
  • WP5 Data acquisition
  • WP6 Software development
  • WP7 Beta deployment
  • WP8 Disruptive innovation
  • WP9 Project management (NB all posts about the project are relevant to this WP)
  • WP10 Preparation for service delivery
All software developed during this project is available under an open source licence. All the data that was released during this project fell under OKD compliant licences such as PDDL or CC0, depending on the one chosen by the publisher. The content of our site is licensed under a Creative Commons Attribution 3.0 License (all jurisdictions). The project team would like to thank supporting staff at the Open Knowledge Foundation and Cambridge University Library, the OKF Open Bibliography working group and Open Access working group, Neil Wilson and the team at the British Library, and Andy McGregor and the rest of the team at JISC.

Using wikipedia to build a philosophy (or other sort of) collection in BibSoup

- June 27, 2012 in BibServer, JISC OpenBib, jiscopenbib2, wp2, wp8

Here is a quick example of how to build a reference collection in BibSoup, using the great source of knowledge that is Wikipedia. To begin with, you might want to go to Wikipedia directly and try performing some searches for relevant material, to help you put together sensible search terms for your area of interest. Your search terms will be used to pull relevant citations from the Wikipedia database. Then, go over to the BibSoup upload page; signup / login is required, so do that if you have not already done so. Type in your Wikipedia search terms in the upload box at the top of the page, give your collection a name and a description, specify the license if you wish, and choose the “wikipedia search to citations” file format from the list at the bottom. Then hit upload. A ticket will be created for building your collection, and you can view the progress on the ticket page. Once it is done, you can find your new collection either on the BibSoup collections page or on your own BibSoup user account page – for example, the account page for the user named “test”. Also of course, you could go straight to the URL of your collection – they appear at http://bibsoup.net/username/collection. There you go! You should now have a reference collection based on your Wikipedia search terms. Check out our example.

Pubcrawler: finding research publications

- June 13, 2012 in BibServer, JISC OpenBib, jiscopenbib2, wp2, wp5, wp6, wp8

This is a guest post from Sam Adams. (We have been using Pubcrawler in the Open Biblio 2 project to create reference collections of journal articles, and hope to continue this work further; this is a brief introduction to the software. Code is currently available in http://bitbucket.org/sea36/pubcrawler) Pubcrawler collects bibliographic metadata (author, title, reference, DOI) by indexing journals’ websites in a similar manner to the way in which search engines explore the web to build their indexes. Where possible (which depends on the particular publication) it identifies any supplementary resources associated with a paper, and whether the paper is open access (i.e. readable without a subscription or any other charge) – though it cannot determine the license / conditions of such access. Pubcrawler was originally developed by Nick Day as part of the CrystalEye project to aggregate published crystallographic structures from the supplementary data to articles on journals’ websites. Since then Pubcrawler has been extended to collect bibliographic metadata and support a wider range of journals than just those containing crystallography. Some of the activities Pubcrawler can currently support are:
  • Providing core bibliographic metadata
  • Identifying collections of open access articles
  • Identifying freely accessible supplementary information, which is often a rich source of scientific data
When pointed at a publisher’s homepage Pubcrawler will generate a list of the journals on the site and then crawl the issues’ tables of contents, recording the bibliographic metadata for the articles that it discovers. Pubcrawler uses a combination of two approaches to crawling a journal: starting at the current issue it can follow links to previous issues, walking the journal’s publication history, and if a journal’s website contains a list of issues it will also use that as a source of pages to crawl. When necessary, such as to identify supplementary resources, Pubcrawler can follow links to individual articles’ splash pages. Pubcrawler does not index any content that is restricted by a journal’s paywall – it has been designed not to follow such links, and as added protection it is run over a commercial broadband connection, rather than from inside a University network, to ensure that it does not receive any kind of privileged access. While Pubcrawler’s general workflow is the same for any publication, custom parsers are required to extract the metadata and correct links from each website. Generally publishers use common templates for their journals’ web pages, so a parser only needs to be developed once per publisher; however, in some instances, such as where older issues have not been updated to match the current template, a parser may need to support a variety of styles. Pubcrawler currently has parsers (in varying states of completeness) for a number of publishers (biased by its history of indexing published crystallographic structures):
  • The American Chemical Society (ACS)
  • Elsevier
  • The International Union of Crystallography (IUCr)
  • Nature
  • The Royal Society of Chemistry (RSC)
  • Springer
  • Wiley
And to date it has indexed over 10 million bibliographic records. There are many other publishers who could be supported by Pubcrawler; they just require parsers to be created for them. Pubcrawler requires two types of maintenance – the general support to keep it running (administering servers etc.) that any software requires, and occasional updates to the parsers as journals’ websites change their formatting.
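The two crawling approaches described earlier in this post (walking back through "previous issue" links, and using an issue list where one exists) can be combined in a simple breadth-first walk. The Python sketch below is illustrative only and is not Pubcrawler's own code: `crawl_journal`, the `fetch` callable and the page dictionary with "previous", "issues" and "articles" keys are all hypothetical stand-ins for the publisher-specific parsers.

```python
from collections import deque


def crawl_journal(start_url, fetch):
    """Visit every reachable issue page once and collect article metadata.

    `fetch(url)` is a hypothetical callable returning a dict with optional keys
    "previous" (link to the preceding issue), "issues" (links found in an issue
    list) and "articles" (metadata records found on the page).
    """
    queue, seen, articles = deque([start_url]), {start_url}, []
    while queue:
        page = fetch(queue.popleft())
        articles.extend(page.get("articles", []))
        # Combine both sources of links: the issue list and the previous-issue link.
        links = list(page.get("issues", []))
        if page.get("previous"):
            links.append(page["previous"])
        for link in links:
            if link not in seen:  # never crawl the same issue twice
                seen.add(link)
                queue.append(link)
    return articles
```

Because visited pages are tracked in `seen`, it does not matter if the issue list and the chain of previous-issue links overlap; each issue contributes its articles once.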

Open source development – how we are doing

- May 29, 2012 in BibServer, JISC OpenBib, jiscopenbib2, licensing, progress, progressPosts, projectMethodology, projectPlan, riskAnalysis, software, WIN, wp10, wp2, wp3, wp6, wp9

Whilst at Open Source Junction earlier this year, I talked to Sander van der Waal and Rowan Wilson about the problems of doing open source development. Sander and Rowan work at OSS Watch, and their aim is to make sure that open source software development delivers its potential to UK HEI and research; so, I thought it would be good to get their feedback on how our project is doing, and if there is anything we are getting wrong or could improve on. It struck me that as other JISC projects such as ours are required to make their output similarly publicly available, this discussion may be of benefit to others; after all, not everyone knows what open source software is, let alone the complexities that can arise from trying to create such software. Whilst we cannot help avoid all such complexities, we can at least detail what we have found helpful to date, and how OSS Watch view our efforts.

I provided Sander and Rowan a review of our project, and Rowan provided some feedback confirming that overall we are doing a good job, although we lack a listing of the other open source software our project relies on, and their licenses. Whilst such data can be discerned from the dependencies of the project, this is not clear enough; I will add a written list of dependencies to the README. The response we received is provided below, followed by the overview I initially provided, which briefly describes how we managed our open source development efforts:

====

Rowan Wilson, OSS Watch, responds:

Your work on this project is extremely impressive. You have the systems in place that we recommend for open development and creation of community around software, and you are using them. As an outsider I am able to quickly see that your project is active and the mailing list and roadmap present information about ways in which I could participate. One thing I could not find, although this may be my fault, is a list of third party software within the distribution. This may well be because there is none, but it’s something I would generally be keen to see for the purposes of auditing licence compatibility. Overall though I commend you on how tangible and visible the development work on this project is, and on the focus on user-base expansion that is evident on the mailing list.

====

Mark MacGillivray wrote:

Background – May 2011, OKF / AIM bibserver project

The Open Knowledge Foundation contracted with the American Institute of Mathematics under the direction of Jim Pitman in the dept. of Maths and Stats at UC Berkeley. The purpose of the project was to create an open source software repository named BibServer, and to develop a software tool that could be deployed by anyone requiring an easy way to put and share bibliographic records online. A repository was created at http://github.com/okfn/bibserver, and it performs the usual logging of commits and other activities expected of a modern DVCS. This work was completed in September 2011, and the repository has been available since the start of that project with a GNU Affero GPL v3 licence attached.

October 2011 – JISC Open Biblio 2 project

The JISC Open Biblio 2 project chose to build on the open source software tool named BibServer. As there was no support from AIM for maintaining the BibServer repository, the project took on maintenance of the repository and all further development work, with no change to previous licence conditions. We made this choice as we perceive open source licensing as a benefit rather than a threat; it fit very well with the requirements of JISC and with the desires of the developers involved in the project. At worst, an owner may change the licence attached to some software, but even in such a situation we could continue our work by forking from the last available open source version (presuming that licence conditions cannot be altered retrospectively).

The code continues to display the licence under which it is available, and remains publicly downloadable at http://github.com/okfn/bibserver. Should this hosting resource become publicly unavailable, an alternative public host would be sought. Development work and discussion has been managed publicly, via a combination of the project website at http://openbiblio.net/p/jiscopenbib2, the issue tracker at http://github.com/okfn/bibserver/issues, a project wiki at http://wiki.okfn.org/Projects/openbibliography, and via a mailing list at openbiblio-dev@lists.okfn.org.

February 2012 – JISC Open Biblio 2 offers bibsoup.net beta service

In February the JISC Open Biblio 2 project announced a beta service available online for free public use at http://bibsoup.net. The website runs an instance of BibServer, and highlights that the code is open source and available (linking to the repository) to anyone who wishes to use it.

Current status

We believe that we have made sensible decisions in choosing open source software for our project, and have made all efforts to promote the fact that the code is freely and publicly available. We have found the open source development paradigm to be highly beneficial – it has enabled us to publicly share all the work we have done on the project, increasing engagement with potential users and also with collaborators; we have also been able to take advantage of other open source software during the project, incorporating it into our work to enable faster development and improved outcomes. We continue to develop code for the benefit of people wishing to publicly put and share their bibliographies online, and all our outputs will continue to be publicly available beyond the end of the current project.

Recent BibServer technical development

- May 8, 2012 in BibServer, Data, JISC OpenBib, jiscopenbib2, News, OKFN Openbiblio, wp2, wp3, wp5, wp6, wp7, wp8

Along with the recent push of new front-end functionality to BibServer, as demonstrated on BibSoup, we have also applied some changes to the back-end. The new scheduled collection uploader is now runnable as a stand-alone tool, to which source URLs can be provided for retrieval, conversion, and upload. Retrieved sources are stored and available from a folder on disk, as are the conversions. Parsers can now be written in any language and plugged into the ingest functionality – for example, we now have a MARC parser that runs in perl and is usable via ingest.py and available on an instance of BibServer – thanks very much to Ed for that. In addition, parsers need no longer be ‘parsers’ – we have introduced the concept of scrapers as well. Check out our new Wikipedia parser / scraper, for example; it functions by taking in a search value rather than a URL, then using that to search Wikipedia for relevant references which it downloads, bundles, and converts to a BibJSON collection – this is a really great example that Etienne put together, and it demonstrates a great deal of potential for further parser / scraper development. See the examples on the BibServer repo for more insight – they are in the parserscrapers_plugins folder, and they are managed by bibserver/ingest.py. We know documentation is currently lacking – we have set up an online docs resource and are in the process of writing content to populate it – please check back soon. As usual, development work is scheduled via the tickets and milestones on our repo. Current efforts are on documentation and adding as many feature requests as possible before our hackathon on June 12th – 14th.
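To give a feel for how a parser written in any language can be plugged in, the sketch below runs a parser as a separate process, passing the source on standard input and reading BibJSON from standard output. That stdin / stdout contract is an assumption made for this example and should not be read as a description of how bibserver/ingest.py actually calls its plugins; `run_external_parser` and `TOY_PARSER` are hypothetical names, and the toy parser is a Python one-liner standing in for something like the perl MARC parser.

```python
import json
import subprocess
import sys


def run_external_parser(command, source_text):
    """Run a parser as a separate process: source on stdin, BibJSON on stdout."""
    result = subprocess.run(
        command,
        input=source_text,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the parser exits with an error
    )
    return json.loads(result.stdout)


# Stand-in "parser": treats every non-empty input line as the title of a record.
TOY_PARSER = (
    "import sys, json; "
    "print(json.dumps({'records': [{'title': l.strip()} "
    "for l in sys.stdin if l.strip()]}))"
)

if __name__ == "__main__":
    collection = run_external_parser(
        [sys.executable, "-c", TOY_PARSER],
        "First title\nSecond title\n",
    )
    print(collection)
```

Because the calling code only sees a command line and a stream of JSON, the parser on the other side could equally be perl, a shell script or anything else that can write JSON.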

BibJSON updates

- May 8, 2012 in BibServer, Data, JISC OpenBib, jiscopenbib2, lod-lam, News, OKFN Openbiblio, wp2, wp3, wp5, wp6, wp7, wp8

Following recent discussion on our mailing list, BibJSON has been updated to adopt JSON-LD for all your linked data needs. This enables us to keep the core of BibJSON pretty simple whilst also opening up potential for more complex usage where that is required. Due to this, we no longer use the “namespace” key in BibJSON. Other changes include usage of the “_” prefix on internal keys – so wherever our own database writes info into a record, we prefix the key, as in “_id”. Because of this, uploaded BibJSON records can have an “id” key that will work, as well as an “_id” uuid applied by the BibServer system. For more information, check out BibJSON.org and JSON-LD.
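As a small illustration of the “_” prefix convention, the sketch below shows how a system might add its own “_id” to an uploaded record while leaving the uploader's “id” alone. The function names are hypothetical, and the decision to discard any “_”-prefixed keys that arrive in an upload is an assumption for the example, not documented BibServer behaviour.

```python
import uuid


def prepare_for_storage(record):
    """Return a copy of the record with a system-assigned "_id" added.

    The uploader's own "id" key is left untouched. Incoming keys that start
    with "_" are dropped here (an assumption), so that only the system writes
    internal keys.
    """
    stored = {k: v for k, v in record.items() if not k.startswith("_")}
    stored["_id"] = uuid.uuid4().hex
    return stored


def strip_internal_keys(record):
    """Return a copy of the record without any "_"-prefixed internal keys."""
    return {k: v for k, v in record.items() if not k.startswith("_")}
```

With this arrangement a record uploaded with `"id": "smith2012"` keeps that value for the uploader's purposes, while the system can rely on “_id” being unique across all records.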

New BibServer features available on BibSoup

- May 8, 2012 in BibServer, Data, JISC OpenBib, jiscopenbib2, News, OKFN Openbiblio, wp2, wp3, wp5, wp6, wp7, wp8

A couple of months ago the development team had a Sprint and came up with some cool ideas of how to improve the user experience for BibServer and, subsequently, BibSoup. Have a play with the new features and see below for the details:

Main pages

  • Collections visualisation – a smart new graphic on the landing page showing information from new collections

  • Improved FAQ section with links to videos (coming soon: links to our new online docs)

Creating collections

  • New Wikipedia parser – create a collection based on the references retrievable from Wikipedia for your chosen search value

  • Improved collection upload – specify collection information, then view upload tickets to see progress and errors

  • ‘Retry’ and other options on particular collection creation attempts are also now available from the tickets page

Search results

  • Filter search results by a value range as well as specific values

  • Visualise any filter as a bubble chart and select the values you want to search with

  • Add / remove available filters and rename filter display names

  • Improved layout of record info in search results, including auto-display of the first image referenced in a record – e.g. if there is a link to an image in your record, it is displayed in the search result

Managing and sharing collections

  • Collection admin available – save your current display settings as the default for your collection, allow other users to have admin rights on your own collection

  • Share any specific searches by providing the URL displayed under the ‘share’ option

  • Embed – as the whole front-end of search and collection visualisation is handled by facetview it is possible to embed your collection search in any web page you control; the share / embed option on collection pages provides the code you need to insert to enable this

  • Download as BibJSON – a nice new obvious button on each collection provides a link to download your collection as BibJSON

Viewing records

  • Improved display of individual records, including search options to discover relevant content online

  • EXPERIMENTAL record editing – this has been enabled although still in progress – you can edit the content of a record using a visual display of the keys and values in the record, although functionality for adding new keys does not yet work. However, you can also edit the JSON directly via the options, and try saving that. Be aware – this could damage your records, and of course changes the details from whatever they were in the source content.

Still in development

These are not yet available on BibSoup, but watch this space:

  • Creating new collections on-site – search and find particular records for inclusion in new collections or addition to pre-existing collections. This is not currently possible but we are working on making this an easy process
  • Merging collections
  • Better user creation and management, plus gravatars
  • Additional functionality on record pages – linking out directly to related sources such as PubMed, Total Impact, Service Core etc

We hope you like these changes, and find them useful – do let us know what you think and keep an eye out for the upcoming improvements.

BibServer new functionality

- March 19, 2012 in announcement, JISC OpenBib, jiscopenbib2, progressPosts, software, wp6, wp7, wp8

During the sprint last week we made a lot of progress with the new functionality for version 0.5.0 – however, Etienne and I got so excited by some new ideas that we did not finish on time; apologies for the delay. We will be making the new version available over the course of this week, and will have it up and running on http://bibsoup.net soon. Below is an overview of the new functionality you can expect to see over the course of the next week; we will write some blog posts about the various new capabilities, and this will tie in with the focus of the next sprint – doing docs, tests and issues (no new functionality).
  • editing of records and collections
  • merging collections from multiple sources
  • adding notes to records
  • much improved search UI
  • embed images in search results
  • better visualisation of collections
  • embeddable UI into other web pages via javascript
  • asynchronous parsing – you don’t have to hang on the page waiting for it to complete
  • feedback tickets from asynchronous parses
  • sharing collection admin rights with other users
  • new parser for NLM XML
  • new parser concept – a search term retrieves pages from Wikipedia, and citations are pulled from those pages
  • capability to accept and run parsers written in different programming languages
  • browse site users

JSON-LD / BibJSON

- February 21, 2012 in BibServer, communityBenefits, jisc, JISC OpenBib, jiscopenbib2, OKFN Openbiblio, openbiblio, progress, projectPlan, wp2, wp9

There have been requests on our mailing list recently to consider the various options for supporting validation of BibJSON and for supporting namespacing. These two options require some further consideration.

Validation

Efforts so far around BibJSON have focussed on building a useful JSON representation of bibliographic metadata, with some typical key/value pairs that are common in or extended from bibtex. This started off simply, but we have seen increasing complexity to accommodate further functionality requests. There was some work on a JSON schema to validate against, but given the aim of being as flexible as possible, and with very few required keys, validating a BibJSON document would have very little effect. Validating a document as properly formatted JSON is, of course, a good idea; but there are plenty of ways to do this already – just try to parse it with any number of libraries for your programming language of choice. But to reach the stage of actually supporting validation against a pre-defined schema, we must pre-define a schema – and that means becoming inflexible (or doing so little validation that it is essentially pointless). An alternative to validation against a schema would be adoption of namespaces.
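
As a minimal sketch of the "just try to parse it" approach, the following uses Python's standard json module to check that a document is well-formed JSON. The function name and the sample records are made up for illustration; nothing here is part of bibserver.

```python
import json


def is_well_formed_json(text):
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    return True


# Hypothetical records, for illustration only.
good = '{"title": "An example title", "author": [{"name": "A. N. Author"}]}'
bad = '{"title": "An example title", '  # truncated, so it cannot be parsed

print(is_well_formed_json(good))
print(is_well_formed_json(bad))
```

This only tells you the document is JSON; it says nothing about which keys are present or what they mean.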

Namespaces

We do already have a namespace concept in BibJSON – it is just a key in the metadata, under which can be listed namespaces and a suitable prefix for them. However, this model is not widely known (because we made it up). To overcome this, we should adopt the JSON-LD method of using @context parameters. This way, it would be possible to specify the namespace in which your record keys are defined, and to share namespace information with other people / machines.
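
To illustrate roughly what an @context buys you, here is a simplified sketch in Python. It assumes the context is a plain mapping from prefix to namespace URI, and it only expands prefixed keys at the top level of a record; it is not the full JSON-LD expansion algorithm, and the function name and sample record are hypothetical. Keys without a declared prefix are left alone, on the assumption that they belong to the default BibJSON keys.

```python
def expand_keys(record):
    """Replace 'prefix:name' keys with full URIs using the record's @context.

    Simplified: assumes @context is a dict of prefix -> namespace URI and
    only looks at top-level keys.
    """
    context = record.get("@context", {})
    expanded = {}
    for key, value in record.items():
        if key == "@context":
            continue
        prefix, sep, suffix = key.partition(":")
        if sep and prefix in context:
            expanded[context[prefix] + suffix] = value
        else:
            # No declared prefix: keep the key as it is.
            expanded[key] = value
    return expanded


# Hypothetical record mixing a default key with a Dublin Core term.
record = {
    "@context": {"dc": "http://purl.org/dc/terms/"},
    "title": "An example title",
    "dc:publisher": "Example Press",
}

print(expand_keys(record))
```

Anyone receiving the record can then see that the publisher key means the Dublin Core term, without needing to know any convention we invented ourselves.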

What is the point?

Using namespaces and having a schema only become sensible when there is a concerted effort to share data with others. For internal use, they could be valuable for consistency, but the code we write internally adheres by definition to our own level of consistency anyway. Therefore, it is not a function of BibJSON to perform validation – BibJSON is just JSON. Rather, it is the function of a community to make agreements and to conform to those agreements as required. Where such a function must be supported, it should be done via mechanisms already available and maintained for that purpose – there is no point attempting to maintain our own; it is not our key strength or goal.

Recommendation

Change the BibJSON use of namespaces to conform to the method specified in JSON-LD; wherever consistency is required, agreement to share data via JSON and within a particular @context should be reached. The fundamental keys in BibJSON – the default context – should remain as they are, and should not require contextualisation. If contextualisation of the fundamental keys of BibJSON is required, then those keys should be contextualised into a schema by whoever has such a requirement.

Ramifications

  • drop the “namespace” key in BibJSON
  • continue using BibJSON as normal, but:
  • reference JSON-LD for use of @context and other more complex LD functions as required
  • wherever validation is required, perform it based on the use of namespaced keys (beyond the scope of BibJSON)
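
One possible reading of validation based on namespaced keys, sketched in Python: a community that has agreed on a context could check that every prefixed key in a record uses a prefix that the record actually declares. This is an assumption about how such a check might look, not something BibJSON or bibserver provides; the function name and the sample record are hypothetical, and the @context is again assumed to be a plain prefix-to-URI mapping.

```python
def undeclared_prefixes(record):
    """Return a sorted list of key prefixes not declared in the record's @context."""
    context = record.get("@context", {})
    missing = set()
    for key in record:
        if key.startswith("@"):
            continue  # skip JSON-LD keywords such as @context
        prefix, sep, _ = key.partition(":")
        if sep and prefix not in context:
            missing.add(prefix)
    return sorted(missing)


# Hypothetical record: "dc" is declared, "foaf" is not.
record = {
    "@context": {"dc": "http://purl.org/dc/terms/"},
    "title": "An example title",
    "dc:publisher": "Example Press",
    "foaf:name": "A. N. Author",
}

print(undeclared_prefixes(record))
```

Unprefixed keys are never reported, which matches the idea that the default BibJSON keys need no contextualisation.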

References