
Featured tool: Pundit – open annotation

- February 5, 2014 in CultureLabs, Featured, linked-open-data, tools

Pundit, one of the tools featured in our OpenGLAM Culture Labs, is a useful tool for creating annotations in Linked Open Data. With Pundit, researchers can add annotations to a digital text and link them to related texts or other resources on the net such as DBPedia, Freebase and Geonames. This is done via a […]

DIY Open Book Scanning

- February 15, 2013 in CultureLabs, Featured

For the last three weeks, I’ve been leading the beginnings of a book-scanning group here in Cambridge. It all started with a cycle ride through the snow to a glazier’s on the outskirts of the city, where I picked up a few sheets of glass before heading back to our meeting at the English Faculty Library, carefully avoiding every piece of icy ground as I went, as you really do not want to fall off a bike when you have glass strapped to your back.
Our method, at least so far, has been very simple. It was inspired by the great DIY Book Scanner website, and needs only a desk lamp, a digital camera, a tripod and some book supports. As we were meeting in a library, the staff kindly lent us some foam book supports and enough extension leads to plug in the lamp. We then propped up a rare copy of a burlesque Hamlet from 1801 in our makeshift cradle and began, laying the glass sheet on and photographing each of the right-hand pages before doing the same with the left.
We then put our collection of 80 or so JPEGs, suitably renamed and ordered, into ScanTailor, which polished our efforts into something fairly respectable. All that was left was to OCR the images, stitch them all into a single PDF and upload it to the Internet Archive.
Here, though, we began to hit problems, and any suggestions for a solution would be very welcome indeed:
  • Our images were far from perfect, often distorted due to the slight curvature of the page or the misalignment of the camera on its tripod.
    • Current solution: short of building a rig, we are trying to take photos from directly above the book, which at least makes it easier to keep the camera parallel to the page.
  • Our file sizes were enormous, and this made conversion really time-consuming.
    • Current solution: use the university’s copy of Adobe Acrobat to compress the images into B&W PDFs, although it pains me that there seems to be no open-source alternative. Does anyone know of one?
  • Big file sizes and slightly skewed images do not a good OCR make: we couldn’t get Tesseract to run on Windows, so we resorted to a web-based version, with all its limitations.
    • Current solution: again, Adobe to the rescue; but are there any open-source projects out there for this?
And with that list of problems and solutions, you now have a fairly good idea of where we are. If you’re in the Cambridge area, do get in touch, as we’re always eager for new volunteers. If you want to start your own open DIY book scanning project, get in touch on the OpenGLAM mailing list. Future plans include: surveying the English Faculty Library for other books that are out of copyright and not yet digitised (not as numerous as you might think), proposing a collaboration with the Engineering Department for help constructing a standalone book scanner, and investigating what there is to be scanned in the College libraries of the city.
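Because the right-hand pages were photographed in one pass and the left-hand pages in another, the files have to be interleaved before they go into ScanTailor. The following is only a rough sketch of that renaming step, not the script we used: the filenames are hypothetical, and it assumes both passes ran from the front of the book to the back and that the book opens on a right-hand page.

```python
from itertools import zip_longest


def reading_order(right_pages, left_pages):
    """Interleave two shooting passes into reading order.

    Assumes each list is already sorted in the order the photos were
    taken (front of the book to back) and that the first page of the
    book is a right-hand page.
    """
    ordered = []
    for right, left in zip_longest(right_pages, left_pages):
        # Each right-hand page is followed by the left-hand page
        # printed on its reverse side.
        if right is not None:
            ordered.append(right)
        if left is not None:
            ordered.append(left)
    return ordered


def rename_plan(right_pages, left_pages, prefix="page"):
    """Return (old_name, new_name) pairs with zero-padded sequence numbers."""
    ordered = reading_order(right_pages, left_pages)
    return [
        (old, f"{prefix}_{index:04d}.jpg")
        for index, old in enumerate(ordered, start=1)
    ]


if __name__ == "__main__":
    # Hypothetical camera filenames from the two passes.
    rights = ["IMG_0101.jpg", "IMG_0102.jpg", "IMG_0103.jpg"]
    lefts = ["IMG_0201.jpg", "IMG_0202.jpg"]
    for old, new in rename_plan(rights, lefts):
        print(old, "->", new)
```

The sketch only builds the list of renames. Actually moving the files (for example with `os.rename`) is left out so the plan can be checked by eye first.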

Bardomatic: Using Open Shakespeare to Create Games

- February 7, 2013 in CultureLabs, Featured

At the Open Knowledge Foundation we believe there is a great deal of unrealised potential for creating educational resources in the openly licensed, digitised cultural heritage material available on the web. It has been great to see the Open Humanities Working Group addressing this challenge. Over the last three weeks they have been building a fantastic online app called Bardomatic.
Based on the Crowdcrafting platform, Bardomatic tests your knowledge of Shakespeare’s plays using openly licensed content derived from Open Shakespeare. You are given a short quote from one of the Bard’s famous works and asked to identify the play it comes from. You are then told whether your answer is correct.
It was hacked together by volunteers from our Open Humanities Working Group on their weekly Google+ Hangout, and it shows the kind of creativity that can be unleashed once cultural content is released under an open license (not to mention the fun that can be had in the making!).
The Open Humanities Working Group will now be developing a scoreboard for the game and will continue to add sections of text for the app to use. If you’d like to join the hangout, sign up to the Open Humanities mailing list, where the Hangout links are circulated every Tuesday at 5pm GMT.
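To give a feel for the game's basic loop (show a quote, offer some plays, check the guess), here is a minimal sketch in Python. It is not Bardomatic's actual code, which runs as a browser app on Crowdcrafting; the four-quote sample, the function names and the number of choices are all assumptions made for illustration.

```python
import random

# A tiny hand-picked sample for illustration; the real game draws its
# quotes from the Open Shakespeare texts.
QUOTES = [
    ("To be, or not to be, that is the question", "Hamlet"),
    ("All the world's a stage", "As You Like It"),
    ("If music be the food of love, play on", "Twelfth Night"),
    ("Now is the winter of our discontent", "Richard III"),
]


def make_question(rng, n_choices=3):
    """Pick a quote and a shuffled list of plays that includes the right one."""
    quote, play = rng.choice(QUOTES)
    # Sort the wrong answers so results are reproducible for a seeded rng.
    others = sorted({p for _, p in QUOTES if p != play})
    choices = rng.sample(others, n_choices - 1) + [play]
    rng.shuffle(choices)
    return {"quote": quote, "answer": play, "choices": choices}


def check_answer(question, guess):
    """Compare the guess with the correct play, ignoring case and spacing."""
    return guess.strip().lower() == question["answer"].lower()


if __name__ == "__main__":
    question = make_question(random.Random())
    print(question["quote"])
    for number, play in enumerate(question["choices"], start=1):
        print(f"  {number}. {play}")
```

A scoreboard of the kind the group plans could be layered on top by counting how often `check_answer` returns `True` for each player.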

Using Crowdcrafting for transcribing cultural works

- November 28, 2012 in CultureLabs, Featured, tools

Crowdcrafting is a free, open-source crowd-sourcing and micro-tasking platform. It is a joint effort between the Open Knowledge Foundation and the Citizen Cyberscience Centre. It enables people to create and run projects that rely on online volunteers to perform tasks requiring human cognition, such as image classification, transcription, geocoding and more. Crowdcrafting is there to help researchers, civic hackers and developers create projects to which anyone around the world with some time, interest and an internet connection can contribute.
Crowdcrafting is different to existing efforts:
  • It’s 100% open-source – everybody can use it or fork the code for their own purposes.
  • Unlike, say, “mechanical turk” style projects, Crowdcrafting is not designed to handle payment or money — it is designed to support volunteer-driven projects.
  • It’s designed as a platform and framework for developing and deploying crowd-sourcing and microtasking apps rather than being a crowd-sourcing application itself. Individual crowd-sourcing apps are written as simple snippets of JavaScript and HTML, which are then deployed on a PyBossa instance. This way you can easily develop custom apps while using the Crowdcrafting platform to store your data, manage users, and handle workflow.
The team has been very busy in the last couple of weeks and has created a few basic templates that help new users and developers to create their own applications for the platform. 
One of them is PDF Transcribe: a PDF transcription template that can be used for transcribing full PDF documents, including scanned images, one page at a time. This application uses the Mozilla PDF.js library to load an external PDF file and render it directly in the web browser without using any third-party plugin.
With PDF.js it is possible to render almost any PDF that is hosted on an HTTP server and then use a customized form to capture the desired data from it.
In this simple demo application, a PDF file is loaded on one side of the page, and on the other side is a form in which the volunteer transcribes the PDF page by typing the text. While this example is really simple, adapting the template to extract specific bits of information from the PDF should be easy. The idea is that you can, for example, extract specific items from the documents, such as captions, tabular data, authorship, institutions, etc.
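The "one page at a time" idea can be pictured as one task per PDF page, each carrying the PDF's URL, the page number and the fields the volunteer is asked to fill in. The sketch below is only an illustration of that shape: the `page_tasks` helper, the `info` layout and the field names are assumptions, not the template's real schema or the PyBossa API, and the URL is a placeholder.

```python
import json


def page_tasks(pdf_url, page_count, fields=("transcription",)):
    """Build one task description per PDF page.

    The "info" layout used here is hypothetical; a real template
    defines its own structure.
    """
    if page_count < 1:
        raise ValueError("page_count must be at least 1")
    return [
        {"info": {"pdf_url": pdf_url, "page": page, "fields": list(fields)}}
        for page in range(1, page_count + 1)
    ]


if __name__ == "__main__":
    # Placeholder URL; ask volunteers for two specific items per page.
    tasks = page_tasks(
        "https://example.org/scan.pdf", 3, fields=("caption", "author")
    )
    print(json.dumps(tasks, indent=2))
```

Changing the `fields` argument is how this sketch would model adapting the form to captions, tabular data, authorship and so on.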

The application can be used for transcribing a variety of digital objects such as manuscripts, archives, postcards, shopping lists and so on. This opens up great possibilities for GLAM institutions, large and small, to work with the community to get more of their digitised material transcribed. The creators of the tool are very willing to help you develop your application, so if you have questions, do get in touch!

You can read more about the architecture in the PyBossa Documentation and follow the step-by-step tutorial to create your own apps. For more open source tools for working with cultural data, see our CultureLabs page.