You are browsing the archive for James Harriman-Smith.

Meeting: 2014-09-01

- September 6, 2014 in minutes


  • James Harriman-Smith
  • John Levin
  • Seth Woodward
  • Eric Hellman


  • Project GITenberg


Reading Material


  • Seth: Give overview of where we are (see reading material above)
  • James: PhD student working on ‘reception’ studies, interested in GITenberg as text transmission
  • John: PhD student, likes very large catalogs of text, lit/digital-humanities, wants large groups of text (BALZAC)
  • Eric Hellman: behind – license conversion -> collect money -> release openly: when books are free, the book distribution system breaks; even libraries don’t want free books, don’t know how to deal with them or handle them; interested in out-of-copyright books; have been making the store/resource/service for free books; became familiar with PG’s lack; created GitenHub, found GITenberg, joined forces. Believes that the possibilities of a version-controlled archive should be the way ebooks go.


  • James: GITenberg will both allow generation of books (once tools are in place), and allow version control, altering sources of publication. Two sides, but what about peculiarities of text? You need vc to track those things, yet what happens when there is a bad ‘correction’?
    • Seth: Do we want to follow Gutenberg’s policies? They have concept of ‘editionless’ edition, roughly based on a specific printing, but not a precise reproduction… GITenberg might be better seen as creating our own editions, one that could be printed/distributed as a ‘reasonable’ copy
    • Seth: NB: In order to produce ebooks (and even print) it has to go through an intermediate html / CSS step
    • Eric: not just version control, but the forking and merging process too: we can keep multiple branches of the same work, efficiently pull in changes from one branch to another; no reason we couldn’t maintain a branch which is faithful to a given copy from 1915 along with a book that doesn’t have the mistake of ‘with with’
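
Eric's point about a faithful branch living alongside a corrected one can be illustrated with a small sketch. It is not GITenberg code: the two sample lines and the names faithful, corrected and edition_diff are invented here for illustration, and Python's standard difflib stands in for what git would record between branches.

```python
import difflib

# Hypothetical sample text; only the 'with with' slip comes from the minutes.
faithful = [
    "He walked with with his friend.",
    "They reached the gate at dusk.",
]
corrected = [
    "He walked with his friend.",
    "They reached the gate at dusk.",
]


def edition_diff(a, b):
    """Return unified-diff lines showing how edition b departs from edition a."""
    return list(
        difflib.unified_diff(
            a, b, fromfile="faithful-1915", tofile="corrected", lineterm=""
        )
    )


# Print the recorded difference between the two editions.
for line in edition_diff(faithful, corrected):
    print(line)
```

Because the difference is recorded rather than overwritten, a bad ‘correction’ of the kind James worries about remains visible and can be undone.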

  • James: but what about authority of a fork?
    • Seth: Authority on GitHub is distinguishable, number of followers

  • John: how much of this is transferrable? GITenberg is great, how does it go further? e.g. with the big bunch of 18thc texts released from ECCO database by Gale-Cengage
    • Seth: yes: very extendable, as much as possible

  • Seth: a big issue is the applicability to other texts: yes. has become a clearing-house for scanned books, but it has no proof-reading process, and would be a great place to extend GITenberg in the future.

  • Seth: another big question: which formats?

  1. Markdown: NEG: not standardized, no pagination, no footnotes, and can only be improved by making our own bespoke version, which would have to be maintained. POS: Penflip, etc. are using it, so might have scope…
  2. Restructured text: has a spec, is extensible, Distributed Proofreaders use it, but has only one complete implementation in Python, URLs are clunky, it seems to be being sidelined…
  3. TEI: lack of software, doesn’t seem to be a lot of to and fro, big ask
  • Responses:
    • Eric: TEI is probably overkill for everything; you need just enough markup to produce deliverables, like footnotes, ToC
    • James: further useful questions here: how much support does each format have? Can we test formats?

  • Seth: Another big question: annotations
    • Seth has done work on annotator, cares for textual annotation a great deal. Annotation is the obvious win for a digital book, since such works have infinite margins and can take a variety of forms
    • Demo for some of the more exciting things that can be done: Such projects imply a potential later connection with librivox.
    • This is more of a question for the developers, although the utility of line numbers is already clear

Other Observations

  • Seth: ePub v.3 is very cool, but annotation on ePub readers looks to be very tricky…
  • John: re: Internet Archive OCR: possibility of machine correction of OCR’d text for IA books, but they seem nonplussed. This is the kind of thing that GITenberg could help with


  • Seth: Publish actionable tasks for more developers
  • Seth: Shift GITenberg to GitLab
  • Everyone: continue to publicise this project

Meeting: 2014-05-08

- May 9, 2014 in minutes


  • James Harriman-Smith
  • John Levin
  • Lieke Ploeger
  • Iain Emsley
  • Tod Robbins (text only)


Big things

  • Panton Principles for Humanities
  • Progress on Textus
  • OpenGLAM-dev
  • Open Humanities Award


  • Tasks from last week
  • Next meeting
  • Tasks

Panton Principles for Humanities

  • See ‘#Oxford contact’ below

Progress on Textus

  • Annotations now being pulled through to front end! Texts are being marked up!
  • GitHub up to date:
  • Next task: track down the source of current layout issues: suspect a JavaScript conflict causing current problems
  • This task could be offered to volunteers on OK Labs

Oxford contacts/talk

  • Iain met with OERC director (David de Roure), discussed First Folio release as CC-BY 3, and brought up idea of principles for definition of open data in humanities. The principles will now hopefully be discussed in meetings around Oxford (OERC, Bodleian).
  • OKFest mentioned to de Roure, but slightly tangential to Iain’s role in OERC
  • Oxford already has quite a community for Open Knowledge, stuff coming forward on the science side soon
  • Currently same problems in Oxford as elsewhere: not enough communication between different levels of hierarchy. Note, however, that Oxford is looking for a digital humanities champion: perhaps an internal candidate only, but this is a process we should watch

OpenGLAM-dev? / Identity of this call

  • This is connected to the perennial musing about whether there are any other humanities projects lurking
  • If we move to a more dev-orientated list we must be careful that focus on projects doesn’t alienate parts of the audience.
  • A solution to this could be to cycle around between projects, and let the call work as an arena for both showcasing and feedback for specific problems
  • There was some concern that such reorientation of the call would distinguish us from GLAM but then overlap with Labs. This could be overcome by making the session accessible to a wide(r) audience.
  • Lieke told us that this wouldn’t overlap with OpenGLAM, which tends to be about policy, events and updates from specific countries
  • So perhaps we should try and organise a big quarterly call where people come and show their ideas and hack ‘together’
  • In the short term, next month’s call can test out this format with ‘open gravestones’ discussion

Open Humanities Awards

  • Info:
  • Wondering if anyone on the list is going to apply: no-one as yet
  • Award has been sent to OKF channels, Europeana, OpenGLAM, OpenHum, DigHum departments and lists
  • Any further help with promotion is welcome!

Open Gravestones

  • Originally discussed by James and others
  • Tod would love feedback and direction (very conceptual at this stage)
    • Open Gravestone Research doc: here
  • A first step might be extracting cemetery info from OSM and then encoding it with microdata, and then work out a data model for burial data
    • macro = mapping/plotting the cemeteries on the globe; micro = pinpointing specific remains (graves, urns, memorials, etc.)
  • We should organise a one-off OpenGLAM / Open Humanities chat on this project, with a bit of homework to do beforehand
    • Homework could include: look for open source projects which could contribute to this: TimeMapper for example, also PyBossa / Crowdcrafting
    • Some overlap already noted with Open Plaques, and Find A Grave
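
As a starting point for discussion, the macro/micro split noted above could be sketched as a tiny data model. Everything here is an assumption made for illustration: the class names Cemetery and Burial, their fields, and the placeholder values are not taken from the research doc or from OSM.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Burial:
    """Micro level: one specific set of remains (hypothetical fields)."""
    kind: str  # e.g. "grave", "urn", "memorial"
    inscription: str = ""


@dataclass
class Cemetery:
    """Macro level: a cemetery that can be plotted on the globe."""
    name: str
    lat: float
    lon: float
    burials: List[Burial] = field(default_factory=list)

    def count_by_kind(self) -> Dict[str, int]:
        """Tally the burials in this cemetery by their kind."""
        counts: Dict[str, int] = {}
        for burial in self.burials:
            counts[burial.kind] = counts.get(burial.kind, 0) + 1
        return counts


# Placeholder data only; not a real cemetery.
example = Cemetery(name="Example Cemetery", lat=0.0, lon=0.0)
example.burials.append(Burial("grave", "placeholder inscription"))
example.burials.append(Burial("urn"))
example.burials.append(Burial("grave"))
print(example.count_by_kind())
```

Keeping the two levels as separate records would let the mapping work start from cemetery locations alone, with burial detail added later.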

Update on tasks from last week

  • IE: To write a newsletter piece on Textus for OKF blog – todo
  • IE: Write documentation – todo
  • IE: Create an OKFest proposal – general piece? – done
  • IE: Oxford contacts? – done
  • IE: Organise next meeting – done
  • JHS: Email session proposals to the list and ask for champions – Iain for Panton Principles one (ref. gmail account) – done
  • JHS: Email request to list for datasets / tools for the Open Humanities Event – done
  • JHS: Post minutes to OH blog – done
  • ALL: Send availabilities for Open Humanities event to Lieke – Iain can’t do Saturday 27th September

Next meeting

  • Focus on OpenGravestones
  • Currently set for Tuesday 10th June, 6pm UK time, 11am Utah time for Tod


  • IE: push current Textus update out to Labs to see if anyone can help with conflicts
  • IE / Jenny Molloy: post data from Oxford Nobel Prize hack
  • JL: Send Open Humanities Award to Humanist list – done!
  • TR: Send out a call about Open Gravestones on OpenGLAM, OpenHUM, OKLabs, perhaps with starter questions
  • JHS: Put these notes online
  • IE: Publicise next call with an overview of how it will be structured. Push to Labs, Discuss as well.

Meeting: 2014-04-17

- April 20, 2014 in minutes


  • James Harriman-Smith
  • Iain Emsley
  • Rufus Pollock


  • Tasks from last week
  • Update on Textus
  • Open Knowledge brand changes
  • OKCon progress
  • Open Knowledge Wiki
  • Reorganising Open Humanities WG

Tasks from last week

  • IE: To write a newsletter piece on Textus for OKF blog – to do
  • IE: Write documentation
  • IE: Create an OKFest proposal – general piece? – done
  • IE: Oxford contacts? – meeting arranged
  • IE: Organise next meeting – done
  • JHS: Email session proposals to the list and ask for champions – Iain for Panton Principles one (ref. gmail account) – done
  • JHS: Email request to list for datasets / tools for the Open Humanities Event – done
  • JHS: Post minutes to OH blog – done
  • ALL: Send availabilities for Open Humanities event to Lieke – Iain can’t do Saturday 27th September
  • ALL: Hunt out some datasets: Open Shakespeare, BL materials… – ?

Update on Textus

Open Knowledge brand changes

OKCon progress

  • Feedback due next month
  • Iain will discuss open humanities principles with Dave de Roure (OERC director) to try and get some momentum ahead of the conference

Open Knowledge Wiki

Refocussing the Working Group

  • Should OpenHumanities appeal to Open Glam? Yes – would allow us to recross the streams, presenting ourselves as ‘OpenGLAM dev’
  • Send out mails to OK Labs and OK GLAM for next month
  • Split the call into housekeeping and an interesting discussion of a project to attract more interest
  • More people we email, more people we get


  • Event in Oxford/online next week to hunt for public domain publications of Nobel prizewinners
  • Next meeting due for 8th May – will be a wiki-update sprint


  • IE: Circulate info about Oxford event
  • All: Advertise future meetings on Labs list
  • IE&JHS: Create OKF wiki profiles
  • All: Set up a wiki sprint for next meeting
  • JHS: email Iain’s availabilities to Lieke
  • IE: Organise next meeting for 8th May 2014
  • JHS: brand feedback to Katelyn
  • JHS: Publish these minutes

Meeting: 2014-03-20

- March 22, 2014 in Uncategorized


  • James Harriman-Smith
  • Lieke Ploeger
  • Iain Emsley


  • Update on last meeting’s actions
  • OpenGLAM
  • Textus
  • OKFestival

Tasks from last week

  • JHS: Updated the calendar on the blog
  • IE: Pushed all work on Textus so far
  • JHS/JL: Links now displaying, with OKFN, PDR & OpenHum linked on OpenLit
  • ALL: Think about OpenGLAM, humanities collaboration: see below
  • JHS: Librarian will look into OpenGLAM stuff when time allows.
  • Lieke: Emailed Open-hum about next GLAM meeting
  • JHS: Put last meeting’s minutes online

OpenGLAM collaboration

  • Open Humanities Hack being planned
  • Planning document:
  • Now in late September – early October: James can be there!
  • There will be talks from last year’s OH award winners


  • Plugin now does what Rufus and Iain thought it should do: now needs to start integrating their separate labours
  • Almost ready for content creation, just needs testing from tech side and documentation


  • James cannot make this as he is moving house in July
  • Ideas:
    • Something about getting people involved? Iain observes that Oxford has tons of DH projects, but little communication between them: this idea might be better as a session on how to facilitate their openness, attractiveness, and how to make tools produced by digital humanists of interest to practitioners
    • We shouldn’t do a session on book-scanning, which has been done to death. However, its popularity is important because it was such an easy activity to get involved in: perhaps we need to look for more activities similar to that?
    • There is also the possibility of joining sessions / combining sessions
    • OpenGLAM, for example, is thinking of sending something brief and general with room ‘open data and cultural heritage’, then: ‘legal issues’, ‘creating GIFs from PD images’, ‘benchmark survey’, + satellite event re: local GLAM groups
    • A good conceptual session for the group would be based on Panton Principles
    • A good practical session could be a ‘single-author’ project: assessment of how hard it is to find all the data, etc.


  • Another project: make a list of Open Access journals for Digital Humanities / Literature.
  • A big thought: what is reproducible research in the Humanities / Literature? – link with the Panton Principles session? Does it exist or is it useful?

Next meeting

  • Doodle poll will be used to fit a time in week beginning 14th April


  • IE: To write a newsletter piece on Textus for OKF blog
  • IE: Write documentation for Textus
  • IE: Create an OKFest proposal – general piece?
  • IE: Oxford contacts?
  • IE: Organise next meeting
  • JHS: Email session proposals to the list and ask for champions – Iain for Panton Principles one (ref. gmail account)
  • JHS: Email request to list for datasets / tools for the Open Humanities Event
  • JHS: Repost OH awards announcement when it comes out
  • JHS: Post minutes to OH blog
  • ALL: Send availabilities for Open Humanities event to Lieke
  • ALL: Hunt out some datasets: Open Shakespeare, BL materials…

Meeting: 2014-02-13

- February 14, 2014 in community, minutes


  • James Harriman-Smith (chair)
  • John Levin
  • Iain Emsley
  • Lieke Ploeger


  • Update on post-sprint progress
  • How to help with continuing projects of group
  • Overlap with OpenGLAM
  • Next meeting


  • JHS: remove calendar from website / update it to reflect our activities
  • IE: Push new work on Open Literature to Git once it is working: over the weekend
  • JHS/JL: Try and work out what has happened to the ‘links’ section on WP, and we’ll update it
  • ALL: Think about OpenGLAM and humanities collaboration
  • JHS: Tell English Faculty in Cambridge about OpenGLAM documentation (see links)
  • Lieke: email Open-hum about next GLAM meeting
  • JHS: Put these minutes online
  • IE: Chair next Skype call



  • Iain is trying to get WP plugin for Textus working: he can get stuff out but not into the store. Underlying functions appear to work, just need to find this last little thing. Should be solved soon.
  • Once this plugin function is working, annotation store basics will be written. Next step will be the niceties: put, delete (as well as the get and post we already have), but for this we’ll need Rufus, who should be able to test what he’s written.
  • John has done some website maintenance on both Humanities and Open Literature: fixing links and sorting spam
  • Lieke filled us in on the activities of OpenGLAM, and we had some preliminary ideas about potential collaborations. The DM2E project seems a good place for overlap with work of Open Humanities since it involves both institutions and digital humanities researchers, the latter also more representative of our own group members. DM2E is an EU initiative building tools to get researchers to work with manuscripts openly, running workshops in Germany (Berlin) and elsewhere (Vienna); it is due to finish in February 2015 so there is both time and space for collaboration. DM2E events tend to be hands-on, with user-based feedback, and perhaps Open Humanities’ members could be interested in participating in such workshops.
  • It was noted that the OpenGLAM network could really help OpenHum publicise progress beyond the level of individual researchers.
  • A quick survey set the next Open Humanities Skype call for 20th March 2014. Iain agreed to chair.
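
For anyone picking up the annotation store work mentioned above, the four operations named there (get, post, put, delete) can be sketched with an in-memory stand-in. This is not the Textus plugin or its real store: the class name AnnotationStore, its method signatures and the sample annotation fields are all assumptions made for illustration.

```python
import itertools


class AnnotationStore:
    """In-memory stand-in for an annotation store (hypothetical API)."""

    def __init__(self):
        self._items = {}
        self._ids = itertools.count(1)

    def post(self, annotation):
        """Create a new annotation and return its id."""
        new_id = next(self._ids)
        self._items[new_id] = dict(annotation, id=new_id)
        return new_id

    def get(self, annotation_id):
        """Return the stored annotation, or None if it does not exist."""
        return self._items.get(annotation_id)

    def put(self, annotation_id, changes):
        """Update an existing annotation; return False if it is missing."""
        if annotation_id not in self._items:
            return False
        self._items[annotation_id].update(changes)
        self._items[annotation_id]["id"] = annotation_id  # the id stays fixed
        return True

    def delete(self, annotation_id):
        """Remove an annotation; return True if something was removed."""
        return self._items.pop(annotation_id, None) is not None


store = AnnotationStore()
first = store.post({"quote": "To be", "text": "a note"})
store.put(first, {"text": "a revised note"})
print(store.get(first))
```

A real store would sit behind HTTP requests and persist its data; the sketch only shows the behaviour each of the four operations is expected to have.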

This Saturday in London: Open Literature Sprint

- January 23, 2014 in community, Events

Calling all those with an interest in bringing the humanities online:

The Open Knowledge Foundation’s Open Literature project is one dedicated sprint away from being ready to go online. This open platform will offer anyone the opportunity to upload, analyse, present and annotate public-domain texts; it builds on the functionality of the OKF’s recent Open Shakespeare Project to become a tool of use to a great range of scholars in the humanities.

On the day itself we will divide into three teams according to areas of expertise: Coders will be concentrating on the nuts and bolts of the project, dealing with the list of GitHub issues, creating a text API and ensuring the annotation works on chosen texts; Textfinders will find and write summaries of public domain texts for use on the platform itself; Editors will proof-read all material posted, from text presentations to technical documentation. By the end of the day we should have a working portal for analysing and sharing works of literature in the public domain!

Key info:

  • When: 25th January 2014, 11am – 6pm (if 11am is too early for you it’s OK to join later!)

  • Who: Anyone interested in literature, philosophy and taking these online

Find out more about the day’s activities here.

Image: The Droeshout portrait of William Shakespeare

DIY Open Book Scanning

- February 15, 2013 in CultureLabs, Featured

For the last three weeks, I’ve been leading the beginnings of a book-scanning group here in Cambridge. It all started with a cycle ride through the snow to a glazier’s on the outskirts of the city, where I picked up a few sheets of glass before heading back to our meeting at the English Faculty Library, carefully avoiding every piece of icy ground as I went, as you really do not want to fall off a bike when you have glass strapped to your back.

Our method, at least so far, has been very simple. It was inspired by the great DIY Book Scanner website, and needs only a desk lamp, a digital camera, a tripod and some book supports. As we were meeting in a library, the staff kindly lent us some foam book supports and enough extension leads to plug in the lamp. We then propped up a rare copy of a burlesque Hamlet from 1801 in our makeshift cradle, and began, laying the glass sheet on and then photographing each of the right-hand pages, before doing the same with the left.

We then put our collection of 80 or so jpegs, suitably renamed and ordered, into ScanTailor, which polished our efforts into something fairly respectable. All that was left was to OCR the images, stitch them all into a single PDF and upload to the Internet Archive.

Here, though, we began to hit problems, and any suggestions for a solution would be very welcome indeed:
  • Our images were far from perfect, often distorted due to the slight curvature of the page or the misalignment of the camera on its tripod.
    • Current solution: short of building a rig, we are trying taking photos from above the book, which at least makes it easier to be parallel.
  • Our file sizes were enormous, and this made conversion really time-consuming
    • Current solution: use the university’s copy of Adobe Acrobat to compress the images into B&W PDFs, although it pains me that there seems to be no open-source alternative. Does anyone know of one?
  • Big file sizes and slightly skewed images do not a good OCR make: we couldn’t get tesseract to run on Windows, so resorted to using a web-based version, with all its limitations.
    • Current solution: again, Adobe to the rescue; but are there any open-source projects out there for this?
And with that list of problems and solutions, you now have a fairly good idea of where we are. If you’re in the area of Cambridge, do get in touch, as we’re always eager for new volunteers. If you want to start your own open DIY book scanning project get in touch on the OpenGLAM mailing list. Future plans include: surveying the English Faculty Library for other books that are out of copyright and not yet digitised (not as numerous as you might think), proposing a collaboration with the Engineering Department for help constructing a standalone book scanner, and investigating what there is to be scanned in the College libraries of the city.
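
For anyone repeating the method, the renaming and ordering step is easy to script. Because the right-hand pages were photographed first and the left-hand pages afterwards, the two series have to be interleaved before they go into ScanTailor. The sketch below is not the script we used: the function names, the file names and the assumption that each right-hand page comes before its left-hand partner in reading order are illustrative only.

```python
from itertools import zip_longest


def reading_order(right_pages, left_pages):
    """Interleave two photo series into reading order.

    Assumes the first right-hand photo precedes the first left-hand
    photo in the book; swap the arguments if your book runs the other way.
    """
    ordered = []
    for right, left in zip_longest(right_pages, left_pages):
        for page in (right, left):
            if page is not None:
                ordered.append(page)
    return ordered


def numbered_names(pages, prefix="page"):
    """Pair each original file name with a zero-padded sequential name."""
    return [
        (old, f"{prefix}_{n:04d}.jpg")
        for n, old in enumerate(pages, start=1)
    ]


# Hypothetical file names standing in for the camera's output.
rights = ["r1.jpg", "r2.jpg", "r3.jpg"]
lefts = ["l1.jpg", "l2.jpg"]
for old, new in numbered_names(reading_order(rights, lefts)):
    print(old, "->", new)
```

The pairs it prints can then be fed to whatever renaming tool is to hand; the sketch deliberately does not touch any files itself.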

Meeting: 2012-03-21

- March 22, 2012 in community, minutes


James Harriman-Smith Rufus Pollock Sam Leon


  • Feedback from last month’s decisions:
  • Project updates
  • Textus progress report
  • RSC’s My Shakespeare
  • Using Open Shakespeare data for Culture Hack Scotland
  • Open Gravestones
  • Where are Open Shakespeare’s annotations?
  • THAT Camp
  • Hack the Record, National Archives this weekend




Open Gravestones



  • JHS: Have a look at wiki.openliterature spam, and email list about
  • RP/SL: Get Tom to push stuff to GitHub
  • RP: Finish hypernotes
  • JHS: Pass on information about PyBossa, tell Anna to suggest on humanities
  • SL: blog about THAT camp, post link to list
  • SL: Contact Primavera about working group merge
  • SL: Next meeting make sure follow-up with relevant parties
  • SL/JHS: brainstorm topics for Open Humanities – on list (Code dojo? Summer meet?)
  • SL: do a doodle to decide between 16th, 17th, or 18th April (all at 5pm) for Open Humanities followed by Textus call
  • SL: Close Tcamp11 mailing list: not deleted, but all users taken off and list no longer advertised; email addresses saved in case we need them again

Meeting: 2012-02-15

- February 17, 2012 in minutes


James Harriman-Smith (JHS) Tod Robbins (TR) Rufus Pollock (RP) Sam Leon (SL) Tom Oinn (TO)


Anna Powell-Smith.



Last Meeting

  • James to follow up on unfinished actions

Project Updates


OKF in the USA and elsewhere


Next Meeting: 21st March 2012, 5pm GMT

Meeting: 2012-01-18

- January 18, 2012 in minutes, News


  • James Harriman-Smith (JHS)
  • Sam Leon (SL)
  • Anna Powell-Smith (APS)
  • Laura James (LJ)
  • Pat Lockley (PL)
  • Iain Emsley (IE)
  • Etienne Posthumus (EP)
  • Rufus Pollock (RP)
  • Laura Newman (LN)


  • Introductions
  • JISC Textus: Start of work?
  • OpenBiblio: Update
  • Open Shakespeare spring clean
  • Shape of the year? Events in the summer, OKCON, etc.



Pat Lockley’s projects:

Domesday Map Update


Open Shakespeare

  • Introduction to Henry VIII
  • New WOTD
  • 300th follower
  • Sub-meeting decides on:
    • Logging issues on github
    • Fix for concordancer issue, discussion re: jscript and CMS plugins
    • (see actions…)


Home page: *
  1. a JISC project running for 6 months with JISC funding, starting 1 Feb 2012, in collaboration with Goldsmiths “JISC TEXTUS”
    • project driven by academic needs with close collaboration with a research group
    • integration of existing tools into a platform researchers are using, rather than development of new code.
  2. Creating a platform demonstrating key functionality as a prototype for future TEXTUS:
    • lots more coming eg EU call in spring?
  3. The merging of Shakespeare/Literature platform with Jonathan’s published long term vision :)
Relevant code: IE: @JG Can the Open Correspondence project come under the general platform as well?

Events for Open Humanities

  • Summer get-to-know-Open-Humanities event/hackday
  • Similar to ?
  • EP: Maybe in conjunction with a scheduled event.