Meeting: 2014-09-01
James Harriman-Smith - September 6, 2014 in minutes
PRESENT
- James Harriman-Smith
- John Levin
- Seth Woodworth
- Eric Hellman
AGENDA
- Project GITenberg
GITENBERG
Reading Material
- Full listing of books is: https://github.com/GITenberg
- Python scripts make and upload book repos: https://github.com/sethwoodworth/GITenberg
- GITenberg mailing list: https://groups.google.com/forum/#!forum/gitenberg-project
- HackerNews thread: https://news.ycombinator.com/item?id=8214564
- Eric’s project: http://unglue.it
- Overview of file formats of PG books: https://github.com/sethwoodworth/GITenberg/blob/master/docs/file_endings_freq
Presentations
- Seth: Give overview of where we are (see reading material above)
- James: PhD student working on ‘reception’ studies, interested in GITenberg as text transmission
- John: PhD student, likes very large catalogs of text, lit/digital-humanities, wants large groups of text (BALZAC)
- Eric Hellman: behind unglue.it – license conversion -> collect money -> release openly: when books are free, the book distribution system breaks; even libraries don’t want free books, as they don’t know how to deal with them or handle them; interested in out-of-copyright books; has been making a store/resource/service for free books; became familiar with what PG lacks; created GitenHub, found GITenberg, joined forces. Believes that a version-controlled archive is the way ebooks should go.
Questions:
- James: GITenberg will both allow generation of books (once tools are in place) and allow version control, altering sources of publication. Two sides, but what about peculiarities of text? You need version control to track those things, yet what happens when there is a bad ‘correction’?
- Seth: Do we want to follow Gutenberg’s policies? They have a concept of an ‘editionless’ edition, roughly based on a specific printing, but not a precise reproduction… GITenberg might be better seen as creating our own editions, ones that could be printed/distributed as a ‘reasonable’ copy
- Seth: NB: In order to produce ebooks (and even print) it has to go through an intermediate html / CSS step
- Eric: not just version control, but the forking and merging process too: we can keep multiple branches of the same work and efficiently pull in changes from one branch to another; no reason we couldn’t maintain a branch which is faithful to a given copy from 1915 along with a book that doesn’t have the mistake of ‘with with’
- James: but what about authority of a fork?
- Seth: Authority on GitHub is distinguishable, number of followers
- John: how much of this is transferrable? GITenberg is great, how does it go further? e.g. with the big bunch of 18thc texts released from ECCO database by Gale-Cengage (http://www.textcreationpartnership.org/tcp-ecco/, and http://quod.lib.umich.edu/e/ecco/)
- Seth: yes: very extendable, as much as possible
Seth: a big issue is the applicability to other texts: yes. Archive.org has become a clearing-house for scanned books, but it has no proof-reading process, and would be a great place to extend GITenberg in the future.
Seth: another big question: which formats?
- Markdown: NEG: not standardized, no pagination, no footnotes, and can only be improved by making our own bespoke version, which would have to be maintained. POS: Penflip, etc. are using it, so might have scope…
- reStructuredText: has a spec, is extensible, Distributed Proofreaders use it, but has only one complete implementation (in Python), URLs are clunky, and it seems to be getting sidelined…
- TEI: lack of software, doesn’t seem to be a lot of to and fro, big ask
- Responses:
- Eric: TEI is probably overkill for everything; you need just enough markup to produce deliverables, like footnotes, ToC
- James: further useful questions here: how much support does each format have? Can we test formats?
- Seth: Another big question: annotations
- Seth has done work on annotator, cares for textual annotation a great deal. Annotation is the obvious win for a digital book, since such works have infinite margins and can take a variety of forms
- Demo for some of the more exciting things that can be done: https://developer.mozilla.org/en-US/demos/detail/html5-audio-read-along/launch. Such projects imply a potential later connection with librivox.
- This is more of a question for the developers, although the utility of line numbers is already clear
Other Observations
- Seth: ePub v.3 is very cool, but annotation on ePub readers looks to be very tricky…
- John: re: Internet Archive OCR: possibility of machine correction of OCR’d text for IA books, but they seem nonplussed. This is the kind of thing that GITenberg could help with
NEXT STEPS
- Seth: Publish actionable tasks for more developers
- Seth: Shift GITenberg to GitLab
- Everyone: continue to publicise this project