Is Open Access Open?

October 26, 2012

This post is cross-posted from Peter’s blog.

I’m going to ask questions. They are questions I don’t know the answers to – maybe I am ignorant, in which case please comment with information, or maybe the “Open Access Community” doesn’t know the answers. Warning: I shall probably be criticized by some of the mainstream “OA Community”. Please try to read beyond any rhetoric.

As background, I am well versed in Openness. I have taken a leading role in creating and launching many Open efforts – SAX, Chemical MIME, Chemical Markup Language, The Blue Obelisk, Panton Principles, Open Bibliography, Open Content Mining – and helped to write a significant number of large software frameworks (OSCAR, JUMBO, OPSIN, AMI2). I’m on the advisory board of the Open Knowledge Foundation and I have contributed to or worked with Wikipedia, Open Streetmap, Stackoverflow, Open Science Summit, Mat Todd (Open Source Drug Discovery) and have been to many hackathons. So I am very familiar with the modern ideology and practice of “Open”.

Is “Open Access” the same sort of beast? The features of “Open” that I value are:
  • Meritocracy. That doesn’t mean that decisions are made by hand counting, but it means that people’s views are listened to, and they enter the process when it seems right to the community. That’s happened with SAX, very much with the Blue Obelisk, and the Open Knowledge Foundation.
  • Universality of participation, particularly from citizens without formal membership or qualifications. A feeling of community.
  • A willingness to listen to other views and find means of changing strategy where necessary.
  • Openness of process. It is clear what is happening, even if you are not in command.
  • Openness of results. This is universally fundamental. Although there have been major differences of opinion in Free/Open Source Software (F/OSS) everyone is agreed that the final result is free to use, modify, redistribute without permission and for any purpose. Free software is a matter of liberty, not price.
  • A mechanism to change current practice. The key thing about Wikipedia is that it dramatically enhances the way we use knowledge. Many activities in the OKF (and other Open Organisations) are helping to change practice in government, development agencies, companies. It’s not about price restrictions, it’s about giving back control to the citizens of the world. Open Streetmap produces BETTER and more innovative maps that people can use to change the lives of people living right now – e.g. the Haitian earthquake.
How does Open Access measure up against these? Not very well. That doesn’t mean it isn’t valuable, but it means that it doesn’t have obvious values I can align with. I have followed OA for most of the last 10 years and tried to contribute, but without success. I have practiced it by publishing all my own single-author papers over the last 5 years in Gold CC-BY journals. But I have never had much feeling of involvement – certainly not the involvement that I get from SAX or BlueObelisk. That’s a harsh statement and I will elaborate.

Open Access is not universal – it looks inward to Universities (and Research Institutions). In OA week the categories for membership are: “click here if you’re a: RESEARCH FUNDER | RESEARCHER/FACULTY MEMBER | ADMINISTRATOR | PUBLISHER | STUDENT | LIBRARIAN” [1] There is no space for “citizen” in OA. Indeed, some in the OA movement emphasize this. Stevan Harnad has said that the purpose of OA is for “researchers to publish to researchers” and that ordinary people won’t understand scholarly papers. I take a strong and public stance against this – the success of Galaxy Zoo has shown how citizens can become as expert as many practitioners. In my new area of phylogenetic trees I would feel confident that anyone with a University education (and many without) would have little difficulty understanding much of the literature and many could become involved in the calculations. For me, Open Access has little point unless it reaches out to the citizenry and I see very little evidence of this (please correct me).

There is, in fact, very little role for the individual. Most of the infrastructure has been built by university libraries without involving anyone outside (regrettably, since university repositories are poor compared to other tools in the Open movements).

There is little sense of community. The main events are organised round library practice and funders – which doesn’t map onto other Opens.
Researchers have little involvement in the process – the mainstream vision is that their university will mandate them to do certain things and they will comply or be sacked. This might be effective (although there are no signs of that yet), but it is not an “Open” attitude.

Decisions are made in the following ways:
  • An oligarchy, represented in the BOAI processes and Enabling Open Scholarship (EOS). EOS is a closed society that releases briefing papers and charges a membership fee of 50 EUR per year; members have to be formally approved by the committee. (I have represented to several members of EOS that I don’t find this inclusive and I can’t see any value in my joining – it’s primarily for university administrators and librarians.)
  • Library organizations (e.g. SPARC)
  • Organizations of OA publishers (e.g. OASPA)

Now there are many successful and valuable organizations that operate on these principles, but they don’t use the word “Open”.

So is discussion “Open”? Unfortunately not very. There is no mailing list with both a large volume of contributions and effective freedom to present a range of views. Probably the highest volume list for citizens (as opposed to librarians) is GOAL and here differences of opinion are unwelcome. Again that’s a hard statement, but the reality is that if you post anything that does not support Green Open Access then Stevan Harnad and the Harnadites will publicly shout you down. I have been denigrated on more than one occasion by members of the OA oligarchy (look at the archive if you need proof). It’s probably fair to say that this attitude has effectively killed Open discussion in OA. Jan Velterop and I are probably the only people prepared to challenge opinions: most others walk away.

Because of this lack of discussion it isn’t clear to me what the goals and philosophy of OA are. I suspect that different practitioners have many different views, including:
  • A means to reach out to citizenry beyond academia, especially for publicly funded research. This should be the top reason IMO but there is little effective practice.
  • A means to reduce journal prices. This is (one of) Harnad’s arguments. We concentrate on making everything Green and when we have achieved this the publishers will have to reduce their prices. This seems most unlikely to me – any publisher losing revenue will fight this.
  • A way of reusing scholarly output. This is ONLY possible if the output is labelled as CC-BY. There’s about 5-10 percent of this. Again this is high on my list and the only reason Ross Mounce and I can do research into phylogenetic trees.
  • A way of changing scholarship. I see no evidence at all for this in the OA community. In fact OA is holding back innovation in new methods of scholarship as it emphasizes the conventional role of the “final manuscript” and the “publisher”. Green OA relies (in practice) on having publishers and so legitimizes them.
And finally is the product “Open”? The BOAI declaration is, in Cameron Neylon’s words, “clear, direct, and precise:” To remind you:

“By ‘open access’ to this literature, we mean its free availability on the public internet, permitting any users to read, download, copy, distribute, print, search, or link to the full texts of these articles, crawl them for indexing, pass them as data to software, or use them for any other lawful purpose, without financial, legal, or technical barriers other than those inseparable from gaining access to the internet itself. The only constraint on reproduction and distribution, and the only role for copyright in this domain, should be to give authors control over the integrity of their work and the right to be properly acknowledged and cited.”

This is in the tradition of Stallman’s software freedoms, The Open Knowledge Definition and all the other examples I have quoted. Free to use, re-use and redistribute for any lawful purpose. For manuscripts it is cleanly achieved by adding a visible CC-BY licence. But unfortunately many people, including the mainstream OA community and many publishers, use “(fully) Open Access” to mean just about anything. Very few of us challenge this. So the result is that much current “OA” is so badly defined that it adds little value. There have been attempts to formalize this, but they have all ended in messy (and to me unacceptable) compromise. In all other Open communities “libre” has a clear meaning – freedom as in speech. In OA it means almost nothing. Unfortunately anyone trying to get tighter approaches is shouted down.

So, and this is probably the greatest tragedy, Open Access does not by default produce Open products. For that reason we have set up our own Open-access list in the OKF. If we can have a truly Open discussion we might make progress on some of these issues.

[1] Phylogenetic tree diagram by David Hillis, Derreck Zwickil and Robin Gutell.

The Right to Read Is the Right to Mine

June 1, 2012

The following is a draft content mining declaration developed by the Open Knowledge Foundation’s Working Group on Open Access. In brief: The Right to Read Is the Right to Mine.


Researchers can find and read papers online, rather than having to manually track down print copies. Machines (computers) can index the papers and extract the details (titles, keywords etc.) in order to alert scientists to relevant material. In addition, computers can extract factual data and meaning by “mining” the content, opening up the possibility that machines could be used to make connections (and even scientific discoveries) that might otherwise remain invisible to researchers.

However, it is not generally possible today for computers to mine the content in papers due to constraints imposed by publishers. While Open Access (OA) is improving the ability of researchers to read papers (by removing access barriers), still only around 20% of scholarly papers are OA. The remainder are locked behind paywalls. Under the vast majority of subscription contracts, subscribers may read paywalled papers, but they may not mine them.

Content mining is the way that modern technology locates digital information. Because digitized scientific information comes from hundreds of thousands of different sources in today’s globally connected scientific community [2] and because current data sets can be measured in terabytes,[1] it is often no longer possible to simply read a scholarly summary in order to make scientifically significant use of such information.[3] A researcher must be able to copy information, recombine it with other data and otherwise “re-use” it so as to produce truly helpful results. Content mining is not only a deductive tool for analyzing research data; it is also how search engines operate to allow discovery of content. To prevent mining is therefore to force scientists into blind alleys and silos where only limited knowledge is accessible. Science does not progress if it cannot incorporate the most recent findings and move forward from there.


‘Open Content Mining’ means the unrestricted right of subscribers to extract, process and republish content manually or by machine in whatever form (text, diagrams, images, data, audio, video, etc.) without prior specific permissions and subject only to community norms of responsible behaviour in the electronic age.
  • Text
  • Numbers
  • Tables: numerical representations of a fact
  • Diagrams (line drawings, graphs, spectra, networks, etc.): Graphical representations of relationships between variables are images and therefore may not be, when considered as a collective entity, data. However, the individual data points underlying a graph, similar to tables, should be.
  • Images and video (mainly photographic): where they are the means of expressing a fact?
  • Audio: same as images – where it expresses the factual representation of the research?
  • XML: Extensible Markup Language (XML) defines rules for encoding documents in a format that is both human-readable and machine-readable.
  • Core bibliographic data: described as “data which is necessary to identify and / or discover a publication” and defined under the Open Bibliography Principles.
  • Resource Description Framework (RDF): information about content, such as authors, licensing information and the unique identifier for the article


Principle 1: Right of Legitimate Accessors to Mine

We assert that there is no legal, ethical or moral reason to refuse to allow legitimate accessors of research content (OA or otherwise) to use machines to analyse the published output of the research community. Researchers expect to access and process the full content of the research literature with their computer programs and should be able to use their machines as they use their eyes. The right to read is the right to mine.

Principle 2: Lightweight Processing Terms and Conditions

Mining by legitimate subscribers should not be prohibited by contractual or other legal barriers. Publishers should add clarifying language in subscription agreements stating that content is available for information mining by download or by remote access. Where access is through researcher-provided tools, no further cost should be required. Users and providers should encourage machine processing.

Principle 3: Use

Researchers can and will publish facts and excerpts which they discover by reading and processing documents. They expect to disseminate and aggregate statistical results as facts and contextual text as fair use excerpts, openly and with no restrictions other than attribution. Publisher efforts to claim rights in the results of mining further retard the advancement of science by making those results less available to the research community. Such claims should be prohibited. Facts don’t belong to anyone.


We plan to assert the above rights by:
  • Educating researchers and librarians about the potential of content mining and the current impediments to doing so, including alerting librarians to the need not to cede any of the above rights when signing contracts with publishers
  • Compiling a list of publishers and indicating what rights they currently permit, in order to highlight the gap between the rights here being asserted and what is currently possible
  • Urging governments and funders to promote and aid the enjoyment of the above rights
[1] Panzer-Steindel, Bernd, Sizing and Costing of the CERN T0 center, CERN-LCG-PEB-2004-21, 09 June 2004.
[2] The Value and Benefits of Text Mining, JISC, Report Doc #811, March 2012, Section 3.3.8, citing P.J.Herron, “Text Mining Adoption for Pharmacogenomics-based Drug Discovery in a Large Pharmaceutical Company: a Case Study,” Library, 2006, claiming that text mining tools evaluated 50,000 patents in 18 months, a task that would have taken 50 person years to do manually.
[3] See MEDLINE® Citation Counts by Year of Publication, and National Science Foundation, Science and Engineering Indicators: 2010, Chapter 5, asserting the annual volume of scientific journal articles published is on the order of 2.5%.