
Podcast: Pavel Richter on the value of open data

- August 25, 2017 in Interviews, Open Knowledge, podcasts

This month Pavel Richter, CEO of Open Knowledge International, was interviewed by Stephen Ladek of Aidpreneur for the 161st episode of his Terms of Reference podcast. Aidpreneur is an online community focused on social enterprise, humanitarian aid and international development that runs this podcast to cover important topics in the social impact sector. Under the title ‘Supporting The Open Data Movement’, Stephen Ladek and Pavel Richter discuss a range of topics surrounding open data, such as what open data means, how open data can improve people’s lives (including the role it can play in aid and development work) and the current state of openness in the world. As Pavel phrases it: “There are limitless ways where open data is part of your life already, or at least should be”.
Pavel Richter joined Open Knowledge International as CEO in April 2015, following five years as Executive Director of Wikimedia Deutschland. He explains how Open Knowledge International has set its focus on bridging the gap between the people who could make the best use of open data (civil society organisations and activists in areas such as human rights, health or the fight against corruption) and the people who have the technical knowledge to work with data. By bridging this gap, OKI can make an impact, empowering these organisations to use open data to improve people’s lives.
The podcast goes into several examples that demonstrate the value of open data in our everyday life: how OpenStreetMap was used by volunteers following the Nepal earthquake to map which roads were destroyed or still accessible; governments opening up financial data on tax returns or on how foreign aid money is spent; and projects such as OpenTrials opening up clinical trial data, so that people can find out which drugs are being tested for effectiveness against viruses such as Ebola or Zika. In addition, Stephen Ladek and Pavel Richter discuss questions surrounding potential misuse of open data, the role of cultural context in open data, and the current state of open data around the world, as measured in recent initiatives such as the Open Data Barometer and the Global Open Data Index. Listen to the full podcast below, or visit the Aidpreneur website for more information.

How participatory budgeting can transform community engagement – An interview with Amir Campos

- June 2, 2017 in Interviews, OpenBudgets, OpenSpending

For most municipalities, participatory budgeting is a relatively new approach to involving citizens directly in decisions about new investments and developments in their community. Fundación Civio is a civic tech organisation based in Madrid, Spain, that develops tools for citizens which both reveal the civic value of data and promote transparency. The organisation has developed an online platform for participatory budgeting processes, covering both voting and the monitoring of incoming proposals, which is currently being tested in three Spanish municipalities. Diana Krebs (Project Manager for Fiscal Projects at OKI) talked with Amir Campos, project officer at Fundación Civio, about how tech solutions can help make participatory budgeting a sustainable process in communities, and what is needed beyond the technology.

Amir Campos, Project officer at Fundación Civio

Participatory budgeting (PB) is a relatively new way for municipalities to engage with their citizens. You developed an online platform to make the participatory process easier. How can this help turn PB into an integral part of community life?
Participatory budgets are born from the desire to democratise power at the local level, to “municipalise the State”, with a clear objective: that these actions at the local level serve as an example at the regional and national level and foster change in State participation and investment policies. This push for the democratisation of power also represents a struggle for a better distribution of wealth, giving voice to the citizens, taking them out of political anonymity every year and making local investment needs visible much faster than any traditional electoral process. Participatory budgeting is citizens’ tough marking of their local representatives.
The tool we have designed is powerful but easy to use, because we have avoided developing a tool that only technical people would use. Users are able to upload their own data (submitting or voting on proposals, comments, feedback and so on) in order to generate discussions, voting processes, announcements and visualisations. Its more visual approach clearly differentiates our solution from existing ones and gives it further value. The tool is targeted at administrators, users and policy makers without advanced technical skills, and it is online, offered as Software as a Service (SaaS), avoiding the need for users to download or install any special software. All in all, our tool will bring the experience of taking part in a participatory budgeting process closer to all citizens. Once registered, its user-friendliness and visual features will keep users connected, not only to vote on proposals but also to monitor and share them, while exercising effective decision-making and redistributing the available resources in their municipality. Along with offline participatory processes, this platform gives citizens a voice and a vote, and its monitoring capabilities give them the possibility of holding their public representatives to account. The final aim is to enable real participatory experiences, providing solutions that are easy to implement for all stakeholders involved, thus strengthening the democratic process.

Do you think that participatory budgeting is a concept that will be more successful in small communities, where daily business is ruled less by political parties’ interests and more by consensus about what the community needs (like new playgrounds or sports parks)? Or can it work in bigger communities such as Madrid as well?
Of course! The smaller the community, the better the decision-making process, not only at the PB level but at all levels. Wherever there is a “feeling” of community it is much easier to reach agreements oriented towards the common good. That is why in large cities there is always more than one PB process running at the same time: one at the neighbourhood level and another at the municipal level (the whole city), to engage people in their neighbourhood and push them to vote at the city level. Cities such as Paris or Madrid, which use online and offline platforms, apply that division; small town halls such as Torrelodones, by contrast, open just a single process for the whole municipality. All processes need the commitment of municipal representatives and the engagement of citizens, connected to a culture of participation, to harvest successful outcomes.
Do you see a chance that PB might increase fiscal data literacy if communities are more involved in deciding what the community should spend tax money on?
Well, I am not sure about an improvement in fiscal data literacy, but I am absolutely convinced that citizens will better understand the budget cycle, its concepts and the overall approval process. Currently, in most cases, budget preparation and approval has been a closed-door process within administrations. Municipal PB implementations will act as enabling processes for citizens to influence budget decisions, becoming actual stakeholders in the decision-making process, auditing committed versus actual spending and giving feedback to the administrations. Furthermore, projects implemented thanks to a PB will last longer, since citizens take on a commitment to the implemented project, to their representatives and to the peers with whom they reached agreement, and they will easily renew this agreement. The educational resources available to citizens on the platform will also help to improve the degree of literacy: they provide online materials for better understanding the budget period, the terms used, and how to influence and monitor the budget.
What non-tech measures and commitments does a municipal council or parliament need to take so that participatory budgeting becomes a long-term, integral part of citizens’ engagement?
They will have to agree as a government. One of the key steps to maintaining a participatory budgeting initiative over time is to legislate on it so that, regardless of the party that governs the municipality, the participatory budgeting processes keep running and long-lasting prevalence is achieved. Porto Alegre (Brazil) is a very good example of this; they have been redistributing their resources at the municipal level for the last 25 years.
Fundación Civio is part of the EU H2020 project openbudgets.eu, where it collaborates with 8 other partners around topics of fiscal transparency.

Storytelling with Infogr.am

- October 21, 2014 in Community Session, Events, infogram, Interviews, skillshare

As we well know, data is only data until you use it for storytelling and insights. Some people are super talented and can use D3 or other amazing visual tools; just see this great list of resources on Visualising Advocacy. In this one-hour Community Session, Nika Aleksejeva of Infogr.am shares some easy ways to get started with simple data visualizations. Her talk also includes tips for telling a great story and some thoughtful comments on when to use various data viz techniques.
We’d love you to join us and do a skillshare on tools and techniques. Really, we are tool agnostic and simply want to share with the community. Please do get in touch to learn more about Community Sessions.

Nicolas Kayser-Bril – on doing journalism in the digital era

- September 2, 2014 in Data Journalism, Interviews, tools

Journalism++ is an agency for data-driven storytelling. Started by three people, it is now a network of independent for-profit companies working from Berlin, Paris, Stockholm, Cologne, Amsterdam and Porto. They define journalism as ‘making interesting what is important’, not ‘making important what is interesting’. Winner of the Data Journalism Awards 2014, the agency is famous for building data-driven web apps, creating visualizations and, last but not least, investigative journalism projects. How cool is that? We discussed it with the CEO and co-founder of J++, Nicolas Kayser-Bril. A self-taught programmer and journalist, Nicolas holds a degree in Media Economics. He also invests in data journalism as an instructor for massive open online courses taught in English and French.
Nicolas, how did you decide that it was time to open a data journalism agency? Before, together with Pierre (Pierre Romera, Chief Technology Officer and developer at J++), we worked at a news startup called OWNI in Paris. (OWNI was a legendary French newsroom launched in April 2009, where Nicolas was head of the data journalism team. It focused on technology, politics and culture and ran on a non-profit economic model. Twice a winner of the Online Journalism Awards, OWNI is probably most famous for its cooperation with Wikileaks. Due to financial problems, OWNI was closed down on 21 December 2012.) As things at OWNI got worse, we left in 2011 and wanted to keep working together. He is a developer, I am a journalist, so we looked for newsrooms in Paris and London that might be willing to hire us as a team. Some newsrooms told us: OK, you can come and work for us, but the developer is going to work with developers, and the journalist is going to work with journalists, which we refused. So that’s why we created the company: it was more of a plan B, but we just wanted to keep doing data journalism together.
Why did you open up in Berlin and how did the name come up? The name was a nerdy joke: when you code and you add ‘++’, it means that the variable is now equal to its previous value plus one, so basically ‘Journalism++’ means ‘journalism equals journalism plus one’. So, something more than just journalism. As for the city, again, it was not planned, it just happened. At that time I was living in Berlin, and Pierre was living in London. Since both Pierre and I are French, we first created a company in Paris for legal reasons, and we actually planned on going back to Paris. But at some point our Head Project Manager Anne-Lise Bouyer decided to move to Berlin, and we created this way of working between two cities. Now it’s seven people here.
But aside from the seven people in the core team, you do have branches. How did you develop this and what’s the rule for being accepted into the club? This franchise programme developed, again, kind of randomly. We knew Jens Finnas and Peter Grensund from Sweden, they were very good and we heard that they were going to open an agency. So we just told them that it would be cool to have the same name and to create this franchise concept. Since it worked really well with J++ Stockholm, we expanded to the other cities. The idea is to bring together the best data-driven journalists in every market, so we are looking mostly for really good developers, because we believe that DDJ is really a technology thing. And then you need to come up with the concept, to show that you are not just doing things, but that you have a plan to create a company.
We do not want to make our brand a kind of label that we give or don’t give to people. This is more about creating companies, because that makes you much stronger in your journalistic investigations.
What’s your own favourite project so far? Right now we are pivoting towards Detective.io. Detective.io is an open-source tool developed by Journalism++. It lets users upload, store and mine the data they have on a precise topic. Therefore, it is a useful platform to host an investigation, whether run by journalists, lawyers or business intelligence teams. To start working, you need to download the source code and install Detective.io on your own server. Alternatively, you can do everything online on detective.io, which is much simpler and exactly as safe. We run some of our investigative projects to advertise this tool, such as The Migrant Files or the Belarus Networks (a database of connections within the Belarusian elite, to be published soon). By pushing Detective.io into new markets, we are investing in investigative journalism as a field. We are also going to provide customization services around it. Before that, there was something really cool that we did for Arte at the beginning of this year. The special thing about this was that Arte came to us and said: we want to do something about employment and the work situation of young people in Europe, so we did the project from concept to development. It was called World of Work.
Which is just more evidence of how many shapes journalism can take nowadays, since this project was a questionnaire? Exactly. In this case we wanted the young people who took the questionnaire to ask themselves questions about work that they might not have asked themselves otherwise. It was 60 questions on various topics, and we pretended that it was a survey, but actually the idea was to make users think in new ways about their work situation. For instance, we never talked about unemployment or employment, because we believe these categories no longer work for the younger generation. The reason I am very happy about this is that people who took the questionnaire told us there were questions they would never have asked themselves. It really made them think, and that was precisely the goal, even if they did not realize it was the goal we had.
Did you get a psychologist on the team for this project? We hired a consultant from advertising to create the atmosphere. Generally, it was a huge amount of research and a multi-skilled team including a developer and a designer. But the ‘other’ skill we had to bring to the project was from advertising, because it is surprisingly hard to write for the 22–30 age group and to find questions that are interesting for the user and relevant for the project.
What are currently your favourite tools? Apart from Detective.io, when it comes to simple visualizations I use Datawrapper (another project of J++, run by J++ Cologne) or Chartbuilder (a project by Quartz). There are so many tools, and it really depends from project to project; for example, right now we are doing social network visualization, so we use Gephi a lot, an open-source social network analysis tool.
What would you advise people who want to promote DDJ in Russia?
You have lots of agencies doing cool stuff there, and I think it’s important to have good developers who understand something about journalism; that’s how you get good ideas, because otherwise you just mimic what’s happening in other countries, and that’s not the point. The point is to leverage technology, to bring something new to the field. And in this case either you learn how to code or you find a developer.
That said, data-driven journalism sometimes looks like an insider thing, a nerdy direction of journalism. Do you think it can become mainstream? What you say is especially true with what was created in the US this year, like The Upshot, Vox and 538, and I agree, it might shift the definition of DDJ towards the nerdy field. But if you consider data journalism as journalism which could not be done without a computer – which is the definition I favour – then you can do anything. Like in the project ‘World of Work’ I was referring to: what’s special about it is that it has been done by journalists working with developers. But as a user, you don’t realize it. And that’s what we should aim at.
Is data journalism, then, something we have known before under different names, like computer-assisted reporting, or is there something particularly new about data journalism as we know it today? It’s true that using computers for journalism is nothing new, and the same goes for visualization of data, but what’s new, especially in Europe, is that people in the newsrooms have realized that they need some maths to do their work. Before, if you wanted to do a piece of computer-assisted reporting, you needed a statistician, you had to go out and rent a computer, and computer time was extremely expensive. Now you can do the same kind of analysis in a few hours for zero euros. And then anyone can do it and publish. There is also another aspect: the term ‘online journalism’ has been hijacked by people doing copy-paste journalism. So that’s also one of the main reasons why data journalism is fashionable now: it means doing journalism online in a different way than it has been done for the past 10 years.
The profession of journalism as a whole is shifting nowadays: many things journalists used to do are being done by citizen journalists or bloggers, and at the same time the ‘real’ journalists have to acquire skills they were never asked for before. As a self-taught journalist, what is your take on this? That’s a very interesting topic and we could talk about it for a few hours. The definition of journalism, or of who a journalist was, before the digitalization of content was really a definition by means. The journalist was the person who had access to the means of publication or broadcast. That’s why the anchor on Russian TV is a journalist even if she’s just repeating what the government wants her to say, and the investigative journalist at the New York Times is also a journalist, because they both have access to the means of communication to the public. And this concept doesn’t exist anymore. Now everybody can publish. Look at the Bin Laden assassination: you had this guy in Pakistan who had no contact with any media outlet whatsoever; he just published a couple of tweets and for a few minutes he was the leading news source on the topic.
Take another example, the Aurora shooting in Colorado, when that guy dressed as Superman came to the movie theatre in Denver, Colorado, and shot everyone. For the first 10–12 hours the main source of information was a teenager in his room close to the shooting. It was at night, so it took around 10 hours for the TV stations to get on location, and during this time he was the one checking the information and just doing journalism. And when he was asked why he did it, he just said: I thought it was needed.
Next, there was a visualization of all the drone strikes. It was just a nice visualization, there was no breaking news; it was more of a cold journalism thing. But it became extremely successful, and again, the reason behind it, as the author said, was: this story has not been told, and I thought it needed to be told.
What all these things have in common is adding value to information in the public interest. And I think this is what constitutes journalism today. If you are working full time in a newsroom writing articles or presenting the news on TV, I would call that person not a journalist but an information professional. That’s why I define journalism not by the occupation of the people who do it, but really by the goal: telling information in the public interest. And anybody can perform the act of journalism.
Team photo: © Marion Kotlarski/Journalism++

#OKStory

- July 9, 2014 in Events, Ideas and musings, Interviews, network, OKFest, OKFestival, Open Knowledge Foundation

Everyone is a storyteller! Just one week away from the big Open Brain Party of OKFestival. We need all the storytelling help you can muster. Trust us, from photos to videos to art to blogs to tweets – share away. The Storytelling team is a community-driven project. We will work with all participants to decide which tasks are possible and which stories they want to cover. We remix together. We’ve written up this summary of how to Storytell, some story ideas and suggested formats. There are a few ways to join:
  • At the event: We will host an in-person planning meetup at the Science Fair on Tuesday, July 15th. Watch #okstory for details. Look for the folks with blue ribbons.
  • Digital Participants: Join in and add all your content with the #okfest14 @heatherleson #OKStory tags.
  • Share: Use the #okstory hashtag. Drop a line to heather.leson AT okfn dot org to get connected.
We highlighted some ways to storytell in this brief 20 minute chat:

Community Session: Open Data Hong Kong

- May 7, 2014 in Community Sessions, Events, Interviews, OKF Hong Kong, Open Knowledge Foundation Local Groups

Open Data Hong Kong is an open, participative, and volunteer-run group of Hong Kong citizens who support Open Data. Join Mart van de Ven, Open Knowledge Ambassador for Hong Kong, and Bastien Douglas of ODHK for a discussion about their work.

How to Participate

This Community Session will be hosted via G+. We will record it.
  • Date: Wednesday, May 14, 2014
  • Time: Wednesday 21:00 – 22:00 EDT/ Thursday 09:00 – 10:00 HKT/01:00 – 02:00 UTC
  • See worldtimebuddy.com to convert times.
  • Duration: 1 hour
  • Register for the event here

About our Community Session Guests

Mart van de Ven
Mart van de Ven co-founded Open Data Hong Kong to inspire and nurture a techno-social revolution in Hong Kong. He believes Open Data is a chance for citizens to be better served by government. Not only because it enables greater transparency and accountability, but because when governments open up their data it allows them to concentrate on their irreducible core – enabling us as citizens. He is also Open Knowledge’s ambassador to Hong Kong, a data-driven developer and technology educator for General Assembly.
Bastien Douglas
Bastien’s role with ODHK is to create a structure for the community to develop sustainability, form partnerships with other organisations and operationalize projects to achieve the goals of the organisation. Bastien’s background combines public sector experience, research analysis and citizen engagement. For over 4 years as a public servant in the federal government of Canada in Ottawa, he analysed policy at the front lines of policy development and researched public management issues at the centre of the bureaucracy. In 2009, Bastien formed a community of innovative public servants to work across silos using collaborative tools and social media; it pushed Open Data projects forward to raise the capacity to share knowledge and better support the public. Bastien then worked in the NGO sector building knowledge capacity for the immigrant-serving sector, while supporting advocacy for improved services, information-sharing, access to resources and sharing of practices for service delivery. Bastien Douglas on Twitter
More Details
See the full Community Session Schedule

Vice Italy interview with the editor of the Public Domain Review

- January 28, 2013 in Interviews, Public Domain, public domain review

The editor of The Public Domain Review, Adam Green, recently gave a feature-length interview to Vice magazine Italy. You can find the original in Italian here, and an English version below!
While there is a wealth of copyright-free material available online, The Public Domain Review is carving out a niche as a carefully curated website with a strong editorial line. How did the PDR begin? Myself and The Public Domain Review’s other co-founder, Jonathan Gray, have long been into digging around in these huge online archives of digitised material – places like the Internet Archive and Wikimedia Commons – mostly to find things with which to make collages. We started a little blog called Pingere to share some of the more unusual and compelling things that we stumbled across. Jonathan suggested that we turn this into a bigger project aiming to celebrate and showcase the wonderfulness of the public domain material that was out there. We took the idea to the Open Knowledge Foundation, a non-profit which promotes open access to knowledge in a variety of fields, and they helped us to secure some initial seed funding for the project. And so the Public Domain Review was born.
What was the first article you posted? We initially focused on things which were just coming into the public domain that year. In many countries works enter the public domain 70 years after the death of the author or artist – although there are lots of weird rules and exceptions (often unnecessarily complicated!). Anyway, 2011 saw the works of Nathanael West enter the public domain, including his most famous book The Day of the Locust. The first article was about that, and West’s relationship with Hollywood, written by Marion Meade, who’d recently published a book on the subject.
What criteria do you use to choose stuff for the Review? As the name suggests, all our content is in the ‘public domain’, so that is the first criterion. We try to focus on works that are in the public domain in most countries, which isn’t as easy as it sounds, as every country has different rules. Generally it means stuff created by people who passed away before the early 1940s. The second criterion is that there are no restrictions on the reuse of the digital copies of the public domain material.
What kind of restrictions? Well, some countries say that in order to qualify for copyright, digital reproductions have to demonstrate some minimal degree of originality, and others say that there just needs to be demonstrable investment in the digitisation (the so-called “sweat of the brow” doctrine). Many big players in the world of digitisation – like Google, Microsoft, the Bridgeman Art Library, and national institutions – argue that they own rights in their digital reproductions of works that have entered the public domain, perhaps so they can sell or restrict access to them later down the line. We showcase material from institutions who have already decided to openly license their digitisations. We are also working behind the scenes to encourage more institutions to do the same and to see free and open access to their holdings as part of their public mission.
But you have a strong aesthetic line as well, don’t you? Yes, of course, the material has to be interesting! We tend to go for stuff which is less well known, so rather than put up all the works of Charles Dickens (as great as they are) we’ll go instead for something toward the more unorthodox end of the cultural spectrum:
a personal oracle book belonging to Napoleon, for example, or a 19th-century attempt to mathematically model human consciousness through geometric forms. I guess a sort of alternative history to the mainstream narrative, an attempt to showcase just some of the excellence and strangeness of human ideas and activity which exist ‘in between’ those bigger events and works about which the narrative of history is normally woven.
Is there anything you wouldn’t publish? I guess there is some material which is perhaps a little too controversial for the virtuous pages of the PDR – such as the racier work of Thomas Rowlandson or some of the less family-friendly works of the 16th-century Italian printmaker Agostino Carracci. Our most risqué thing to date is probably a collection from Eadweard Muybridge’s ‘animal locomotion’ portfolio, which included a spot of naked tennis.
It seems that authors are becoming less and less important, publishers are facing extinction, and yet the potential for users of content is ever-expanding. What do you think about the future of publishing? It is certainly true that things are radically changing in the publishing world. Before the advent of digital technologies, publishers were essentially gatekeepers of what words were seen in the public sphere. You saw words in books and newspapers and – for many people – that was pretty much it. What you saw was the result of decisions made by a handful of people. But now this has changed. People don’t need publishing contracts to get their words seen. Words, pictures and audiovisual material can be shared and spread at virtually no cost with just a few clicks. But people still do want to read words in books. And they turn to publishers – through bookshops, the media, etc. – to find new things to read. While there is DIY print-on-demand publishing, it is hard to compete with the PR and promotion of professional publishers. I don’t think publishers will become extinct. No doubt they will adapt to new markets in search of profits.
Is the internet causing works to become more detached from their authors? Is there a way in which this could be a good thing? With the rise of digital technologies it is, no doubt, much easier for this detachment to happen. Words leave the confines of books and articles, get copied and pasted into blogs, websites and social media, are shared through illegal downloads, and so on, perhaps losing proper attribution along the way. But in a way none of this is new. It is just a more accelerated version of what has happened for hundreds of years. If anything it is probably better for authors now than it was in the past, as the internet also enables people to try to check where things come from, their pedigree and provenance. In the 17th century, before there was a proper copyright law, it was common for whole books to be “stolen”, given a new title and cover, and sold under a new author’s name.
Could this be a good thing? Well, one could argue that reuse and reworking are an essential part of the creative process. We can find brilliant examples of literary pastiche and collaging techniques in the works of writers like W.G. Sebald, where you are not sure whether he’s speaking in his own words or those of another writer (whose work he is discussing). In Sebald’s case it gives the whole piece a fluency and unity, a sense that it’s one voice, of humanity or history, speaking. But of course Sebald’s work is protected by copyright held by his publishers or his literary estate.
One wonders whether one could use his works in the same way and get away with it.
So is copyright a big negative? No, not at all – from the perspective of artists and writers copyrighting their work, in general it makes complete sense to me. This is not just about money but also about artistic control over how a work is delivered. Looking back to the past before copyright, it wasn’t just about royalties but also about reputation, about preventing or discouraging mischievous or sloppy reuse. While copyright is far from perfect – and often pretty flawed – it still offers creators a basic level of protection for the things that they have created. As an author or artist, if you want something more flexible than your standard copyright license then you can combine it with things like Creative Commons licenses to say how you want others to be able to use your works. The question of how long (or whether!) works should be copyrighted after the death of their creators is an entirely different question. I think copyright laws and international agreements are currently massively skewed in favour of big publishers and record companies (often supported by well-heeled lobbying groups purporting to serve the neglected interests of famous authors and aging rock stars), and do not take sufficient account of the public domain as a positive social good: a cultural commons, free for everyone.
Have you ever had problems with a copyright claim from an author? Well, almost all of the public domain material we feature is by people who are long dead, so we haven’t (thank god!) had any direct complaints from them. We did get one takedown notice, on Gurdjieff’s harmonium recordings. The law can get very complex, particularly around films and sound recordings. I am not sure they were right, but we took it down all the same.
What are your plans for the future? As well as expanding the site with exciting new features, we are also planning to break out from the internet into the real world of objects! We’re planning to produce some beautiful printed volumes with collections of images and texts curated around certain themes. We’ve wanted to do this for a while, and hopefully we’ll have time (and funds!) to finally do it next year.
You can sign up to The Public Domain Review’s wonderful newsletter here.

“Carbon dioxide data is not on the world’s dashboard” says Hans Rosling

- January 21, 2013 in Featured, Interviews, OKFest, Open Data, Open Government Data, Open/Closed, WG Sustainability, Working Groups

Professor Hans Rosling, co-founder and chairman of the Gapminder Foundation and Advisory Board Member at the Open Knowledge Foundation, received a standing ovation for his keynote at OKFestival in Helsinki in September in which he urged open data advocates to demand CO2 data from governments around the world. Following on from this, the Open Knowledge Foundation’s Jonathan Gray interviewed Professor Rosling about CO2 data and his ideas about how better data-driven advocacy and reportage might help to mobilise citizens and pressure governments to act to avert catastrophic changes in the world’s climate.
Hello Professor Rosling! Hi. Thank you for taking the time to talk to us. Is it okay if we jump straight into it? Yes! I’m just going to get myself a banana and some ginger cake. Good idea. Just so you know: if I sound strange, it’s because I’ve got this ginger cake. A very sensible idea.
So in your talk in Helsinki you said you’d like to see more CO2 data opened up. Can you say a bit more about this? In order to get access to public statistics, first the microdata must be collected, then it must be compiled into useful indicators, and then these indicators must be published. The amount of coal one factory burnt during one year is microdata. The emission of carbon dioxide per year per person in one country is an indicator. Microdata and indicators are very, very different numbers. CO2 emissions data is often compiled with great delays. The collection is based on already existing microdata from several sources, which civil servants compile and convert into carbon dioxide emissions. Let’s compare this with calculating GDP per capita, which also requires an amazing amount of collection of microdata, which has to be compiled and converted and so on. That is done every quarter for each country. And it is swiftly published. It guides economic policy. It is like a speedometer: when you drive your car you have to check your speed all the time, and the speed is shown on the dashboard. Carbon dioxide is not on the dashboard at all. It is something you get with several years’ delay, when you are back from the trip. It seems that governments don’t want to get it swiftly. And when they finally publish it, they publish it as total emissions per country. They don’t want to show emissions per person, because then the rich countries stand out as worse polluters than China and India. So it is not just an issue about open data. We must push for change in the whole way in which emissions data is handled and compiled.
You also said that you’d like to see more data-driven advocacy and reportage. Can you tell us what kind of thing you are thinking of? Basically everyone admits that the basic vision of the green movement is correct. Everyone agrees on that. By continuing to exploit natural resources for short-term benefits you will cause a lot of harm. You have to understand the long-term impact. Businesses have to be regulated. Everyone agrees. Now, how much should we regulate? Which risks are worse, climate or nuclear? How should we judge the bad effects of having nuclear electricity? The bad effects of coal production? These are difficult political judgments. I don’t want to interfere with these political judgments, but people should know the orders of magnitude involved, the changes, what is needed to avoid certain consequences. But that data is not even compiled fast enough, and the activists do not protest, because it seems they do not need data.
Let’s take one example. In Sweden we have data from the energy authority. They say: “energy produced from nuclear”. Then they include two outputs. One is the electricity that goes out into the lines and lights the house that I’m sitting in. The other is the warm waste water that goes back into the sea. That is also energy, they say. It is actually like a fraud to pretend that that is energy production. Nobody gets any benefit from it. On the contrary, they are changing the ecology of the sea. But they get away with it because the designation is “energy produced”.
We need to be able to see the energy supply for human activity from each source and how it changes over time. The people who are now involved in producing solar and wind produce very nice reports on how production increases each year. Many get the impression that we get 10, 20, 30% of our energy from solar and wind. But even with fast growth from almost zero, solar and wind are still almost nothing. The news reports mostly neglect to explain the difference between the percentage growth of solar and wind energy and their percentage of the total energy supply. People who are very much into data and into handling data may not understand how the main misconceptions come about. Most people are so surprised when I show them total energy production in the world on one graph. They can’t yet see solar because it hasn’t reached one pixel yet.
So this isn’t of course just about having more data, but about having more data-literate discussion and debate – ultimately about improving public understanding? It’s like that basic rule in nutrition: food that is not eaten has no nutritional value. Data which is not understood has no value. It is interesting that you use the term data literacy. Actually I think it is presentation skills we are talking about. Because if you don’t adapt your way of presenting to the way that people understand it, then you won’t get it through. You must prepare the food in a way that makes people want to eat it. The dream that you will train the entire population to about one semester of statistics at university: that’s wrong. Statisticians often think that they will teach the public to understand data the way they do, but instead they should turn data into Donald Duck animations and make the story interesting. Otherwise you will never ever make it. Remember, you are fighting with Britney Spears and tabloid newspapers. My biggest success in life was December 2010 in the YouTube entertainment category in the United Kingdom. I had the most views that month. And I beat Lady Gaga with statistics. Amazing. Just the fact that the guy at the BBC in charge of uploading the trailer put me under ‘entertainment’ was a success. No-one thought of putting a trailer for a statistics documentary under entertainment.
That’s what we do at Gapminder. We try to present data in a way that makes people want to consume it. It’s a bit like being a chef in a restaurant. I don’t grow the crop. The statisticians are like the farmers that produce the food. Open data provides free access to the potatoes, tomatoes and eggs and whatever it is. We are preparing it and making delicious food. If you really want people to read it, you have to make data as easy to consume as fish and chips. Do not expect people to become statistically literate! Turn data into understandable animations.
My impression is that some of the best applications of open data are when we get access to data in a specific area which is highly organized. One of my favorite applications in Sweden is a train timetable app. I can check all the commuter train departures from Stockholm to Uppsala, including the last change of platform and whether there is a delay. I can choose how to transfer quickly from the underground to the train to get home fastest. The government owns the rails and every train reports its arrival and departure continuously. This data is publicly available as open data. Then a designer made an app and made the data very easy for me to understand and use.
But to create an app which shows the determinants of unemployment in the different counties of Sweden? No-one can do that, because that is a great analytical research task. You have to take data from very many different sources and make predictions. I saw a presentation about this yesterday at the Institute for Future Studies. The PowerPoint graphics were ugly, but the analysis was beautiful. In this case the researchers need a designer to make their findings understandable to the broad public, and together they could build an app that would predict unemployment month by month.
The CDIAC publish CO2 data for the atmosphere and the ocean, and they publish national and global emissions data. The UNFCCC publish national greenhouse gas inventories. What are the key datasets that you’d like to get hold of that are currently hard to get, and who currently holds them? I have no coherent CO2 dataset for the world beyond 2008 at present. I want to have this data up until last year, at least. I would also welcome half-year data, but I understand this can be difficult because carbon dioxide emissions vary with transport and the heating or cooling of houses over the seasons of the year. So just give me the past year’s data in March. And in April/May for all countries in the world. Then we can hold governments accountable for what happens year by year.
Let me tell you a bit about what happens in Sweden. The National Natural Protection Agency gets the data from the Energy Department and from other public sources. Then they give these datasets to consultants at the University of Agriculture and the Meteorological Authority. Then the consultants work on these datasets for half a year. They compile them, the administrators look through them and they publish them in mid-December, when Swedes start to get obsessed about Christmas. So that means that there was a delay of eleven and a half months. So I started to criticize that. My cutting line came when I was with the Minister of Environment and she was going to Durban. And I said: “But you are going to Durban with eleven and a half months of constipation. What if all of this shit comes out on stage? That would be embarrassing, wouldn’t it?” Because I knew that in 2010 her carbon dioxide emissions had increased by 10%. But she only published that after coming back from Durban. So that became a political issue on TV. And then the government promised to publish it earlier. So in 2012 we got CO2 data by mid-October, and in 2013 we’re going to get it in April. Fantastic. But actually ridiculing is the only way that worked. That’s how we liberated the World Bank’s data. I ridiculed the President of the World Bank at an international meeting. People were laughing. That became too much.
The governments in the rich countries don’t want the world to see emissions per capita. They want to publish emissions per country. This is very convenient for Germany and the UK, not to mention Denmark and Norway. Then they can say the big emission countries are China and India. It is so stupid to look at total emissions per country. This allows small countries to emit as much as they want, because they are just not big enough to matter. Norway hasn’t reduced its emissions for the last forty years. Instead they spend their aid money to help Brazil replant rainforest. At the same time Brazil lends 200 times more money to the United States of America to help them consume more and emit more carbon dioxide into the atmosphere. Just putting these numbers up makes a very strong case.
But I need to have timely carbon dioxide emission data. And not even climate activists ask for this. Perhaps it is because they are not really governing countries. The right-wing politicians need data on economic growth, the left wing need data on unemployment, but the greens don’t yet seem to need data in the same way.
As well as issues getting hold of data at a national level, are there international agencies that hold data that you can’t get hold of? It is like a reflection. If you can’t get data from the countries for eleven and a half months, why the heck should the UN or the World Bank compile it faster? Think of your household. There are things you do daily, that you need swiftly. Breakfast for your kids. Then, you know, repainting the house. I didn’t do it last year, so why should I do it this year? The whole system just becomes slow. If politicians are not in a hurry to get data for their own country, they are not in a hurry to compare their data to other countries. They just do not want this data to be seen during their election period.
So really what you’re saying is that you’d recommend stronger political pressure, through ridicule, on different national agencies? Yes. Or sit outside and protest. Do a Greenpeace action on them.
Can you think of datasets about carbon dioxide emissions which aren’t currently being collected, but which you think should be collected? Yes. In a very cunning way China, South Africa and Russia like to be placed in the developing world, and they don’t publish CO2 data very rapidly because they know it will be turned against them in international negotiations. They are not in a hurry. The Kyoto Protocol at least made it compulsory for the richest countries to report their data, because they had committed to decrease. But every country should do this. Everyone should be able to know how much coal each country consumed, how much oil they consumed, etc., and from that data have a calculation made of how much CO2 each country emitted last year. It is strange that the best country at doing this – and it is painful for a Swede to accept this – is the United States. CDIAC. Federal agencies in the US are very good on data and they take on the whole world. CDIAC make estimates for the rest of the world. Another US agency I really like is the National Snow and Ice Data Centre in Denver, Colorado. They give us 24-hour updates on the polar sea ice area. That’s really useful. They are also highly professional. In the US the data producers are far away from political manipulation.
When you see the use of fossil fuels in the world there is only one distinct dip. That dip could be attributed to the best environmental politician ever. The dip in CO2 emissions took place in 2008. George W. Bush, Greenspan and Lehman Brothers decreased CO2 emissions by inducing a financial crisis. It was the most significant reduction in the use of fossil fuels in modern history. I say this to put things into proportion. So far it is only financial downturns that have had an effect on the emission of greenhouse gases. The whole of environmental policy hasn’t yet had any such dramatic effect. I checked this with Al Gore personally. I asked him: “Can I make this joke? That Bush was better for the climate than you were?” “Do that!”, he said, “You’re correct.” Once we show this data, people can see that the economic downturn has so far had the most forceful effect on CO2 emissions.
If you could have all of the CO2 and climate data in the world, what would you do with it?
We’re going to make teaching materials for high schools and colleges. We will cover the main aspects of global change so that we produce a coherent data-driven worldview, which starts with population, and then covers money, energy, living standards, food, education, health, security, and a few other major aspects of human life. And for each dimension we will pick a few indicators. Instead of doing Gapminder World with the bubbles that can display hundreds of indicators, we plan a few small apps where you get a selected few indicators but can drill down. Start with the world, world regions, countries, subnational level; sometimes you split male and female, sometimes counties, sometimes you split income groups. And we’re trying to make this in a coherent graphic and color scheme, so that we really can convey an upgraded worldview. Very, very simple and beautiful but with very few jokes. Just straightforward understanding.
And for climate impact we will relate to the economy: to the number of people at different economic levels, how much energy they use, and then drill down into the type of energy they use and how that energy source mix affects the carbon dioxide emissions. And make trends forward. We will rely on the official and most credible trend forecasts for population, and one, two or more for energy and economic trends, etc. But we will not go into what needs to be done, or how it should be achieved. We will stay away from politics. We will stay away from all data which is under debate. We will just use data with good consensus, so that we create a basic worldview. Users can then benefit from an upgraded worldview when thinking and debating about the future. That’s our idea. If we provide the very basic worldview, others will create more precise data in each area, and break it down into details.
A group of people inspired by your talk in Helsinki are currently starting a working group dedicated to opening up and reusing CO2 data. What advice would you give them and what would you suggest that they focus on? Put me in contact with them! We can just go for one indicator: carbon dioxide emissions per person per year. Swift reporting. Just that. Thank you very much Professor Rosling. Thank you.
If you want help to liberate, analyse or communicate carbon emissions data in your country, you can join the OKFN’s Open Sustainability Working Group.

Video: Julia Kloiber on Open Data

- October 3, 2012 in Ideas and musings, Interviews, OKF Germany, OKFest, Our Work

Here’s Julia Kloiber from OKFN-DE’s Stadt-Land-Code project, talking at the OKFest about the need for more citizen apps in Germany, the need for greater openness, and how to persuade companies to open up.

Building the Ecology of Libraries – An Interview with Brewster Kahle

- March 23, 2012 in Featured, Interviews, OKCon, Open GLAM

This interview is cross-posted here and on the Open GLAM blog.
Kai Eckert (left) and Adrian Pohl interviewing Brewster Kahle at OKCon 2011
At OKCon 2011, we had the opportunity to interview Brewster Kahle, who is a computer engineer, internet entrepreneur, activist and digital librarian. He is the founder and director of the Internet Archive, a non-profit digital library with the stated mission of “universal access to all knowledge”. Besides the widely known “Wayback Machine”, where archived copies of most webpages can be accessed, the Internet Archive is also very active in the digitization of books, and with the “Open Library” it provides a free catalog that aims to describe “every book ever published”. Kahle and his wife, Mary Austin, created the Kahle/Austin Foundation, which supports the Internet Archive and other non-profit organizations. As open data enthusiasts from the library world, we were especially interested in how the activities of the Internet Archive relate to libraries. We wanted to know how its general approach and services could be useful for libraries in Europe.
Brewster Kahle, what is the Internet Archive and what is your vision for its future? The idea is to build the Library of Alexandria, version 2. The idea of all books, all music, all video, all lectures – well, kind of everything – available to anybody, anywhere that would want to have access. Of course, it’s not going to be done by one organisation, but we hope to play a role by helping to move libraries, and ourselves, forward and by making as much technology as is required to be able to fulfil this goal.
What are the obstacles preventing this happening at the moment? We see the world splitting into two parts: there are the hyper-property interests and then there are the hyper-open interests, and I’d say actually the hyper-open is much more successful, but that is not slowing down those that want to clamp down, shut down, control.
What are the challenges faced by the Internet Archive regarding the digitization of books? There are two big problems: one is going and building a digital collection, either by digitizing materials or buying electronic books. And the other is: how do you make this available, especially the in-copyright works? For digitizing books, it costs about 10 cents a page to do a beautiful rendition of a book. So, for approximately 30 dollars for a 300-page book you can do a gorgeous job. Google does it much more quickly and it costs only about 5 dollars for each book. So it really is much less expensive, at lower quality, but they are able to do things at scale. We digitize about 1,000 books every day in 23 scanning centers in six countries. We will set up scanning centers anywhere, or, if there are people that would like to staff the scanners themselves, we provide the scanners and all of the backend processing for free, until we run out of scanners – and we’ve got a bunch of them. So we’re looking either for people that want to scan their own collections by providing their own labour, or they can employ us to do it, and all told it is 10 cents a page to complete.
Also, part of building a collection is buying electronic books. But when I say buying, I really mean buying books, not licensing books – books that are yours forever. There are a growing number of publishers that will sell ebooks like music publishers now sell MP3s. That does not mean that we can post them on the internet for free for everybody; it means we can lend them, one person at a time.
So if we have a hundred copies, then we can lend them out; it’s very much like normal book collections. This has the nice characteristic that it does not build monopolies. So instead of licensing collections that will go away if you stop licensing them, or that are built into one centralized collection like Elsevier, JSTOR or Hathi Trust, the idea is to buy and own these materials.
Then there is the rights issue of how to make them available. Out-of-copyright materials are easy; those should be available in bulk. They often aren’t – for instance Google Books are not, they are not allowed to be distributed that way. But open books and libraries and museums that care should not pursue those closed paths. For the in-copyright materials, what we have done is to work with libraries. We are lending them, so that the visitors to our libraries can now access over 100,000 books that have been contributed by the libraries that are participating. There are now over 1,000 libraries in 6 countries that are putting books into this collection, which is then available in all the other libraries. And these are books from all through the 20th century. So, if there are any libraries that would like to join, all they need to do is contribute a post-1923 book from their collection to be digitized and made available, along with the IP addresses for their library. Then we will go and turn those on, so that all of the users inside that library, or those that are dialing in (for instance via VPN), can borrow any of the 100,000 books. Any patron can borrow five books at one time and the loan period is two weeks. But only one copy of each book is circulating at any one time. So it is very good for the long tail. We think that this lending library approach, and especially the in-library lending library, has been doing very well. We have universities all over the world, we have public libraries all over the world now participating in building a library by libraries for libraries.
You already talked about cooperating with traditional libraries on in-copyright books. How is the cooperation between the Internet Archive – which itself seems to be a library – and traditional libraries in other fields, with respect to the digitization of public domain books for instance? The way the Internet Archive works with libraries all over the world is by helping them digitize their books, music, video, microfilm and microfiche very cost-effectively. So we just try to cover the costs – meaning we make no profit, we are a non-profit library – and give the full digital copy back to the libraries, and we also keep a copy to make available on the Internet for free public access. Another way that we work with libraries is through this lending program, where libraries just have to donate at least one book – hopefully many more to come – to go and build this collection of in-copyright materials that can then be lent to patrons who are in the participating libraries. So those are the two main ways that we work with libraries on books. We also work with libraries on building web collections; we work with a dozen or so of the national libraries, and also through Archive-It, our subscription-based service for helping build web collections. But we never do things in such a way that if they stop paying, they lose access to the data. The idea is to pay only once and have it forever.
Are there already collaborations between European libraries and the Internet Archive, and are scanners already located in Europe that could be used for digitization projects in university or national libraries?

Yes, we now have scanners in London and in Edinburgh – at the Natural History Museum and the national library – where we are digitizing. We would like to have more scanners in more places: for anybody who is willing to staff one and keep it busy for 40 hours each week, we will provide all of the technology for free, or we can cooperate and hire the people and operate these scanning centers ourselves. We find that a scanning center – i.e. 10 scanners – can scan 30,000 books in a year, and it costs about 10 cents a page. It is very efficient and very high quality, including fold-outs and difficult materials as well. And it works right within the libraries, so the librarians have real access to how the whole process is going and what is digitized. It is really up to the libraries. The idea is to get the libraries online as quickly as possible and without the restrictions that come with corporations.

A very important topic is also the digitization of very old and valuable materials, old prints, handwritten manuscripts and so on. How do you deal with these materials?

We have been dealing with printed material that goes back 500 years with our scanners, sometimes under very special conditions. The 10 cents a page is basically for normal materials. When you are dealing with handwritten or large-format materials, it costs a little bit more, just because it takes more time. All it really is, is labour: we are efficient, but it still costs money to hire people to do a good job of this. We are digitizing archives, microfilm, microfiche, bound and unbound printed materials, moving images, 16 mm as well as 8 mm films, audio recordings and LPs. We can also deal with materials that have already been digitized by other projects, to make unified collections.

How do you integrate all these digitized objects and how do you deal with the specific formats that are used to represent and consume digital materials?

We use the metadata that comes from the libraries themselves, so we attach MARC records that come from the libraries to make sure there is good metadata. As these books move through time from organization to organization, the metadata and the data stay together. After we photograph the books, we run them through optical character recognition so they can be searched, and we move them into many different formats: PDF, DjVu, DAISY files for the blind and the dyslexic, MOBI files for Kindle users, and we can also make them available in some of the digital rights management services that publishers are currently using for their in-print books. All of these technologies are streamlined because we have digitized well over one million books; these are all available and easily plugged together. In total there are now over two million books available on the Internet Archive to end users through the openlibrary.org website, where people can go and see, borrow and download all of these books. But if libraries want to add these to their own collections they are welcome to.
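As a small illustration of how these files and records can be plugged into other systems, the sketch below fetches an item's descriptive metadata and its list of derivative files from the public archive.org metadata endpoint. It assumes the present-day endpoint at https://archive.org/metadata/<identifier>, which postdates this interview; the identifier used is a placeholder, not a specific item.

```python
# Minimal sketch: pull an item's metadata and file list from archive.org.
# Assumes the public metadata endpoint; the identifier below is a placeholder.
import json
from urllib.request import urlopen

IDENTIFIER = "some-item-identifier"  # replace with a real Internet Archive item ID

with urlopen(f"https://archive.org/metadata/{IDENTIFIER}") as response:
    item = json.load(response)

# The descriptive record (title, creator, language, ...) that travels with the item.
print(item.get("metadata", {}).get("title"))

# Every derivative kept alongside the item: raw images, PDF, DjVu, DAISY, MOBI, OCR text, ...
for f in item.get("files", []):
    print(f.get("name"), "-", f.get("format"))
```

Because the metadata stays with the files, a library working from these records in bulk sees the same descriptive data that the contributing library supplied.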
So if there are 100,000 books in your particular language or your subject area that you would like to complete your collections with, let's go and get each of these collections to be world-class collections. We are not trying to build an empire, we are not trying to build one database; we want to build a library system that lives, thrives and also supports publishers and authors going forward.

So there's the Internet Archive and the Open Library. Can you make the distinction any clearer for those that don't currently understand it?

Archive.org is the website for all of our books, music, video and web pages. Openlibrary.org is a website really for books. The idea is to build an open catalog system: one webpage for every book ever published. If it's available online, then we point to it. If it's available for sale in bookstores, we point to those. If it's available in libraries for physically taking out, we point to that. Openlibrary.org is now used by over 100,000 people a day to access books. People have also integrated it into their catalogs. That means that when people come to the library catalogs we're partnered with, they search their library catalog and pull down a list, and either the library has added the MARC records from the set into their catalog or, better yet, they just call an API such that there's a little graphic that says: "You can get this online", or: "This is available to be borrowed for free", or: "It's available for borrowing but it's currently checked out." Those little lights turn on so that people can get to the book electronically right there from the catalog. We find that integration very helpful because we don't want everyone coming to Open Library. They know how to use libraries; they use your library and your library catalog. Let's just get them the world's books, right there and then.

You mentioned Google Books. There are agreements between libraries and Google for digitizing materials. What are the benefits for libraries of choosing the Internet Archive over Google Books?

Google offers to cover the costs of the labor to do the digitization, but the libraries that participated ended up paying a very large amount of money just trying to prepare the books and get them to Google. Often they spent more working with Google than they would have with the Internet Archive, and in the latter case they do not have restrictions on their materials. Google digitizes even public domain materials and then puts restrictions on their re-use. Anybody who says that is open has got to mean something bizarre by 'open'. You cannot take hundreds of these books and move them to another server; it is against the law, and Google will stop libraries that try to make these available to people and move the books from place to place. This is quite unfortunate.

Is Google reusing Internet Archive books in Google Books?

They are not, but HathiTrust, at the University of Michigan, is taking many of our books, and Google is indexing them so that they are in their search engine.

People at OKCon are naturally supporters of Open Content and Open Knowledge, but many libraries don't like their digitized material to be open. Even public domain books that are digitized are normally kept on the libraries' websites, and by contracts or even by copyfraud they say: "You cannot do whatever you want with this." What would you say to libraries to really open up their digitized content?

There has been a big change over the last four or five years on this.
None of the libraries we work with – 200 libraries in a major way, and we have now received books from over 500 libraries – have put any restrictions beyond the copyright of the original material. If there is copyrighted material, then of course it has restrictions, but neither the libraries nor the Internet Archive are imposing new restrictions. You are right that there are some libraries that may not want to participate in this, but this is what most libraries are doing – except for the Google libraries, which are locking them up.

Do all the libraries provide high-resolution scans, or do some choose to only provide PDFs?

All the high-resolution originals, plus the cropped and deskewed images, are publicly downloadable for free so that any analysis can be done. There is now over one petabyte of book data available from the Internet Archive for free and active download, and about 10 million books are downloaded from the Internet Archive each month. We think this is quite good, and we'd like to see even more use by building complete collections that go all the way to the current day. I'd say we are in pretty good shape on the public domain in the English language, but other languages are still quite far behind, so we need to fill in better public domain collections. But I'd say the real rush right now is getting the newer books to our users and patrons, who are really turning to the internet for information for their education.

To be more concrete: for libraries that are thinking about digitizing their own collections, what exactly do you offer?

Either write to robert@archive.org or to myself: brewster@archive.org. We offer digitization services at 10 cents a page, and if there's enough to do, we'll do it inside your library. If the library wants to staff their own scanner and start with one, then we can provide that scanner as long as it doesn't cost us anything: somebody has to cover the transportation and the set-up, and these costs are borne by the library. But then all of the backend processing and the OCR is provided for free. For the lending system it's at least one book, a set of IP addresses, contact information, and you're done. No contracts, nothing.

Ok. So that means you offer the scanner technology for free and the knowledge about how to use it for free, and only the additional costs for transportation have to be covered by the libraries. With your experience in digitization projects, every library should – and can – contact you; you explain the process, say what you are doing, give your opinion on how you would do it, and then, of course, the library can decide?

Absolutely. We'll provide all the help we can for free to help people through the process. We find that many people are confused and have heard contradictory things.

Have you ever tried a kind of crowdsourcing approach, where library users digitize books themselves, by placing a scanner in the library and letting the users do it? Or does handling the scanners take too much education?

We find that it actually is quite difficult to digitize a book well, unfortunately. Though we have just hired Dan Reetz, who is the head of a do-it-yourself bookscanner group, and we are now starting to make do-it-yourself bookscanners that people can build themselves, with software that automatically uploads to the Internet Archive. So we hope that there's a great deal of uptake from smaller organizations and from individuals.
In Japan, for instance, many people scan books and we receive those; people upload maybe one or two hundred books to us a day. People are also often uploading things from the Arab world: they are digitizing on their own, and we want those as well. So we can work with people if they have PDFs of scanned books or just sets of images, from either past or current projects, or if they want to get involved. There are many different ways we would love to help.

Does the Internet Archive collaborate with Europeana in some way, for example to make material from the Internet Archive available in Europeana?

We've met with some of the people from Europeana and I believe they have downloaded all of our metadata; all of our metadata is available for free. I'm encouraged by some of what I've seen from Europeana towards being a search engine. To the extent that they may grow into being the library for Europe, I think this is not a good idea. I like to see many libraries, many publishers, many booksellers, many, many authors, and everyone being a reader. What we want are many winners; we don't want just one. So Europeana, to the extent that it is just a metadata service, I think is a good project.

You just mentioned the metadata. So everything that you have is free and open: not only the digitized versions of the books but also the enrichments, the metadata about them, the OCR results, for example. So, if I wanted to, I could take all of it, put it on my own server and re-publish it in the way that I want?

Yes, absolutely. All the metadata, the OCR files and the image files are available. There are a lot of challenges in maintaining the files over time, and we are committed to doing this, but we don't want to be the only one. So the University of Toronto has taken back all of the 300,000 books that were digitized from their collections to put them on their servers, and they are now starting to look at collections from other libraries to add as well. As we move to digital libraries, we don't necessarily just need digital versions of the physical books we own; we want digital books that are of interest to our patrons. Yes, it is all available, and it is forming a new standard for openness.

The MARC records you mentioned are of course also available. So it makes sense for a library to include not only their own books but every book in the Internet Archive in their own catalog, because, in fact, it is available to all their patrons. You could think of it as a possession of every library in the world. Is that right?

Yes, we see this as building the ecology of libraries. The really valuable thing about libraries – yes, there are great collections – but the real value is the librarians, the experts, the people who know about the materials and can bring them to people in new and different ways. That's the real value of our library system. So let's make sure that, as we go through this digitization wave, we don't end up with just one library that kills off all of the others, which is a danger.

Thank you for the interview.
Interviewers: Kai Eckert is a computer scientist and vice head of the IT department of the Mannheim University Library. He coordinates the library's linked open data activities and developed its linked data service, and he has given various national and international presentations on linked data and open data. Adrian Pohl has been working at the Cologne-based North Rhine-Westphalian Library Service Center (hbz) since 2008. His main focuses are Open Data, Linked Data and their conceptual, theoretical and legal implications. Since June 2010, Adrian has been coordinating the Open Knowledge Foundation's Working Group on Open Bibliographic Data.

Acknowledgements: The introductory questions were taken from an earlier interview on the same day, conducted by Lucy Chambers.