You are browsing the archive for Guo Xu.

The Benefits of Open Data (part II) – Impact on Economic Research

- October 23, 2012 in Access to Information, Featured, WG Economics

This blog is cross-posted from the OKFN’s Open Economics blog.

A couple of weeks ago, I wrote the first part of a three-part series on Open Data in Economics. Drawing on examples from top research on how providing information and data can help improve the quality of public service provision, that article explored economic research on open data. In this second part, I would like to explore the impact of openness on economic research.

We live in a data-driven age

There used to be a time when data was costly: there was not much data around. Comparable GDP data, for example, has only been collected since the early to mid 20th century. Computing power was expensive, too: data and commands were stored on punch cards, and researchers had only limited hours to run their statistical analyses on the few computers available.

Today, however, statistics and econometric analysis have arrived in every office: Open Data initiatives at the World Bank and governments have made it possible to download cross-country GDP and related data with a few mouse-clicks. The availability of open source statistical packages such as R allows virtually everyone to run quantitative analyses on their own laptops and computers. Consequently, the number of empirical papers has increased substantially. The left figure (taken from Espinosa et al. 2012) plots the number of econometric (statistical) outputs per article in a given year: quantitative research has really taken off since the 1960s. Where researchers once used datasets with a few dozen observations, modern applied econometricians now often draw upon datasets boasting millions of detailed micro-level observations.

Why we need open data and access

The main economic argument in favour of open data is gains from trade. These gains come in several dimensions.

First, open data helps avoid redundancy. As a researcher, you may know that the same basic procedures (such as cleaning and merging datasets) have often been done thousands of times, by hundreds of different researchers. You may also have experienced the time wasted compiling a dataset that someone else had already put together but was unwilling to share. Open data in these cases can save a lot of time, allowing you to build upon the work of others. By feeding your additions back into the ecosystem, you in turn ensure that others can build on your data work. Just as there is no need to re-invent the wheel several times, the sharing of data allows researchers to build on existing data work and devote valuable time to genuinely new research.

Second, open data ensures the most efficient allocation of scarce resources – in this case datasets. Again, as a researcher, you may know that academics often treat their datasets as private gold mines. Indeed, entire research careers are often built on possessing a unique dataset. This hoarding often results in valuable data lying around on a forgotten hard disk, not fully used and ultimately wasted. What’s worse, the researcher who owns a unique dataset may not be the one best equipped to make full use of it, while someone else may possess the necessary skills but not the data. Only recently, I had the opportunity to talk to a group of renowned economists who – over the past decades – have compiled an incredibly rich dataset. During the conversation, it was mentioned that they themselves may have only exploited 10% of the data – and were urgently looking for fresh PhDs and talented researchers to unlock the full potential of their data. But when data is open, there is no need to search, and data can be allocated to the most skilled researcher.

Finally, and perhaps most importantly, open data – by increasing transparency – also fosters scientific rigour: when datasets and statistical procedures are made available to everyone, a curious undergraduate student may be able to replicate and possibly refute the results of a senior researcher. Indeed, journals are increasingly asking researchers to publish their datasets along with the paper. But while this is a great step forward, most journals still keep the actual publication closed, asking for horrendous subscription fees. For example, readers of my first post may have noticed that many of the research articles linked could not be downloaded without a subscription or university affiliation. Since dissemination, replication and falsification are key features of science, open data and open access both become essential to knowledge generation.

But there are of course challenges ahead. For example, while wider access to data and statistical tools is a good thing, the ease of running regressions with a few mouse-clicks also results in a lot of mindless data mining and nonsensical econometric output. Quality control, hence, is and remains important. There are, and in some cases also should be, some barriers to data sharing. In some cases, researchers have invested a substantial part of their lives in constructing their datasets, in which case it is understandable that some are uncomfortable sharing their “baby” with just anyone. In addition, releasing (even anonymized) micro-level data often raises concerns about privacy protection. These issues – and existing solutions – will be discussed in the next post.

Are you interested in participating in the activities of the Open Economics Working Group? Click here to get involved.

The Benefits of Open Data – Evidence from Economic Research

- October 5, 2012 in Access to Information, WG Economics

This blog is cross-posted from the OKFN’s Open Economics blog.

Looking back on the Open Knowledge Festival 2012 in September, there’s an impression that openness is everywhere: there are working groups on Open Science and Open Linguistics, topic streams on Gender and Diversity in Openness, and events like Open Prom and Open Sauna. Open Knowledge and Open Data, it seems, are omnipresent.

Looking beyond the Open Knowledge community, however, the situation is very different. In Economics, for example, not many know what “open data”, “open access” or “Open Economics” exactly mean. Indeed, not many even care. A common reaction is: “Yes, it sounds interesting and important, but does it really matter? And why should I care about it?”

In this post, I would like to give some hard evidence on the positive role that opening up information has had in economics, and sketch ideas for how to involve economists – professional or in training – to bring ideas of openness into the mainstream. I’ll look at economic research on open data, the impact of open data on economic research, and challenges and ways forward.

The real world impacts of open information

Making information accessible to the public can improve public service delivery. In countries where corruption is pervasive, services and funds often do not reach the frontline provider. And even if services do reach the people, the quality of services provided is often shockingly poor: survey evidence from Bangladesh, Ecuador, India, Peru and Uganda found absence rates as high as 20% and 35% for school teachers and health workers, respectively. In many cases, staff are poorly trained.

Releasing data on service delivery can help reduce corruption and improve public services. In Uganda, researchers provided information to parents by publishing funding data for a random subset of schools in local newspapers. As a consequence, corruption decreased significantly, while schooling outcomes improved substantially. Similar evidence in health delivery and redistributive policies suggests that providing information can help the public to discipline public service providers, improving the quality of services.

Information can also expose corrupt politicians. The Federal Government of Brazil, for example, began to select and audit municipalities at random, releasing audit reports to the media. Researchers found that the audit outcomes had a significant impact on the reelection probability of politicians: those exposed for corruption were punished at the ballot box, and the impact was most pronounced in areas where local radio helped disseminate the information.

A story about fishermen in South India shows how information can also improve market efficiency. Studying the adoption of mobile phones in Kerala, researchers have found convincing evidence that access to information through mobile phones helped fishermen sell their catch at the market where the price was highest (and fish most in demand). Instead of sailing to a port and simply hoping for a good price, fishermen were empowered by technology to make informed decisions on where to trade.

Finally, the benefits of transparency are not restricted to reducing corruption and lowering the cost of information. A comparative study finds that transparency – measured by the accuracy and frequency of macroeconomic information released to the public – leads to lower borrowing costs in sovereign bond markets.

Open data pays off in many ways, in many different contexts. These are just a few selected examples of how cutting-edge economic research has identified the benefits of openness in a diverse range of situations. The cases I presented are not based on correlations, but on carefully established causal relationships, leaving little doubt – at least within the context studied – that information matters, big time. Perhaps most importantly, these cases have also shown that open data must be understood in a broad sense. These interventions do not take advantage of linked data, and do not use CSVs that are shared through Facebook or Twitter – often, they are simple solutions that ultimately help improve the everyday lives of people.

Introducing the Open Knowledge Index

- August 26, 2011 in composite index, economics, Hackday / Code Sprint, Open Economics, open knowledge index, WG Economics, Working Groups

The following post is from Guo Xu, Coordinator of the Open Economics Working Group.

Despite the increasing efforts in opening data and making information and knowledge accessible to a greater audience, there has not been an explicit way to measure openness in knowledge creation and dissemination. This has made it very difficult to compare country performance and to track one country’s progress over time. We at the Open Economics Working Group have made a first attempt to create an “Open Knowledge Index” to fill this gap. Earlier this week, during a virtual sprint, seven of our members worked together to create the conceptual framework, gather the data and construct a first version for the set of OECD + BRIC countries.

Here are the (preliminary) results (a technical explanation of the construction is here):

Not surprisingly, there is a high correlation between a country’s wealth and its rank in providing Open Knowledge (Iceland leads the list). But a large fraction of the variation in the Open Knowledge Index cannot be explained by wealth alone – a good example here is Estonia, still an emerging country but with one of the highest internet penetration rates in the world.

As this is only a first version, we would be happy to receive any comments and feedback you may have. We are also looking for more volunteers who might be interested in joining our project – this can be by helping to improve the conceptual part of the index, by gathering data or by improving the visualization. If you are interested, please get in touch with our Working Group by signing up and writing to the mailing list.
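For readers wondering what constructing a composite index can look like in practice, here is a minimal sketch. It is not the Working Group's actual method (that is described in the technical explanation linked in the post); it simply assumes min-max normalization of each indicator and an equal-weighted average. The country labels, indicator names and values are hypothetical and invented purely for illustration.

```python
from statistics import mean

# Hypothetical indicator values, invented purely for illustration.
# They are NOT the data used for the Open Knowledge Index.
indicators = {
    "Country A": {"internet_users_pct": 90.0, "open_access_journals": 40.0},
    "Country B": {"internet_users_pct": 60.0, "open_access_journals": 10.0},
    "Country C": {"internet_users_pct": 30.0, "open_access_journals": 31.0},
}


def min_max_normalize(values):
    """Rescale one indicator to the [0, 1] range across countries."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        # No variation across countries: the indicator carries no ranking information.
        return {country: 0.0 for country in values}
    return {country: (v - lo) / (hi - lo) for country, v in values.items()}


def composite_index(data):
    """Equal-weighted average of the normalized indicators (an assumed weighting)."""
    names = sorted({name for row in data.values() for name in row})
    normalized = {
        name: min_max_normalize({country: row[name] for country, row in data.items()})
        for name in names
    }
    return {country: mean(normalized[name][country] for name in names) for country in data}


index = composite_index(indicators)
ranking = sorted(index, key=index.get, reverse=True)
for country in ranking:
    print(f"{country}: {index[country]:.2f}")
```

Equal weights are the simplest choice rather than a recommendation; a different weighting or normalization can change the ranking, which is one reason feedback on the conceptual part of such an index matters.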

Call for participation: Open Economics Working Group

- June 21, 2011 in Open Economics, WG Economics

The following post is by Guo Xu, Coordinator of the Open Economics Working Group and research associate at DIW Berlin.

Help make economics more open!

The Open Economics Working Group of the Open Knowledge Foundation is an informal, community-organized group working to ensure economics is built on sound, transparent foundations. We’re looking for people, especially students, to get involved with the working group and its projects.

Get involved!

The Open Economics WG is driven by the contributions of volunteers like you. We are not only looking for coders or economists but are open to all who are enthusiastic about data. If you would like to explore the different ways in which you can participate, please join our mailing list or contact us at

We are hosting a Skype meetup to take stock and discuss future projects on the 23rd of June, 7 pm GMT+1 (British Summer Time [!]) and would like to invite you to join in – please drop a mail to along with your Skype ID so we can add you to the session.