
The Statistical Memory of Brazil

- January 14, 2013 in Crowd Sourcing, data digitalisation, Data Digitalization, data mining, data systems, economics profession, External Projects, Featured, historical data, Open Data, Open Economics, Public Finance and Government Data, Statistical Memory of Brazil

This blog post is written by Eustáquio Reis, Senior Research Economist at the Institute of Applied Economic Research (Ipea) in Brazil and member of the Advisory Panel of the Open Economics Working Group. The Statistical Memory of Brazil project aims to digitize the rare book collections of the Library of the Minister of Finance in Rio de Janeiro (BMF/RJ) and to make them freely available for download. The project focuses on publications containing social, demographic, economic and financial statistics for nineteenth- and early twentieth-century Brazil. At present, approximately 1,500 volumes, 400,000 pages and 200,000 tables have been republished. Apart from democratizing access to the contents for both the scientific community and the general public, the project aims at the physical preservation of the collection. The rarity, age and precarious state of conservation of the books make it strongly advisable to restrict physical access to them, limiting their handling to specific bibliographical purposes. For the Brazilian citizen, free access to the contents of rare historical collections and statistics provides a form of virtual appropriation of the national memory, and as such a source of knowledge, gratification and cultural identity.

The Library of the Minister of Finance in Rio de Janeiro (BMF/RJ)

Inaugurated in 1944, the BMF/RJ extends over 1,200 square meters in the Palacio da Fazenda in downtown Rio de Janeiro, which was the seat of the Minister of Finance until 1972, when the seat was moved to Brasilia. The historical book collection dates back to the early 19th century, when the Portuguese Colonial Administration was transferred to Brazil. Thereafter, several libraries from other institutions (Brazilian Customs, Brazilian Institute of Coffee, Sugar and Alcohol Institute, among others) were incorporated into the collection, which today comprises over 150,000 volumes, mainly specialized in economics, law, public administration and finance.

Rare book collections

For the purposes of the project, the collection of rare books includes a few thousand statistical reports and yearbooks. To mention just a few: the annual budgets of the Brazilian Empire, 1821-1889; annual budgets of the Brazilian Republic since 1890; Ministerial and Provincial reports since the 1830s; foreign and domestic trade yearbooks since 1839; railway statistics since the 1860s; stock market reports since the 1890s; economic retrospects and financial newsletters since the 1870s; the Brazilian Demographic and Economic Censuses starting in 1872; and the Brazilian Statistical Yearbooks starting in 1908. En passant, it should be noted that despite their rarity, fragility, and scientific value, these collections are hardly ever considered for republication in printed format.

Partnerships and collaborations

Under the initiative of the Research Network on Spatial Analysis and Models (Nemesis), sponsored by the Foundation for the Support of Research of the State of Rio de Janeiro and the National Council for Scientific and Technological Development, the project is a partnership between the Regional Administration of the Minister of Finance in Rio de Janeiro (MF/GRA-RJ), the Institute of Applied Economic Research (IPEA) and the Internet Archive (IA). In addition to generous access to its library book collection, the Minister of Finance provides the expert advice of its librarians as well as the office space and facilities required for the operation of the project. The Institute of Applied Economic Research provides advice on economics, history and informatics. The Internet Archive provides the Scribe® workstations and digitization technology, and makes the digital publications available in several different formats on the website. The project also collaborates with other institutions to supplement the collections of the Library of the Minister of Finance. Thus, the Brazilian Statistical Office (IBGE) supplemented the collections of the Brazilian Demographic and Economic Censuses, as well as of the Brazilian Statistical Yearbooks; the National Library (BN) made possible the republication of the Budgets of the Brazilian Empire, the Provincial and Ministerial Reports, the Rio News, and the Willeman Brazilian Review, the latter in collaboration with the Department of Economics of the Catholic University of Rio de Janeiro.

Future developments and extensions

A new website for the project is under construction. It is built on open source software designed to publish, manage, link and preserve digital contents (Drupal, Fedora and Islandora) and will include two collaborative crowdsourcing platforms. The first crowdsourcing platform will provide facilities for the indexing, documentation and uploading of images and tabulations of historical documents and databases compiled by other research institutions or individuals willing to make voluntary contributions to the project. The dissemination of the digital content is intended to stimulate research innovations, extensions, and synergies based upon the historical documents and databases. For this purpose, one open source solution under consideration is the Harvard University Dataverse Project. The second crowdsourcing platform is intended to foster decentralized online collaboration among volunteers to compile or transcribe into editable formats (csv, txt, xls, etc.) the content of selected digital republications of the Statistical Memory of Brazil project. Whenever possible, optical character recognition (OCR) programs and routines will be used to facilitate the transcription of the image content of the books. The irregular typography of older publications, however, will probably require visual character recognition and manual transcription of contents. Finally, additional routines and programs will be developed to coordinate, monitor and revise the compilations made, so as to avoid mistakes and duplications.
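To make the idea of routines that "monitor and revise the compilations" more concrete, here is a minimal sketch in Python. It is not part of the project's software: the function name diff_transcriptions and the sample figures are invented for illustration, and it assumes that two volunteers independently transcribe the same table to csv. It simply lists every cell where the two versions disagree, so that a reviewer only needs to check those cells against the scanned page.

```python
import csv
import io

def diff_transcriptions(csv_a, csv_b):
    """Return (row, column, value_a, value_b) for every cell where two
    independent transcriptions of the same table disagree."""
    rows_a = list(csv.reader(io.StringIO(csv_a)))
    rows_b = list(csv.reader(io.StringIO(csv_b)))
    conflicts = []
    for r in range(max(len(rows_a), len(rows_b))):
        row_a = rows_a[r] if r < len(rows_a) else []
        row_b = rows_b[r] if r < len(rows_b) else []
        for c in range(max(len(row_a), len(row_b))):
            # A missing cell is represented as None so it counts as a conflict.
            cell_a = row_a[c].strip() if c < len(row_a) else None
            cell_b = row_b[c].strip() if c < len(row_b) else None
            if cell_a != cell_b:
                conflicts.append((r, c, cell_a, cell_b))
    return conflicts

# Invented sample data: two volunteers transcribe the same small table.
volunteer_1 = "year,exports\n1872,215\n1873,190\n"
volunteer_2 = "year,exports\n1872,215\n1873,196\n"

print(diff_transcriptions(volunteer_1, volunteer_2))
# [(2, 1, '190', '196')]
```

Cells on which both volunteers agree could be accepted automatically, while the flagged ones go back to a human reviewer. A real workflow would probably also need to handle differences in number formatting and column order.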

Project Team

Eustáquio Reis, IPEA, Coordinator
Kátia Oliveira, BMF/RJ, Librarian
Vera Guilhon, BMF/RJ, Librarian
Jorge Morandi, IPEA, IT Coordinator
Gemma Waterston, IA, Project Manager
Ana Kreter, Nemesis, Researcher
Gabriela Carvalho, FGV, Researcher
Lucas Mation, IPEA, Researcher

Fábio Baptista
Anna Vasconcellos
Ana Luiza Freitas
Amanda Légora

Timeline of Failed European Banks

- January 7, 2013 in banks, Crowd Sourcing, data visualisation, Failed Banks, Featured, financial markets, Open Data, Public Finance and Government Data

A few months back Open Economics launched a project to list the European banks which have failed recently. After a successful online data sprint and follow-up research, we have now collected data on 122 bank failures and bailouts since 1997. To visualize the data collected on bank failures I created this timeline. The data collection was initiated because neither the EU Commission, Eurostat nor the EBA was able to provide any specific data. We decided to include a broad range of bank crisis measures beyond bankruptcy filings, such as bank nationalisations and government bailouts. We also added some bank mergers, and finally we added several cases where banks entered temporary closure (i.e. “extraordinary administration” under Italian law). For each failed bank we have attempted to gather basic details such as the date of collapse, a news source and a news clip explaining the circumstances of the collapse. We need your help to improve the failed bank tracker. Here’s how you can help:
  • Bank failures are still missing from the list. So if you know of any failures missing from the list, please go ahead and add the information directly in the sheet. If you have corrections to any of the banks listed, please add them with an attached source and information. If news clips are not available in English, add information in the original language.
  • Descriptions and sources for several of the banks on the list are still missing, in particular for Italian and Portuguese banks.
  • Additional info. We hope to add more data to each bank failure, in particular a) the total assets prior to collapse and b) the auditor who signed off on the latest annual report. Let us know if you wish to help dig up any of this information.
  • We are eager to hear your view on the approach or any of the listed bank failures. Join the discussion on our mailing-list.

Economics & Coordinating the Crowd

- December 20, 2012 in Crowd Sourcing, crowd-funding, Featured, Open Innovation, Open Research

This blog post is written by Ayeh Bandeh-Ahmadi, PhD candidate at the Department of Economics, University of Maryland.

Group designed by Amar Chadgar from the Noun Project

This past spring, I spent a few months at the crowdfunding company Kickstarter, studying a number of aspects of the firm, from what makes some projects succeed while others fail to preferences among backers, predictors of fraud, and market differences across geography and categories. I uncovered some fascinating tidbits through my research, but what stands out the most is just how much more challenging it is to run an effective crowdfunding service than you might think. For everything that has been written about crowdfunding’s great promise (Tim O’Reilly tweeted back in February “Seems to me that Kickstarter is the most important tech company since facebook. Maybe more important in the long run.”), its ability to deliver on fantastic and heretofore unachievable outcomes ultimately hinges on getting communities of people onto the same page about each other’s goals and expectations. In that regard, crowdfunding is all about overcoming a longstanding information problem, just like any other crowdguided system, and it offers some great lessons about both existing and missing tools for yielding better outcomes, from crowdsourced science to the development of open knowledge repositories. What is both compelling and defining amongst crowdguided systems (from prediction markets and the question and answer site Quora to crowdsourced science and funding platforms like Kickstarter, MedStartr and IndieGogo) is their ability to coordinate improvements in social welfare that were practically impossible before. The idea is that if we could combine efforts with the right collection of other individuals who have compatible goals and access to resources complementary to ours, then we could achieve outcomes that previously, or on our own, might have been impossible. In the case of crowdfunding, these resources might be largely financial, whereas in the case of crowdsourcing, they might involve time and other resources like computing power and expertise.
In both cases, the promise of crowdguided approaches is their ability to arrive at Pareto improvements in outcomes (economists’ way of describing scenarios where some are better off but no one is worse off). Achieving those outcome improvements that were impossible under traditional institutions also requires coordination mechanisms that improve bandwidth for processing information, incentives, preferences, and resources across the community. Crowdguided systems often improve coordination by providing:
  • opportunities for identifying meaningful problems with particularly high value to the community. Identifying communal values helps develop clearer definitions of relevant communities and important metrics for evaluating progress towards goals.
  • opportunities for individuals to learn from others’ knowledge and experience. Under the right conditions, this can lead to more information and wisdom than any few individuals could collectively arrive at.
  • opportunities for whole communities to coordinate allocation of effort, financing and other resources to maximize collective outcomes. Coordinating each person’s contribution can result in achieving the same or better outcomes with less duplication of effort.
There are some great lessons to take from crowdfunding when it comes to building community, thinking about coordination mechanisms, and designing better tools for sharing information. A major part of Kickstarter’s success comes from its founders’ ability to bring together the creative community they have long been members of around projects the community particularly values. Despite the fact that technology projects like the Pebble watch and the Ouya videogame console receive a great deal of press and typically the largest funding, they still account for a smaller fraction of funding and backings than music or film, in large part a reflection of the site’s strength in its core creative community. It helps that projects that draw from a likeminded community have a built-in sense of trust, reputation and respect. Kickstarter further fosters a sense of community amongst backers of each project by facilitating meaningful rewards. By offering to share credit, methodology, the final product itself, and/or opportunities to weigh in on the design and execution of a project, the most thoughtful project creators help to align backers’ incentives with their own. In the case of crowdfunding, this often means incentivizing backers to spread the word via compelling calls to their own social networks. In the case of crowdsourcing science, getting the word out to other qualified networks of researchers is often equally important. Depending on the project, it may also be worth considering whether skewed participation could bias results. Likewise, the incentive structures facilitated through different credit-sharing mechanisms, and the opportunities for individuals to contribute to crowdsourced efforts in bigger, different ways, are quite relevant to consider and worth economic investigation. I often hear from backers that the commitment mechanism is what compels them to back crowdfunding projects they otherwise wouldn’t.
The possibility of making each individual’s contribution to the collective effort contingent on the group’s collective behavior is key to facilitating productive commitments from the crowd that were previously not achievable. Economists would be first to point out the clear moral hazard problem that exists in the absence of such a mechanism: if everyone suspects that everyone (or no one) else will already fund a project to their desired level, then no one will give to it. There is an analogous problem when it comes to crowdsourcing science, in that each potential contributor needs to feel that their actions make a difference in personal or collective outcomes that they care about. Accordingly, it is important to understand what drives individuals to contribute (and this will certainly vary across different communities and types of project) in order to articulate and improve transparent incentive systems tailored to each. Finally, while crowdfunding projects focused on delivering technology often garner the most press, they also present some of the greatest challenges for these platforms. Technology projects face the greatest risks in part simply because developing technologies, like delivering scientific findings, can be especially risky. To aggravate matters further, individuals drawn to participating in these projects may have quite different personal incentives than those designing them. When it comes to especially risky science and technology projects, in crowdfunding as in crowdsourcing, the value of good citizen input is especially high, but the noise and potential for bias are likewise high. Finding ways to improve the community’s bandwidth for sharing and processing its collective wisdom, observations and preferences is, in my opinion, key to achieving greater innovation in crowdguided platforms.
Luckily, economists have done quite a bit of work on design of prediction markets and other mechanisms for extracting information in noisy environments and on reputation mechanisms that could and perhaps ought to be extended to thinking about these problems. Next time, I’ll summarize some of the key findings from this research and areas where it could be better targeted to the design of crowdguided systems.
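As a rough illustration of the commitment mechanism discussed above, the following Python sketch models one common form of it, all-or-nothing funding, in which pledges are only collected if the campaign reaches its goal. The function name settle_campaign, the backer names and the amounts are invented for this example; it is a simplified sketch, not a description of how any particular platform implements its rules.

```python
def settle_campaign(pledges, goal):
    """Return the amount actually charged to each backer.

    Each pledge is contingent on the group's collective behavior:
    backers are charged only if total pledges reach the goal.
    """
    total = sum(pledges.values())
    if total >= goal:
        return dict(pledges)                    # goal met: everyone pays their pledge
    return {backer: 0 for backer in pledges}    # goal missed: nobody pays

# Invented example: three backers pledge 105 in total.
pledges = {"ana": 40, "ben": 35, "caio": 30}

print(settle_campaign(pledges, goal=100))
# {'ana': 40, 'ben': 35, 'caio': 30}
print(settle_campaign(pledges, goal=120))
# {'ana': 0, 'ben': 0, 'caio': 0}
```

Because no one is charged unless the threshold is met, a backer does not risk paying into a project that never gathers enough support, which is one reason such a rule can draw out commitments that would otherwise not be made.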

How to study lobbying with crowdsourced open data

- May 10, 2011 in Crowd Sourcing, External, Featured Project, France, Government, Guest post, Lobbying, Open Data, Open Government Data, Regards Citoyens, Transparency International, WG EU Open Data, WG Open Government Data, Working Groups

The following guest post is from Regards Citoyens, a French organisation that promotes open data. For about a year, Regards Citoyens has been working together with the French chapter of Transparency International in order to bring more transparency to the processes of influence and lobbying within the French parliament. Lobbying is a very controversial subject in France: we discuss it a lot, but we do not know much about it. So we decided to try and study the visible part of this mysterious iceberg by bringing some new data to the public debate. On a regular basis, MPs publish official reports regarding the preparation of their legislative and government evaluation work. It makes sense that they would listen to anyone concerned with the topic at hand during this process. But is this done in a fair, plural and transparent way? Are corporations and unions listened to on an equal footing? What about NGOs and other actors from civil society? Much like the European Parliament did, the French Assembly recently created an official register of lobbyists who are granted access to the hallways. But it turns out that this register does not contain more than a hundred names.
We decided to take a closer look and try to get a more complete list by browsing through all 1,174 reports published between July 2007 and July 2010. Some of them include an appendix listing all the hearings organised during the preparation of the report. Unfortunately, we quickly discovered that most reports do not feature such a list: using text analysis tools, we found one in only 38 % of the reports. Even this small visible part of influence seriously lacks transparency. But that already provided us with an important dataset of 16,000 names, far more than the few officially registered lobbyists. Our main concern then was to identify the organisation behind each of these names. Doing so was sometimes easy (when the organisation was mentioned alongside the name in the appendix) and sometimes a bit harder (requiring us to read parts of the report, for instance). So we decided to develop a crowdsourcing tool allowing anyone to participate. An application, available under a free licence (the AGPL), was built to process the names one by one, with each name handled by at least three different users in order to validate the data. The idea was to let anyone contribute easily for just a few minutes, without having to register. Registration was only needed to appear in the ladder of the top 50 contributors. The simplicity and responsiveness of the Ajax-based interface (fields pre-filled and reports pre-loaded and scrolled), the fun of discovering lobbyists while “digitizing them” and the competitive aspect provided by the ladder certainly helped a lot: within a couple of days a good buzz started, and while we expected the crowdsourcing to take a couple of months, everything was completed in only 10 days thanks to more than 3,000 citizens! This process gave us a database of 16,000 hearings with the name, sex, function and organisation of each of the lobbyists.
After some brief discussions with the National Assembly and the CNIL (the French commission for privacy rights), we decided to release only the names of the organisations and not those of the people. Even though the names are already public, since they come from official reports, these institutions were unable to agree on whether the names of lobbyists were public or private information. In the end, we decided to anonymise the data and make sure no illegal database of religious or union affiliation could be derived from it. Using Freebase GridWorks, we then refined the data and consolidated it into 9,300 grouped hearings of organisations, which were associated with the subject themes of each report. But to be able to identify trends, we needed to categorise these organisations by interest: unions, corporations, individuals, religious organisations, think-tanks, NGOs and associations. We first used the EU registry, but the large number of organisations we needed to classify quickly revealed the limits of the Commission’s categories, especially regarding public sector organisations. So we decided to improve on it and progressively build our own categorisation of interest representatives (fr) as we categorised the data. With all of this enriched data in hand, TI started browsing it and drafted an insightful study based on the results (fr). At the same time, we worked on developing a visualisation to present the data in ways people could easily understand and browse. Inspired by WhereDoesMyMoneyGo’s first design, we used the powerful Raphael JavaScript library to release, within a couple of weeks, a fully accessible application that lets users browse all of this information by theme and in finer detail. But what did we learn? First, that on all subjects, there were considerably fewer hearings with women (24 %), with the only exception of the reports regarding… gender issues, of course!
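The validation step described above can be sketched in a few lines of Python. This is only an illustration, not the code of the actual AGPL application: the function name validate_entry, the agreement threshold and the sample answers are assumptions. The post only states that each name was processed by at least three different users; the rule used here (accept the most frequent answer if at least two users agree) is one plausible way to turn those answers into a validated value.

```python
from collections import Counter

def validate_entry(answers, min_votes=3, min_agreement=2):
    """Return the agreed organisation for one name, or None if the entry
    still needs more contributions or the contributors disagree."""
    if len(answers) < min_votes:
        return None  # not yet seen by enough different users
    # Normalise spacing and case so trivial differences do not split votes.
    normalized = [a.strip().lower() for a in answers]
    value, count = Counter(normalized).most_common(1)[0]
    return value if count >= min_agreement else None

# Invented answers from three contributors for the same name.
print(validate_entry(["Example Union", "example union ", "Example Federation"]))
# example union
print(validate_entry(["Example Union", "Example Federation"]))
# None
```

Entries that come back as None could simply be shown to further contributors until enough of them agree.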
The study also reveals that in their hearings MPs listen mainly to administrations and organisations from the public sector (48 %). Trade unions and other professional organisations come next, followed in third place by private companies. NGOs and civil society organisations account for only 7 % of cases. But the most interesting conclusion probably comes from comparing the categories for each specific theme. We can observe that companies are heard more often on topics like the economy, energy, the environment and, probably more surprisingly, transportation, culture or digital issues. On the other hand, civil society organisations are more present in topics like development aid or veterans. All of these results of course concern only the visible part of lobbying, but taking a close look at the gaps (like the surprisingly low number of hearings for private companies on health issues) provides interesting insights and supports our conclusion: transparency in France is definitely lacking in this area! Of course, all of the anonymised data generated for this study is republished as open data under the ODbL licence and is freely reusable. We completed the data with extra information such as the authors and political groups of the reports. This means there are certainly plenty of other possible uses for this data! We’re convinced that making it open data can only bring more great projects! Read the study and browse the visualisation online.