
What does the history of global trade look like? The collaborative database RICardo opens up trade data to shed light on this question

- February 21, 2018 in Digital Humanities, economics, historical data, Open Data, Open Humanities

RICardo (Research on International Commerce) is a project dedicated to trade between nations over a period spanning the beginning of the Industrial Revolution to the eve of the Second World War. It combines a historical trade database covering all of the world’s countries with a website that invites visitors to explore the history of international trade through data visualizations. The project has recently released a web application and accompanying dataset, which is freely available under the Open Database License. In this blog post, Beatrice Dedinger (economic historian) and Paul Girard (IT engineer) illustrate its use cases and background.

The new RICardo web tool was officially released in December 2017, on the occasion of the bicentenary of David Ricardo’s famous work, On the Principles of Political Economy and Taxation. It is the outcome of an experiment in combining economic history with digital humanities.

The RICardo project is devoted to the bilateral and total trade of all the world’s countries over a period spanning from the beginning of the 19th century to 1938. Bilateral trade means the distribution of a country’s trade by partner, on both the export and the import side. Total trade is the sum of all bilateral flows. Note that RICardo focuses on trade by country; it does not provide statistics of trade by product and thus does not allow for the analysis of trade specialization or comparative advantage.

We purposefully assembled data from the 19th and early 20th centuries, as no such database existed before. Governments did not start to publish printed documents of official trade statistics before the end of the Napoleonic Wars. This is mostly true for the European states but also for other areas of the world that were under European influence. Since the end of the Second World War, the International Monetary Fund has been in charge of gathering the bilateral trade statistics of all countries; they are now freely available.

The RICardo database includes around 300,000 data points (December 2017 version) that have been collected by hand from archives found in French or foreign libraries. This is currently the most exhaustive database dedicated to historical bilateral trade statistics. Because the original data (trade flows, names of countries) were in different currencies and languages, they have been converted into a usable format by creating a relational database. The entire RICardo dataset is now freely available under the Open Database License in our versioned data repository, where it is described in the Data Package format.
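The relationship between bilateral and total trade described above can be sketched in a few lines of Python. This is only an illustration: the field names (`reporting`, `partner`, `year`, `direction`, `value`) and the figures are hypothetical and are not taken from the RICardo dataset or its schema.

```python
from collections import defaultdict

# Hypothetical bilateral flows. Field names and values are invented for
# illustration and do not reflect the actual RICardo tables.
bilateral_flows = [
    {"reporting": "Chile", "partner": "United Kingdom", "year": 1845, "direction": "export", "value": 100.0},
    {"reporting": "Chile", "partner": "France", "year": 1845, "direction": "export", "value": 50.0},
    {"reporting": "Chile", "partner": "United Kingdom", "year": 1845, "direction": "import", "value": 80.0},
]


def total_trade(flows):
    """Sum bilateral flows per (reporting country, year, direction)."""
    totals = defaultdict(float)
    for flow in flows:
        key = (flow["reporting"], flow["year"], flow["direction"])
        totals[key] += flow["value"]
    return dict(totals)


totals = total_trade(bilateral_flows)
for (country, year, direction), value in sorted(totals.items()):
    print(country, year, direction, value)
```

In practice the flows would first have to be expressed in a common currency, which the sketch simply assumes has already been done.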

Source: Estadística Comercial de la Republica de Chile (1845)

Source: RICardo_flows database

RICardo is meant for studying and discovering the history of trade and trade globalization. How did countries become economically interdependent? How did the volume of trade and the variety of goods and services exchanged develop across nations? Trade databases are needed to address these and similar questions. For example, economic historians relying on limited trade datasets first demonstrated that a “First” globalization occurred over 1870-1914. Once more extensive trade databases became available, they revised this conclusion and now hold that trade globalization started around the 1840s.

RICardo also allows for the study of areas of the history of trade that have been neglected, largely for lack of data. It can help to explore the history of geopolitical trade relationships. If you are interested, for example, in the trade history of Chile from the 19th to the mid-20th century, RICardo provides you with visualizations and a dataset describing Chile’s relationships with all its partners over the period of your choice.

RICardo offers the opportunity to discover the history of international trade not only through aggregate world trade curves but also by looking at the details of bilateral trade flows: visual exploration is key to handling the complexity of trade data, by switching from an aggregate to a detailed level or from one country to another. To do so, the tool uses a method developed at Sciences Po médialab called “datascape”. By considering data visualization from the very beginning, the research team gains creative constraints that help to better design the dataset. In turn, data visualizations are a very efficient way to take care of the data, in particular to check data integrity.

This project was very enriching on a personal level in that it taught us to work in a new way. At the beginning, in 2004, the project was launched by a team of researchers at Sciences Po Paris who were working on financial and trade history and needed historical trade datasets to pursue a research idea. It was, and still is, usual for each researcher to build a trade database on their own for the needs of personal research, each trying to do better than the others. This way of working points to a competitive state of mind from which we moved away. During more than ten years of work, we faced a lot of problems that eventually led us to work in a more collaborative, creative, and challenging way. This was the driving force behind the achievements of RICardo. That is why we are keen to open our data to everyone, to share the results of our work with the widest audience, to open it to contributions, to foster its usage by the community, and to arouse the curiosity of the public about a subject that seems austere at first sight but that we try to address in an enjoyable way.

BudgetApps: The First All-Russia Contest on Open Finance Data

- January 16, 2015 in Budget Data, historical data, OKF Russia, Open Data

This is a guest post by Ivan Begtin, Ambassador for Open Knowledge in Russia and co-founder of the Russian Local Group.

Dear friends, the end of 2014 and the beginning of 2015 have been marked by an event that is terrific for everyone interested in working with open data or taking part in challenges for app developers, and more generally for everyone involved in the Open Data Movement. I’m also sure, by the way, that people who are fond of history will find it particularly fascinating to be involved in this event.

On 23 December 2014, the Russian Ministry of Finance, together with the NGO Infoculture, launched BudgetApps, an app developers’ challenge based on the open data that the Ministry of Finance has published over the past several years. There are a number of datasets, including budget data, registries of audit organisations, public debt, the national reserve and many other kinds of data. As it happens, I have joined the jury, so I won’t be able to participate, but let me provide some details regarding this initiative.

All the published data can be found at the Ministry website. Lots of budget datasets are also available at the Single Web Portal of the Russian Federation Budget System. That includes the budget structure in CSV format, the data itself, reference books and many other instructive details. Data regarding all official institutions are placed here. This resource is particularly interesting because it contains indicators, budgets, statutes and numerous other characteristics for each state organisation or municipal institution in Russia. Such data would be invaluable for anyone considering a regional data-based project.

One of the challenge requirements is that the submitted projects should be based on the data published by the Ministry of Finance. However, this does not mean that participants cannot use data from other sources alongside the Ministry data. It is actually expected that app developers will combine several data sources in their projects. To my mind, participants should not even restrict themselves to machine-readable data, because human-readable data are also available and can be converted to open data formats. Many potential participants know how to write parsers on their own. For those who have never had such an experience, there are great reference resources, e.g. ScraperWiki, that can be helpful for scraping web pages. There are also various tools for analysing Excel files or extracting spreadsheets from PDF documents (for instance, PDFTables, ABBYY FineReader software or other ABBYY services). Moreover, other web resources of the Ministry of Finance hold a lot of interesting information that can be converted to data, including news items that have recently become especially relevant for the Russian audience.
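As a rough illustration of what converting human-readable data into an open format can look like, here is a minimal sketch that turns a simple HTML table into CSV using only the Python standard library. The sample table and the helper names (`TableParser`, `html_table_to_csv`) are made up for this example; real Ministry pages will almost certainly need more careful handling (nested tables, merged cells, encodings).

```python
import csv
import io
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def html_table_to_csv(html):
    """Return the table rows found in `html` as CSV text."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


# Invented sample data, for illustration only.
sample = """
<table>
  <tr><th>Item</th><th>Amount</th></tr>
  <tr><td>Education</td><td>120</td></tr>
  <tr><td>Health</td><td>95</td></tr>
</table>
"""
csv_text = html_table_to_csv(sample)
print(csv_text)
```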

Historical budgets

There is a huge and powerful direction in the general process of opening data that has long been missing in Russia. What I mean here is publishing open historical data that are kept in archives as large paper volumes of reference books containing myriads of tables. These are practically indispensable when we turn to history, refer to facts, or create projects devoted to a particular event.

The time has come at last. Any day now the first scanned budgets of the Russian Empire and the Soviet Union will be openly published. A bit later, but also in the near future, the rest of the existing budgets of the Russian Empire, the Soviet Union, and the Russian Soviet Federated Socialist Republic will be published as well. These scanned copies are being gradually converted to machine-readable formats, such as Excel and CSV data reconstructed from the reference books, both as raw data and as initially processed and ordered data. We created these ordered, normalised versions to make it easier for developers to use them in further visualisations and projects. A number of such datasets have already been openly published.

It is also worth mentioning that a considerable number of scanned copies of budget reference books (from both the Russian Empire and the USSR) have already been published online by Historical Materials, a Russian-language grassroots project launched by a group of statisticians, historians and other enthusiasts.

Here are the historical machine-readable datasets published so far:

I find this part of the challenge particularly inspiring. If I were not part of the jury, I would create my own project based on historical budget data. Actually, I may well do something like that after the challenge is over (unless somebody does it earlier).

More data?

There is a larger stock of data sources that might be used alongside the Ministry data. Here are some of them:

These are just a few examples of the numerous data sources available. I know that many people also use data from Wikipedia and DBpedia.

What can be done?

First and foremost, there are great opportunities for creating projects aimed at making public finance easier to understand. Among other things, these could be visual demos of how the budget (or public debt, or some particular area of finance) is structured. Second, lots of projects could be launched based on the data on official institutions. For instance, it could be a comparative registry of all hospitals in Russia. Or a project comparing all state universities. Or a map of available public services. Or a visualisation of the budgets of Moscow State University (or any other Russian state university for that matter). As to the historical data, for starters it could be a simple visualisation comparing the current situation to the past. This might be a challenging and fascinating problem to solve.

Why is this important?

BudgetApps is a great way of promoting open data among app developers, as well as data journalists. There are good reasons for participating. First off, there are many sources of data that give talented and creative developers a good opportunity to implement their ambitious ideas. Second, the winners will receive considerable cash prizes. And last, but not least, the most interesting and promising projects will be referenced on the Ministry of Finance website, which is good promotion for any worthy project. Considerable amounts of data have become available. It’s time now for a wider audience to become aware of what they are good for.

The Statistical Memory of Brazil

- January 14, 2013 in Crowd Sourcing, data digitalisation, Data Digitalization, data mining, data systems, economics profession, External Projects, Featured, historical data, Open Data, Open Economics, Public Finance and Government Data, Statistical Memory of Brazil

This blog post is written by Eustáquio Reis, Senior Research Economist at the Institute of Applied Economic Research (Ipea) in Brazil and member of the Advisory Panel of the Open Economics Working Group.

The project Statistical Memory of Brazil aims to digitize the rare book collections of the Library of the Minister of Finance in Rio de Janeiro (BMF/RJ) and to make them freely available and downloadable. The project focuses on publications containing social, demographic, economic and financial statistics for nineteenth- and early twentieth-century Brazil. At present, approximately 1,500 volumes, 400,000 pages and 200,000 tables have been republished.

Apart from opening up the contents to both the scientific community and the general public, the project aims at the physical preservation of the collection. The rarity, age and precarious state of conservation of the books strongly recommend restricting physical access to them and limiting their handling to specific bibliographical purposes. For the Brazilian citizen, free access to the contents of rare historical collections and statistics provides a form of virtual appropriation of the national memory, and as such a source of knowledge, gratification and cultural identity.

The Library of the Minister of Finance in Rio de Janeiro (BMF/RJ)

Inaugurated in 1944, the BMF/RJ extends over 1,200 square meters in the Palacio da Fazenda in downtown Rio de Janeiro, the seat of the Minister of Finance until 1972, when it was moved to Brasilia. The historical book collection dates back to the early 19th century, when the Portuguese Colonial Administration was transferred to Brazil. Thereafter, several libraries from other institutions (Brazilian Customs, the Brazilian Institute of Coffee and the Sugar and Alcohol Institute, among others) were incorporated into the collection, which today comprises over 150,000 volumes, mainly specialized in economics, law, public administration and finance.

Rare book collections

For the purposes of the project, the collection of rare books includes a few thousand statistical reports and yearbooks. To mention just a few: the annual budgets of the Brazilian Empire, 1821-1889; annual budgets of the Brazilian Republic since 1890; Ministerial and Provincial reports since the 1830s; foreign and domestic trade yearbooks since 1839; railway statistics since the 1860s; stock market reports since the 1890s; economic retrospects and financial newsletters since the 1870s; the Brazilian Demographic and Economic Censuses starting in 1872; and the Brazilian Statistical Yearbooks starting in 1908. In passing, it should be noted that despite their rarity, fragility, and scientific value, these collections are hardly ever considered for republication in printed format.

Partnerships and collaborations

Under the initiative of the Research Network on Spatial Analysis and Models (Nemesis), sponsored by the Foundation for the Support of Research of the State of Rio de Janeiro and the National Council for Scientific and Technological Development, the project is a partnership between the Regional Administration of the Minister of Finance in Rio de Janeiro (MF/GRA-RJ), the Institute of Applied Economic Research (IPEA) and the Internet Archive (IA).

In addition to generous access to its library book collection, the Minister of Finance provides the expert advice of its librarians as well as the office space and facilities required for the operation of the project. The Institute of Applied Economic Research provides advice on economics, history and informatics. The Internet Archive provides the Scribe® workstations and digitization technology, making the digital publications available in several different formats on the website.

The project also collaborates with other institutions to supplement the collections of the Library of the Minister of Finance. Thus, the Brazilian Statistical Office (IBGE) supplemented the collections of the Brazilian Demographic and Economic Censuses, as well as of the Brazilian Statistical Yearbooks; the National Library (BN) made possible the republication of the Budgets of the Brazilian Empire; the Provincial and Ministerial Reports; the Rio News; and the Willeman Brazilian Review, the latter in collaboration with and the Department of Economics of the Catholic University of Rio de Janeiro.

Future developments and extensions

A new webpage for the project is under construction. It is based upon open source software designed to publish, manage, link and preserve digital contents (Drupal, Fedora and Islandora) and will include two collaborative crowdsourcing platforms.

The first crowdsourcing platform will provide facilities for indexing, documenting and uploading images and tabulations of historical documents and databases compiled by other research institutions or by individuals willing to make voluntary contributions to the project. The dissemination of the digital content is intended to stimulate research innovations, extensions, and synergies based upon the historical documents and databases. For this purpose, one open source solution under consideration is the Harvard University Dataverse Project.

The second crowdsourcing platform is intended to foster online decentralized collaboration among volunteers who compile or transcribe into editable formats (csv, txt, xls, etc.) the content of selected digital republications of the Statistical Memory of Brazil project. Whenever possible, optical character recognition (OCR) programs and routines will be used to facilitate the transcription of the image content of the books. The irregular typography of older publications, however, will probably require visual character recognition and manual transcription of contents. Finally, additional routines and programs will be developed to coordinate, monitor and revise the compilations made, so as to avoid mistakes and duplications.
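The post does not say how the compilations will be monitored and revised. One common approach, shown here purely as an assumption and not as the project's own method, is to have two volunteers transcribe the same table independently and flag every cell where they disagree. The function name and the sample figures below are hypothetical.

```python
def compare_transcriptions(first, second):
    """Return (row, column, value_a, value_b) for every cell that differs.

    Each transcription is a list of rows, and each row is a list of cell
    strings. Missing rows or cells are reported with None.
    """
    differences = []
    for row_index in range(max(len(first), len(second))):
        row_a = first[row_index] if row_index < len(first) else []
        row_b = second[row_index] if row_index < len(second) else []
        for col_index in range(max(len(row_a), len(row_b))):
            cell_a = row_a[col_index] if col_index < len(row_a) else None
            cell_b = row_b[col_index] if col_index < len(row_b) else None
            if cell_a != cell_b:
                differences.append((row_index, col_index, cell_a, cell_b))
    return differences


# Two invented transcriptions of the same (made-up) table.
first = [["Year", "Exports"], ["1872", "1000"], ["1873", "1100"]]
second = [["Year", "Exports"], ["1872", "1000"], ["1873", "1700"]]

differences = compare_transcriptions(first, second)
for row, col, a, b in differences:
    print(f"row {row}, column {col}: {a!r} vs {b!r}")
```

Cells flagged this way would still need a human to check them against the scanned page, since the comparison only shows that the two versions differ, not which one is right.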

Project Team

Eustáquio Reis, IPEA, Coordinator
Kátia Oliveira, BMF/RJ, Librarian
Vera Guilhon, BMF/RJ, Librarian
Jorge Morandi, IPEA, IT Coordinator
Gemma Waterston, IA, Project Manager
Ana Kreter, Nemesis, Researcher
Gabriela Carvalho, FGV, Researcher
Lucas Mation, IPEA, Researcher

Fábio Baptista
Anna Vasconcellos
Ana Luiza Freitas
Amanda Légora