Elvis maps your tenders

- October 24, 2018 in Open Contracting, Open Spending

Ever heard of public procurement? Public tenders? Public spending? It is what your government does with your hard-earned taxes: it hires companies to do things. In return for your taxes, companies build roads and buildings, deliver cozy office chairs for the ministries, or take care of catering for public schools. A lot of data on public spending is open. But the government also spends A LOT of money. To explore the public spending of France, for example, you need to go through roughly 2 million tenders. That is certainly a lot to handle for your spreadsheet application. Moreover, tendering is quite a technical matter, so there is a lot to take into account: buyer and seller details, tender procedures, award criteria, sector codes, competition … the list goes on. Spreadsheets with dozens of columns can (and will) get confusing fast. To bridge the gap that might stand between a curious journalist and hordes of data, we have created Elvis (map me tender), a tool that makes it easy to see who the government hires to do things and how much money it gives them.

What can I do with Elvis?

In Elvis, you draw networks (or graphs) of public spending data for a specific company, country, sector or time range. We try to keep the choices finite and as clear as possible.

Choices are finite

You can visualize sectors in a country, such as IT spending or health care spending, or choose companies from the menu and see in which countries and sectors they are active.

Based on your choices, Elvis will draw a network.

This is a network graph of money flows between governments and companies. It wiggles too!

The network comes with a sidebar which lists all the companies and governments in the network. You can sort and search these lists.

Filter and search the list of all the dots on the network

Clicking on one of the companies or governments brings you to a list of the tenders they have participated in. Again, you can sort these however you like.

Sort them according to their value

In the tender details, you can see details of the tender (surprise!) and click through to other platforms such as TED (Tenders Electronic Daily) and the opentender.eu dashboard.

We keep the details limited as well. Look for detailed documentation on the government websites!

But wait … there is more!?

The same company can show up with different spelling. This is because in the EU, there is no standard on how to fill in the name of the company. We have therefore implemented a simple tool to merge multiple dots on the network.

Before the computers get smart enough, you have to merge companies that are the same manually. Sorry!
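To illustrate why the same company ends up as several dots, here is a minimal sketch of how spelling variants could be grouped into merge suggestions. It is not how Elvis works (merging in Elvis is manual, as described above); the function names, the list of legal-form suffixes and the sample company names are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical list of legal-form suffixes to ignore when comparing names.
LEGAL_FORMS = {"ltd", "limited", "gmbh", "sro", "bv", "inc"}


def normalise(name: str) -> str:
    """Lower-case a name, strip punctuation and drop legal-form suffixes."""
    cleaned = re.sub(r"[^\w\s]", "", name.lower())
    words = [word for word in cleaned.split() if word not in LEGAL_FORMS]
    return " ".join(words)


def suggest_merges(names):
    """Group names that normalise to the same key; keep groups of 2 or more."""
    groups = defaultdict(list)
    for name in names:
        groups[normalise(name)].append(name)
    return [group for group in groups.values() if len(group) > 1]


# Made-up example names, not taken from any real tender data.
names = ["Acme Ltd.", "ACME LTD", "Acme, Limited", "Bolt s.r.o.", "Other GmbH"]
candidates = suggest_merges(names)
for group in candidates:
    print("Possible duplicates:", group)
```

A rule this crude will miss typos and may group unrelated companies, which is one reason a human still has to confirm each merge.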

There is also no standard for how much of the information is filled in. It so happens that a government sometimes does not want to share how much a tender costs, for example. After a long discussion on how to show this sort of missing data in Elvis, we settled on … fruit.

IT spending in The Netherlands costs strawberries

Why only Eastern Europe?

Although we have data from the whole of the EU, Elvis only offers data from the countries of the former Eastern Bloc. One of the reasons for this is the data quality: tenders in the eastern part of Europe are more transparent.

The value of the winning bid is much more often present in tenders from the countries of the former Eastern Bloc

Another important reason is available resources. The data (7 million tenders) is too big for our database at this moment. We are, however, looking for means to develop it further.

Do you like Elvis?

Do you have a question or idea? Would you like to work with us? Feel free to contact us at tech [at] tenders.exposed!

How to publish budget and spending data openly

- August 31, 2018 in Open Knowledge, Open Spending

At the Global Initiative for Fiscal Transparency (GIFT) and Open Knowledge International (OKI) we believe that governments’ budget and spending data should be made available to all, so that anyone can see how their tax money is spent and what priorities their governments set, and so that governments can be held accountable. Increasingly, governments already make their budget data openly available, and that is really great to see. Civil society organisations, but also individual researchers, journalists, and anyone who is interested, can use this data to generate insights and share those with the public. But much of the information is still only available in PDF and other non-open formats, and not published as data. As a result, scrutinising and putting the data to use is difficult and requires a lot of work. GIFT and OKI have partnered to address this issue. Along with the BOOST World Bank initiative and a dedicated open data community, we developed the [Open] Fiscal Data Package. Its version 1.0 is now available! We built the OpenSpending portal on top of the [Open] Fiscal Data Package, to make it really easy to publish budget and spending data. Once it is up, a whole suite of tools is readily available to anyone to view, visualise and integrate the data.

How to get your budget and spending data in OpenSpending

There are two ways to make your data available via OpenSpending. The first is to manually upload the data using the OpenSpending Packager. If you have your fiscal data available as a CSV file, you can try it today. The packager will guide you through an intuitive process of a few easy steps, after which your data can be accessed and visualised by anyone via the OpenSpending platform. If you have any questions about it, reach out on our forum or find us in our chatroom on Gitter. If you want to publish your data more regularly and automatically, we can help you by setting up what we call a pipeline. This is a fairly technical process that we have trialled with the Mexican government. Because it is automated, it is easier for governments to sustain in the longer term. If you are interested in this, we would love to hear from you via openspending-support@okfn.org. An example of what it could look like to have your data published is on the Mexican transparency portal, as you can see below:
OpenSpending integrated in the Mexican Transparency Portal. Get in touch with us to learn more about this process.

Want to learn more? Join our webinar!

If you are a local, regional or national government interested in learning how you can benefit from OpenSpending, please join our webinar on 12 September at 10am EDT (3pm BST / 4pm CEST). OKI’s Fiscal Transparency lead Sander van der Waal will present the Fiscal Data Package specification version 1 and the OpenSpending toolset. This is a great opportunity for government representatives to learn how they can work with us to get their data into OpenSpending. The webinar can be accessed here. Mark your calendar now!
Join our OpenSpending webinar on 12 September

Do you have any questions? Please reach out to us via email on openspending-support@okfn.org. We would love to hear from you!

Introducing Version 1 of the Fiscal Data Package specification

- May 28, 2018 in fiscal data package, Fiscal transparency, Open Fiscal Data, Open Spending

The Fiscal Data Package is a lightweight and user-oriented format for publishing and consuming fiscal data. Fiscal Data Packages are made of simple and universal components, are extremely flexible, and can be produced from ordinary spreadsheet software and used in any environment. This specification started about five years ago with a first version (then known as the “Budget Data Package”). Since then we’ve made quite a few iterations, until we reached a fairly stable version, which we named ‘version 0.3’. This version was field-tested in various use cases and scenarios – most prominently by the Government of Mexico, which adopted Fiscal Data Package for publishing its official budget data. For the past six months we’ve been hard at work reshaping this specification to make it simpler to use and easier to adopt, while improving its flexibility and extensibility – thus making it relevant for more users. In many ways, this new version is the result of the collected experience and lessons learned in the past few years, working with partners and understanding what works and what doesn’t.

So what is the Fiscal Data Package philosophy?

The basic motivation behind Fiscal Data Package is to create a specification which is open by nature – based on other open standards, supported by open tools and software, modular, extensible and promoted transparently by a large community. The Fiscal Data Package is designed to be lightweight and simple to use – providing a small but flexible set of features, based on real-world requirements and not theoretical ones. All the while, the built-in extensibility allows this spec to adapt to many different use cases and domains. It is also possible to gradually use more and more parts of this specification – progressive enhancement – thus making it easier to implement with existing data while slowly improving the data quality. A main concern we wanted to tackle was the ability to work with data as it currently exists, without forcing publishers to modify the contents or structure of their current data files in order to “adapt” them to the specification. This is a big deal, as publishers often publish data that’s the output of existing internal information systems, and requiring them to do any sort of data cleaning or wrangling prior to uploading is a major source of friction for adoption.

And what is it not?

With that in mind, it’s also important to understand what this specification doesn’t handle. This specification is, by design, non-opinionated about which data should be published by publishers – which datasets, which fields and the internal processes these reflect. The only things Fiscal Data Package is concerned with are how fiscal data should be packaged and how publishers can best convey the meaning of the data – so it can be optimally used by consumers. In addition to that, it provides details regarding file formats, data types, metadata and structuring the data in files.

What we learned

As previously mentioned, via a wide range of technical implementations, partner piloting, and fiscal data projects with other civic tech and data journalist partners, we’ve learned a lot about what works in Fiscal Data Package v0.3, and what does not. We want to take these learnings and make a more robust and future-proof v1.0 of the specification. One of the first things we noticed wasn’t working was fiscal modelling. Version 0.3 of the specification contained an elaborate system for the modelling of fiscal data. In practice, this system turned out to be too complicated for normal users and error-prone (inconsistent models could be created). To add to that, modelling was not versatile enough to account for the very different source files existing with real users, nor was it expressive enough to convey the specific semantics required by these users. A few examples of this strictness include:
  • The predefined set of classifications for dimensions. This hard-coded list did not capture the richness of fiscal data ‘in the wild’, as it contained too few and too broad options.
  • Measure columns were assumed to be of a specific currency, disregarding datasets in which the currency is provided in a separate column (or non monetary measures).
  • Measure columns were assumed to be of a specific budgeting phase (out of 4 options) and of a single direction (income/expenditure), ignoring data sets which have different phases, or that the phase or direction are provided in a separate column – or data sets which are not related to budgets altogether…
Another lesson learned is about file formats. Contrary to what its name might suggest, the world of fiscal data files is a wild jungle – every sort and form of file exists there (if you just look hard enough). Now, while machines will always prefer to read data files in their denormalised (or unpivoted) form – as it’s the most verbose and straightforward one – publishers will often choose a more compact, pivoted form – and as the proverb goes, there is more than one way to pivot a table. Other publishers would take out from the file some of the data, and append it as a separate code list file, or split large files based on year, budget direction or department. Version 0.3 of the specification assumed data files would only be provided in a very specific pivoted form – which might apply to some cases, but practically failed on many other variations that we’ve encountered.

Many different variations

What new features does Fiscal Data Package v1.0 provide?

First of all, it introduces a novel and simple way of supporting a wide variety of data file structures – pivoted and unpivoted, with code-lists and without them, provided in a single file or spanning multiple files. To do that we’ve added 3 different devices:
  • The concept of ‘constant fields’: while still supporting any form of metadata added to the Fiscal Data Package descriptor, adding a field with some constant data is often a cleaner and more complete way of adding missing information to the dataset.
  • A built-in facility for ‘unpivoting’ (or de-normalising) the source data: data is no longer expected to be provided in a very specific pivoted form – any structure of the data is now supported.
  • Foreign keys, which allow code-lists to be used as part of the specification.
When we know the structure of the data, it allows us to bring all datasets to a single structure. This is crucial for comparisons – how can we compare two datasets when their structure is different? When the structure is known, it’s easier to ask questions about the data and easily refer to a single data point in the data (e.g. “what was the allocated budget for this contract in 2016?”).
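As a rough illustration of the three devices (constant fields, unpivoting and code-lists looked up through a foreign key), the sketch below turns a small pivoted table into one record per data point. It only shows the idea of the transformation: it is not the Fiscal Data Package descriptor syntax, and the function name, column names, currency and department codes are all made up for the example.

```python
def unpivot(rows, id_columns, measure_columns, constants=None, code_lists=None):
    """Yield one record per (row, measure column) pair.

    rows            -- list of dicts in a pivoted layout
    id_columns      -- columns copied unchanged into every output record
    measure_columns -- {source column: extra fields describing that column}
    constants       -- fields with the same value in every record
    code_lists      -- {key column: {code: label}} used like a foreign key
    """
    constants = constants or {}
    code_lists = code_lists or {}
    for row in rows:
        base = {column: row[column] for column in id_columns}
        # Resolve codes against their code-list (a foreign-key lookup).
        for column, lookup in code_lists.items():
            base[column + "_label"] = lookup[row[column]]
        # Add information missing from the file as constant fields.
        base.update(constants)
        for source_column, extra in measure_columns.items():
            yield {**base, **extra, "amount": row[source_column]}


# Hypothetical pivoted source: one column per budget phase.
pivoted = [
    {"dept": "01", "approved_2016": 100, "executed_2016": 90},
    {"dept": "02", "approved_2016": 50, "executed_2016": 55},
]

records = list(unpivot(
    pivoted,
    id_columns=["dept"],
    measure_columns={
        "approved_2016": {"phase": "approved", "year": 2016},
        "executed_2016": {"phase": "executed", "year": 2016},
    },
    constants={"currency": "EUR"},  # assumed; not stated in the file itself
    code_lists={"dept": {"01": "Health", "02": "Education"}},
))

for record in records:
    print(record)
```

Once every dataset is brought to this one-row-per-data-point shape, a question such as "what was the approved amount for department 01 in 2016?" becomes a simple filter on `dept`, `phase` and `year`.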

Denormalisation

The second big feature of Version 1 is the introduction of ColumnTypes. ColumnTypes are a lightweight taxonomy for describing the columns of a fiscal data file – that is, not the concepts but their representations. For example, these types are not concerned with ‘Deficit’, ‘Supplier’ or ‘Economic classification’ – these are fiscal concepts. However, when put into a data file, columns such as ‘Supplier last name’ or ‘Title of 2nd level of func. class. in Dutch’ might be used. ColumnTypes are concerned with the data files themselves – and provide a way to extract the concept out of the columns. ColumnTypes can be combined into taxonomies of similarly-themed types. In these taxonomies, it’s possible to define some relationships between different types – for example, indicate a few ColumnTypes are parts of a more abstract concept. It’s also possible to assign data types and validation rules to a ColumnType, and more. Alongside this specification we’re also releasing two fiscal taxonomies which serve as standards for publishing budget files and spending files. These can be found here:

What’s next?

This announcement is of a release candidate – we’re looking forward to getting feedback and collaborating with the open-data and fiscal-standard communities. We’re planning to update existing tools (such as OpenSpending) and to build new tools to support this specification and provide integrations for other systems. Lastly – all this work wouldn’t have been possible without the support of and collaboration with our partners – chief among them GIFT (the Global Initiative for Fiscal Transparency), as well as the International Budget Partnership, Omidyar Network, Google.org, The World Bank, the Government of Mexico and many other pilot governments. We thank them all for their generous support in making this work possible. We really believe that Fiscal Data Package is an opportunity for governments and organisations that see the benefit in publishing budgets to foster transparency as part of a liberal democracy. You are invited to join us on this journey, which many government partners such as Croatia, Guatemala, Burkina Faso and Mexico have already started.
It is needed more than ever.

Open Budget Survey 2017: global comparison of budget transparency comes at a critical time

- February 6, 2018 in Featured, News, open budget survey, Open Spending

On 30 January 2018 the International Budget Partnership (IBP) published the Open Budget Survey 2017 with an interactive Data Explorer developed for the IBP by Open Knowledge International and updated for the 2017 survey. Launched in 2006, the Open Budget Survey (OBS) is the world’s only independent, comparative assessment of the three pillars of public budget accountability: transparency, oversight and public participation. The sixth round of this biennial assessment in 2017 evaluated 115 countries across six continents. The coverage of the survey was expanded to include 13 countries for the first time, including some advanced economies such as Japan and Australia, emerging economies such as Côte d’Ivoire and Paraguay, and fragile states such as Somalia and South Sudan.

Open Budget Survey Data Explorer, map view

The results of the survey show that many governments around the world are making less information available about how they raise and spend public money. After 10 years of steady progress by countries, the 2017 survey shows a modest decline in average global budget transparency scores, from 45 in 2015 to 43 in 2017 for the 102 countries that were surveyed in both rounds (scores are out of a possible 100). This is in stark contrast to the average increase of roughly two points documented among comparable countries in each round of the survey between 2008 and 2015. The reversal of transparency gains is particularly discouraging given that roughly three-quarters of the countries assessed do not publish sufficient budget information (defined as a score of 61 or higher), seriously undermining the ability of citizens worldwide to hold their government accountable for using public funds efficiently and effectively. The Data Explorer, built by Open Knowledge International in 2006 and updated for this week’s release, allows users to visualise the data from current and previous surveys in a number of different ways. A map view shows the changing geography of openness over the six surveys, while a timeline shows the movements of individual countries over the same period. A more detailed page of rankings shows graphically how each country’s score is calculated. A datasheet for each country presents the full data, letting the user see how it has performed on each test in every survey. Users can also generate custom reports, or download the entire dataset. Another useful feature allows users to see how a country’s score might change for the next survey. You can click to decide what changes to make to your chosen country’s budget systems, and the resulting change to its openness score is shown.

Open Budget Survey Data Explorer, timeline view

The Open Budget Survey 2017 could not come at a more critical juncture as we look to reinvigorate democratic practice, re-engage the disaffected, and restore public trust in public institutions. Around the world, there has been a decline in public trust in government, in part due to instances of corruption but also because of dramatic increases in inequality. In a number of countries, leaders who have disguised their intolerant and reactionary agendas with populist rhetoric have been swept into power by those who’ve been left behind. These political shifts have driven out many government champions of transparency and accountability, especially those from countries in the global south. More broadly across countries, there has been a shrinking of civic space, rollbacks of media freedoms, and a crackdown on those who seek to hold government to account, including individual activists, civil society organizations, and journalists. Because open and accountable public budgeting is at the center of democratic practice and equity, it is the first place we should look for ways to strengthen the interaction between governments and citizens. Ensuring that the budgeting process is characterized by high levels of transparency, appropriate checks and balances, and opportunities for public participation is key to stemming the decline in confidence in government and representative democracy. In the face of the spread of profound threats to active, informed public participation, and thus to the ability of citizens to ensure their governments will pursue policies that improve their lives, the Open Budget Survey 2017 provides essential data on the state of budget transparency and accountability around the world. The survey assesses whether national governments produce and disseminate key budget documents to the public in a timely, comprehensive and accessible manner, and answers two fundamental questions:
  1. Are the basic conditions needed for representative democracy to function — the free flow of information and opportunities for public participation in government decision making and oversight — being met in the budget sphere?
  2. Are empowered oversight institutions in place that can ensure adequate checks and balances?
In addition, the 2017 survey includes a newly enhanced evaluation of whether governments are providing formal opportunities for citizens and their organizations to participate in budget decisions and oversight, as well as emerging models for public engagement from a number of country innovators. It also examines the role and effectiveness of legislatures and supreme audit institutions in the budget process. Open Knowledge International helps governments to meet the goal of budget transparency by providing OpenSpending, a project to unlock public fiscal data. OpenSpending offers a platform and a fiscal data standard that make publishing and visualising budget data easy and efficient. In collaboration with the Global Initiative for Fiscal Transparency (GIFT), OKI supports several federal ministries of finance around the globe in using the OpenSpending tools and the Fiscal Data Package, a lightweight fiscal data schema built on the Frictionless Data Specifications that allows the data to be visualised and analysed. The full report of the Open Budget Survey is available from this page; the data visualisations can be viewed through the Data Explorer site.

Openbudgets.eu: the new platform for financial transparency in Europe

- September 7, 2017 in open budget, open budget data, open budgets, Open Fiscal Data, Open Spending

Today, OpenBudgets officially launches its fiscal transparency platform. Using OpenBudgets.eu, journalists, civil servants, and data scientists can process, analyse, and explore the nature and relevance of fiscal data.
The platform offers a toolbox to everyone who wants to upload, visualise and analyse fiscal data. From easy-to-use visualisations and high-level analytics to fun games and accessible explanations of public budgeting and corruption practices, along with participatory budgeting tools, it caters to the needs of journalists, activists, policy makers and civil servants alike. The first successful implementations and projects have been developed in Thessaloniki, Paris, and Bonn, where civil society organisations and civil servants have together built budget visualisations for the general public. The cooperation between IT and administration resulted in 3 local instances of OpenBudgets.eu, setting the example for future implementations around Europe.
On the EU level, the project has campaigned for transparency in MEP expenses and better quality data on the European subsidies. The OpenBudgets.eu project subsidystories has uncovered how almost 300 billion in EU subsidies is spent. The MEP expenses campaign has led to the President of the European Parliament committing to introduce concrete proposals for reform of the MEPs’ allowance scheme by the end of the year. Finally, the project has created tailor-made tools for journalists, as our research has shown a lack of contextual knowledge and of knowledge of the basics of accounting. ‘Cooking budgets’ presents the basics of accounting in a satirical website, and the successful game ‘The good, the bad and the accountant’ simulates the struggle of a civil servant to retain their integrity.
These three approaches to public budgeting, each with its own audience, have resulted in a holistic platform which caters to the wider public who want more insight into their local, regional, national and even EU budgets. With the launch of OpenBudgets.eu the field of financial transparency in Europe is enriched by new tools, data, games and research for journalists, civil society organisations and civil servants alike, resulting in a valuable resource for a broad target audience. OpenBudgets.eu has received funding from the European Union’s H2020 EU research and innovation programme under grant agreement No 645833 and is implemented by an international consortium of nine partners (including Open Knowledge International and Open Knowledge Foundation Germany) under the coordination of Fraunhofer IAIS.

OpenSpending platform update

- August 16, 2017 in Open Knowledge, Open Spending

Introduction

OpenSpending is a free, open and global platform to search, visualise, and analyse fiscal data in the public sphere. This week, we soft-launched an updated technical platform, with a newly designed landing page. Until now dubbed “OpenSpending Next”, this is a completely new iteration on the previous version of OpenSpending, which has been in use since 2011. At the core of the updated platform is Fiscal Data Package. This is an open specification for describing and modelling fiscal data, and has been developed in collaboration with GIFT. Fiscal Data Package affords a flexible approach to standardising fiscal data, minimising constraints on publishers and source data via a modelling concept, and enabling progressive enhancement of data description over time. We discuss this in more detail below. From today:
  • Publishers can get started publishing fiscal data with the interactive Packager, and explore the possibilities of the platform’s rich API, advanced visualisations, and options for integration.
  • Hackers can work on a modern stack designed to liberate fiscal data for good! Start with the docs, chat with us, or just start hacking.
  • Civil society can access a powerful suite of visualisation and analysis tools, running on top of a huge database of open fiscal data. Discover facts, generate insights, and develop stories. Talk with us to get started.
All the work that went into this new version of OpenSpending was only made possible by our funders along the way. We want to thank Hewlett, Adessium, GIFT, and the OpenBudgets.eu consortium for helping fund this work. As this is now completely public, replacing the old OpenSpending platform, we do expect some bugs and issues. If you see anything, please help us by opening a ticket on our issue tracker.

Features

The updated platform has been designed primarily around the concept of centralised data, decentralised views: we aim to create a large, and comprehensive, database of fiscal data, and provide various ways to access that data for others to build localised, context-specific applications on top. The major features of relevance to this approach are described below.

Fiscal Data Package

As mentioned above, Fiscal Data Package affords a flexible approach to standardising fiscal data. Fiscal Data Package is not a prescriptive standard, and imposes no strict requirements on source data files. Instead, users “map” source data columns to “fiscal concepts”, such as amount, date, functional classification, and so on, so that systems that implement Fiscal Data Package can process a wide variety of sources without requiring change to the source data formats directly. A minimal Fiscal Data Package only requires mapping an amount and a date concept. There are a range of additional concepts that make fiscal data usable and useful, and we encourage the mapping of these, but do not require them for a valid package. Based on this general approach to specifying fiscal data with Fiscal Data Package, the updated OpenSpending likewise imposes no strict requirements on naming of columns, or the presence of columns, in the source data. Instead, users (of the graphical user interface, and also of the application programming interfaces) can provide any source data, and iteratively create a model on top of that data that declares the fiscal measures and dimensions.
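The mapping idea described above can be illustrated with a minimal sketch. It is not the Fiscal Data Package schema or any OpenSpending API: the column names, the concept labels and the `model_columns` helper are assumptions made up for this example. The only rule taken from the text is that a minimal package needs an amount and a date concept.

```python
# Hypothetical illustration of mapping source columns to fiscal concepts.
# Column names and the helper are invented; only the "amount and date are
# required" rule comes from the text above.
REQUIRED_CONCEPTS = {"amount", "date"}


def model_columns(source_columns, mapping):
    """Return {concept: column} after checking the mapping against the source."""
    unknown = [column for column in mapping if column not in source_columns]
    if unknown:
        raise ValueError(f"columns not in source data: {unknown}")
    model = {concept: column for column, concept in mapping.items()}
    missing = REQUIRED_CONCEPTS - model.keys()
    if missing:
        raise ValueError(f"unmapped required concepts: {sorted(missing)}")
    return model


# The source file keeps its own column names; unmapped columns are simply ignored.
source_columns = ["Betrag", "Jahr", "Ressort", "Notiz"]
mapping = {
    "Betrag": "amount",
    "Jahr": "date",
    "Ressort": "functional classification",
}
model = model_columns(source_columns, mapping)
```

The source data is never renamed or restructured here; the model sits on top of it, which is what allows a publisher to add further concepts later without touching the original file.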

GUIs

Packager

The Packager is the user-facing app that is used to model source data into Fiscal Data Packages. Using the Packager, users first get structural and schematic validation of the source files, ensuring that data entering the platform is well formed, and then they can model the fiscal concepts in the file in order to publish the data. After initial modelling of data, users can also remodel their data sources, allowing data added to the platform to be improved through progressive enhancement.

Explorer

The Explorer is the user-facing app for exploration and discovery of data available on the platform.

Viewer

The Viewer is the user-facing app for building visualisations around a dataset, with a range of options for presentation and for embedding views into third-party websites.

DataMine

The DataMine is a custom query interface powered by Re:dash for deep investigative work over the database. We’ve included the DataMine as part of the suite of applications as it has proved incredibly useful when working in conjunction with data journalists and domain experts, and also for doing quick prototype views on the data, without the limits of API access, as one can use SQL directly.
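To give a feel for the kind of ad hoc question that direct SQL access makes easy, here is a small sketch. It does not use the DataMine or the OpenSpending database: an in-memory SQLite database stands in for it, and the `fiscal_lines` table, its columns and the sample rows are all invented for illustration.

```python
import sqlite3

# Stand-in database: the table name, columns and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fiscal_lines "
    "(dataset TEXT, year INTEGER, classification TEXT, amount REAL)"
)
rows = [
    ("demo-budget", 2016, "Health", 120.0),
    ("demo-budget", 2016, "Education", 200.0),
    ("demo-budget", 2016, "Health", 30.0),
    ("demo-budget", 2015, "Health", 999.0),
]
conn.executemany("INSERT INTO fiscal_lines VALUES (?, ?, ?, ?)", rows)

# Total amount per classification for one year, largest first.
totals = conn.execute(
    """
    SELECT classification, SUM(amount) AS total
    FROM fiscal_lines
    WHERE year = ?
    GROUP BY classification
    ORDER BY total DESC
    """,
    (2016,),
).fetchall()
conn.close()
```

A grouped aggregate like this is typical of quick prototype views: it can be written in a minute and later turned into a proper visualisation if it proves interesting.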

APIs

Datastore

The Datastore is a flat file datastore with source data stored in Fiscal Data Packages, providing direct access to the raw data. All other databases are built from this raw data storage, providing us with a clear mechanism for progressively enhancing the database as a whole, as well as building on this to provide such features directly to users.

Analytics and Search

The Analytics API provides a rich query interface for datasets, and the search API provides exploration and discovery capabilities across the entire database. At present, search only goes over metadata, but we have plans to iterate towards full search over all fiscal data lines.

Data Importers

Data Importers are based on a generic data pipelining framework developed at Open Knowledge International called Data Package Pipelines. Data Importers enable us to do automated ETL to get new data into OpenSpending, including the ability to update data from the source at specified intervals. We see Data Importers as key functionality of the updated platform, allowing OpenSpending to grow well beyond the one thousand plus datasets that have been uploaded manually over the last five or so years, towards tens of thousands of datasets. A great example of how we’ve put Data Importers to use is in the EU Structural Funds data that is part of the Subsidy Stories project.
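The extract, transform, load pattern behind such importers can be sketched in a few lines. This is a generic illustration built on Python generators, not the Data Package Pipelines API; the function names, the sample CSV and the cleaning rule are assumptions made for the example.

```python
import csv
import io


def extract(text):
    """Yield raw rows from CSV text as dictionaries."""
    yield from csv.DictReader(io.StringIO(text))


def transform(rows):
    """Normalise amounts and skip rows whose amount cannot be parsed."""
    for row in rows:
        raw = row["amount"].replace(",", "").strip()
        try:
            amount = float(raw)
        except ValueError:
            continue  # messy source value such as "n/a"
        yield {"beneficiary": row["beneficiary"].strip(), "amount": amount}


def load(rows, store):
    """Append cleaned rows to a list standing in for the database."""
    for row in rows:
        store.append(row)
    return store


# Hypothetical source data with typical problems: stray spaces,
# thousands separators and a non-numeric amount.
SAMPLE = (
    "beneficiary,amount\n"
    ' Farm A ,"1,200.50"\n'
    "Farm B,n/a\n"
    "Farm C,300\n"
)
store = load(transform(extract(SAMPLE)), [])
```

Because each step is a generator, rows stream through one at a time, and rerunning the same steps against an updated source file repeats the whole import, which is the property that makes scheduled updates practical.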

Iterations

It is slightly misleading to announce the launch today, when we’ve in fact been using and iterating on OpenSpending Next for almost 2 years. Some highlights from that process that have led to the platform we have today are as follows.

SubsidyStories.eu with Adessium

Adessium provided Open Knowledge International with funding towards fiscal transparency in Europe, which enabled us to build out significant parts of the technical platform, commission work with J++ on Agricultural Subsidies, and engage in a productive collaboration with Open Knowledge Germany on what became SubsidyStories.eu, which even led to another initiative from Open Knowledge Germany called The Story Hunt. This work directly contributed to the technical platform by providing an excellent use case for the processing of a large, messy amount of source data into a normalised database for analysis, and doing so while maintaining data provenance and the reproducibility of the process. There is much to do in streamlining this workflow, but the benefits, in terms of new use cases for the data, are extensive. We are particularly excited by this work, and the potential to continue in this direction, by building out a deep, open database as a potential tool for investigation and telling stories with data.

OpenBudgets.eu via Horizon 2020

As part of the OpenBudgets.eu consortium, we were able to both build out parts of the technical platform, and have a live use case for the modularity of the general architecture we followed. A number of components from the core OpenSpending platform have been deployed into the OpenBudgets.eu platform with little to no modification, and the analytical API from OpenSpending was directly ported to run on top of a triple store implementation of the OpenBudgets.eu data model. An excellent outcome of this project has been the close and fruitful work with both Open Knowledge Germany and Open Knowledge Greece on technical, community, and journalistic opportunities around OpenSpending, and we plan for continuing such collaborations in the future.

Work on Fiscal Data Package with GIFT

Over three phases of work since 2015 (the third phase is currently running), we’ve been developing Fiscal Data Package as a specification to publish fiscal data against. Over this time, we’ve done extensive testing of the specification against a wide variety of data in the wild, and we are iterating towards a v1 release of the specification later this year. We’ve also been piloting the specification, and OpenSpending, with national governments. This has enabled extensive testing of both the manual modeling of data to the specification using the OpenSpending Packager, and automated ETL of data into the platform using the Data Package Pipelines framework. This work has provided the opportunity for direct use by governments of a platform we initially designed with civil society and civic tech actors in mind. We’ve identified difficulties and opportunities in this arena at both the implementation and the specification level, and we look forward to continuing this work and solving use cases for users inside government.

Credits

Many people have been involved in building the updated technical platform. Work started back in 2014 with an initial architectural vision articulated by our peers Tryggvi Björgvinsson and Rufus Pollock. The initial vision was adapted and iterated on by Adam Kariv (Technical Lead) and Sam Smith (UI/X), with Levko Kravets, Vitor Baptista, and Paul Walsh. We reused and enhanced code from Friedrich Lindenberg. Lazaros Ioannidis and Steve Bennett made important contributions to the code and the specification respectively. Diana Krebs, Cecile Le Guen, Vitoria Vlad and Anna Alberts have all contributed with project management, and feature and design input.

What’s next?

There is always more work to do. In terms of technical work, we have a long list of enhancements.
However, while the work we’ve done in the last years has been very collaborative with our specific partners, and always towards identified use cases and user stories in the partnerships we’ve been engaged in, it has not, in general, been community facing. In fact, a noted lack of community engagement goes back to before we started on the new platform we are launching today. This has to change, and it will be an important focus moving forward. Please drop by at our forum for any feedback, questions, and comments.

Government procurement data as open data: a great step forward on the way?!

- August 9, 2017 in avoin data, avoin hallinto, Featured, godi, godi 2016, hansel, julkiset hankinnat, Open Government Data, Open Government Partnership, Open Spending

Openness of information needs to be increased, states the Helsingin Sanomat editorial of 7 August 2017. We agree! In recent years, several Finnish municipalities have published data on their own procurement, even down to the receipt level. This practice is set to expand with the new so-called Hansel Act, under which the procurement of the various ministries, institutions, agencies and possibly the regions would be published centrally by the state procurement company Hansel Oy. The Government is preparing the new so-called Hansel Act (Government proposal HE 63/2017 vp, Government proposal to Parliament for an act amending the Act on the limited liability company named Hansel Oy). The act is being considered by the Commerce Committee (talousvaliokunta), where its details are being finalised. From the standpoint of openness and open data, the particularly interesting parts of the bill are those proposing a new provision on the right to obtain and process data in connection with the handling of procurement data (sections 2 and 5). In our view, if enacted, the Hansel Act will increase the openness of government in an excellent way. While Finland fulfils its international commitments, tax money is put to more efficient use, competition for public contracts becomes fairer, and public spending becomes more open overall.

The Hansel Act and open data on procurement

Hansel Oy has thus acted as the central government's joint procurement unit and has put out to tender, for its customers, those procurements of goods and services that are widely used in central government. Hansel's tasks have also included tendering its customers' own procurements and various expert tasks related to procurement. In recent years the company's tasks have evolved, among other things through the government's procurement digitalisation programme, which is why some clarifications relating to the company's tasks are proposed for the act. The act thus brings the company's tasks up to date. The wording has reportedly been developed further in the Commerce Committee, but the latest public version (HE 63/2017) describes Hansel's changed tasks as follows, among others (bolding by the author): Section 2, Tasks of the company, subsection 2: The task of the company is to provide its customers with joint procurement activities and ancillary procurement activities. The company maintains procurement contracts and provides its customers with expert services related to procurement contracts. In addition, the task of the company is to provide its customers with expert and development services related to procurement, as well as procurement data processing and analysis services and the technical solutions related to them. Section 5, Right of access to information and production of data, subsection 4: The company may produce, disclose and publish datasets comprising procurement data, provided that disclosing the dataset is not, on account of the search criteria used to compile it, the quantity, quality or content of the data, or the intended use of the dataset, contrary to the provisions on the secrecy of information and the protection of personal data. Below are a few views of Open Knowledge Finland ry regarding the act.

The Hansel Act increases healthy competition and efficiency in public procurement

Openness of procurement data promotes fair competition between suppliers. Comparative procurement data brings cost-efficiency to procurement and, through it, to the use of tax money. When procurements are described in a comparable way, it is easy to monitor, for example, whether one unit pays a markedly different price from another, or whether there is something else unusual or odd in the procurements that could perhaps be improved (such as purchases piling up at the end of the year). In the longer term, comparable data could be supplemented or combined with, say, origin information, certifications, ethical information or other comparative data. Detailed procurement data can help prevent corruption and the grey economy.

The Act on the Openness of Government Activities, transparency and the right to information also apply to procurement data

In any case, under the Act on the Openness of Government Activities (julkisuuslaki), citizens, organisations and the media already have a right to information, including procurement information, except where special considerations such as security matters or certain types of trade secrets are involved. Regardless of the Hansel Act, the right to this information exists, and no changes are expected in that respect. The interesting change is that the Hansel Act brings clarity and consistency to the practices for publishing the information, and the act thereby implements and sharpens the spirit of the openness act.

A single actor as the publisher of procurement data is an efficient way to increase transparency without a heavy administrative burden on public administration

One of the challenges of opening data in public administration has been the number of differing interpretations of the Act on the Openness of Government Activities (julkisuuslaki); this came up, among other places, in the report “Avoimen datan hyödyntäminen ja vaikuttavuus” (“Utilisation and impact of open data”), which ETLA and Open Knowledge Finland produced as part of the Prime Minister's Office's analysis, assessment and research activities. Likewise, cities opening up their purchase invoices have applied differing practices and data formats. The 6Aika project and the Association of Finnish Local and Regional Authorities (Kuntaliitto) have therefore sought to harmonise practices by issuing guidance. The planned arrangement harmonises practices across ministries, agencies and research institutes. It also lightens the administrative burden, because matters such as data cleaning, formatting, prioritisation, problem solving, documentation, publishing practices and other support functions are handled in one place; Hansel thus acts here as a kind of clearinghouse, quality assurer and interface for data users. Uniform practices, in turn, not only make public administration more efficient at publishing data but also make the data easier to find and use. Incidentally, the information services envisaged by the new bill complement other ongoing initiatives, such as the YTI project and the Kuntatieto programme. Hansel is steered by the Ministry of Finance, and openness of procurement data is one element of the implementation programme for the digitalisation of government procurement (Valtion hankintojen digitalisaatio), so the will to modernise seems to be more widely shared.

International leadership and honouring the commitments made

In general, Finland ranks fairly well in international comparisons related to open data and open knowledge. For example, in the latest comparison, Open Knowledge International's Global Open Data Index 2016, we are in 5th place. On the other hand, specifically in the openness of financial data we are quite dismal: “45% open” for procurement (procurement notices and contracts) and an alarming 0% for government spending (actual expenditure)! With the Hansel Act we keep pace with international developments by shoring up our identified weaknesses. Finland also takes part in the Open Government Partnership launched by former US President Obama, in which the governments of different countries (more than 70 countries take part), together with civil society, make binding, concrete actions and commitments that promote openness. Openness of procurement data is also one of the concrete promises in Finland's 3rd Open Government Action Plan (2017-2019). The action plan says the following.
  1. Commitment
Publish government procurement data for citizens. Information on what the state buys, with what money and from where is published openly online. Government procurement data will be published as open data in spring 2017. At the same time, a service open to all will be implemented, in which citizens and companies can follow the use of money related to government procurement in near real time. The content of the services is the public information on procurements, showing what government organisations procure and where the procurements are made from. As such, the Hansel Act and the procurement-data information services it describes are not necessarily designed “only” for citizens; the services naturally have several different kinds of users, such as companies, the media and the public sector itself. What matters overall is that the data is opened. Different actors can then build different interesting applications from their own perspectives: one makes comparisons or visualisations, another sales and marketing tools, and so on! In this way different actors complement Hansel's expertise and offering; after all, information is not used up by sharing it. A comparison of sorts could be the state budget and the applications around it: www.valtionbudjetti.fi, made by Hahmota Oy, describes the state budget and in a way complements the Ministry of Finance's own www.tutkibudjettia.fi service. Similarly, in addition to the web tool that Hansel may produce (for examining and analysing procurements by certain criteria), it is quite possible that other services or analysis tools for procurement will emerge. To implement the planned act and the above-mentioned open government commitment, Hansel has, as I understand it, outlined a forthcoming web service in which procurements could be analysed. Below are a few illustrative screenshots of the application that give an idea of what the web service could look like. These screenshots are of course only indicative, but they look promising.
For more detailed analyses, anyone could then download data filtered by suitable criteria. In general, it would make sense to increase the openness of, and open data about, the economy and business life in the future. In our view, the goal should be that the “trinity” of public budgets, contracts and procurement is openly available in standard formats. Procurement data is an excellent step. We at Open Knowledge look forward to the new Hansel Act and to the growing openness of public procurement in general. Openness tends not only to dispel possible distrust but also to increase efficiency and fair competition. This is therefore about taxpayers' interest and fairness. The post Valtion hankintatiedot avoimena datana – hieno edistysaskel tulossa?! appeared first on Open Knowledge Finland.

csv,conf,v3

- May 30, 2017 in Events, Frictionless Data, OD4D, Open Spending

The third manifestation of everyone’s favorite community conference about data—csv,conf,v3—happened earlier this May in Portland, Oregon. The conference brought together data makers/doers/hackers from various backgrounds to share knowledge and stories about data in a relaxed, convivial, alpaca-friendly (see below) environment. Several Open Knowledge International staff working across our Frictionless Data, OpenSpending, and Open Data for Development projects made the journey to Portland to help organize, give talks, and exchange stories about our lives with data. Thanks to Portland and the Eliot Center for hosting us. And, of course, thanks to the excellent keynote speakers Laurie Allen, Heather Joseph, Mike Bostock, and Angela Bassa who provided a great framing for the conference through their insightful talks. Here’s what we saw.

Talks We Gave

The first priority for the team was to present on the current state of our work and Open Knowledge International’s mission more generally. In his talk, Continuous Data Validation for Everybody, developer Adrià Mercader updated the crowd on the launch and motivation of goodtables.io: It was a privilege to be able to present our work at one of my favourite conferences. One of the main things attendees highlight about csv,conf is how diverse it is: many different backgrounds were represented, from librarians to developers, from government workers to activists. Across many talks and discussions, the need to make published data more useful to people came up repeatedly. Specifically, how could we as a community help people publish better quality data? Our talk introducing goodtables.io presented what we think will be a dominant approach to answering this question: automated validation. Building on successful practices in software development like automated testing, goodtables.io integrates within the data publication process to allow publishers to identify issues early and ensure data quality is maintained over time. The talk was very well received, and many people reached out to learn more about the platform. Hopefully, we can continue the conversation to ensure that automated (frictionless) data validation becomes the standard in all data publication workflows. David Selassie Opoku presented When Data Collection Meets Non-technical CSOs in Low-Income Areas: csv,conf was a great opportunity to share highlights of the OD4D (and School of Data) team’s data collection work. The diverse audience seemed to really appreciate insights on working with non-technical CSOs in low-income areas to carry out data collection.
In addition to highlighting the lessons from the work and its potential benefit to other regions of the world, I got to connect with data literacy organisations such as Data Carpentry who are currently expanding their work in Africa and could help foster potential data literacy training partnerships. As a team working with CSOs in low-income areas like Africa, School of Data stands to benefit from continuing conversations with data “makers” in order to present potential use cases. A clear example I cited in my talk was Kobo Toolbox, which continues to mitigate several daunting challenges of data collection through abstraction and simple user interface design. Staying in touch with the csv,conf community may highlight more such scenarios which could lead to the development of new tools for data collection. Paul Walsh, in his talk titled Open Data and the Question of Quality (slides), talked about lessons learned from working on a range of government data publishing projects and what we can do as citizens to demand better quality data from our governments.

Talks We Saw

Of course, we weren’t there only to present; we were there to learn from others as well. Before the conference, through our Frictionless Data project, we had been lucky to be in contact with various developers and thinkers around the world who also presented talks at the conference. Eric Busboom presented Metatab, an approach to packaging metadata in spreadsheets. Jasper Heefer of Gapminder talked about DDF, a data description format and associated data pipeline tool to help us live a more fact-based existence. Bob Gradeck of the Western Pennsylvania Regional Data Center talked about data intermediaries in civic tech, a topic near and dear to our hearts here at Open Knowledge International.

Favorite Talks

Paul’s:
  • “Data in the Humanities Classroom” by Miriam Posner
  • “Our Cities, Our Data” by Kate Rabinowitz
  • “When Data Collection Meets Non-technical CSOs in Low Income Areas” by David Selassie Opoku
David’s:
  • “Empowering People By Democratizing Data Skills” by Erin Becker
  • “Teaching Quantitative and Computational Skills to Undergraduates using Jupyter Notebooks” by Brian Avery
  • “Applying Software Engineering Practices to Data Analysis” by Emil Bay
  • “Open Data Networks with Fieldkit” by Eric Buth
Jo’s:
  • “Smelly London: visualising historical smells through text-mining, geo-referencing and mapping” by Deborah Leem
  • “Open Data Networks with Fieldkit” by Eric Buth
  • “The Art and Science of Generative Nonsense” by Mouse Reeve
  • “Data Lovers in a Dangerous Time” by Brendan O’Brien

Data Tables

This csv,conf was the first csv,conf to have a dedicated space for working with data hands-on. In past events, attendees left with their heads buzzing full of new ideas, tools, and domains to explore but had to wait until returning home to try them out. This time we thought: why wait? During the talks, we had a series of hands-on workshops where facilitators could walk through a given product and chat about the motivations, challenges, and other interesting details you might not normally get to in a talk. We also prepared several data “themes” before the conference meant to bring people together on a specific topic around data. In the end, these themes proved a useful starting point for several of the facilitators and provided a basis for a discussion on cultural heritage data following on from a previous workshop on the topic. The facilitated sessions went well. Our own Adam Kariv walked through Data Package Pipelines, his ETL tool for data based on the Data Package framework. Jason Crawford demonstrated Fieldbook, a tool for easily managing a database in-browser as you would a spreadsheet. Bruno Vieira presented Bionode, going into fascinating detail on the mechanics of Node.js Streams. Nokome Bentley walked through a hands-on introduction to accessible, reproducible data analysis using Stencila, a way to create interactive, data-driven documents using the language of your choice to enable reproducible research. Representatives from data.world, an Austin startup we worked with on an integration for Frictionless Data, also demonstrated uploading datasets to data.world. The final workshop was conducted by several members of the Dat team, including co-organizer Max Ogden, with a super enthusiastic crowd. Competition from the day’s talks was always going to be fierce, but it seems that many attendees found some value in the more intimate setting provided by Data Tables.

Thanks

If you were there at csv,conf in Portland, we hope you had a great time. Of course, our thanks go to the Gordon and Betty Moore Foundation and to the Sloan Foundation for enabling me and my fellow organizers John Chodacki, Max Ogden, Martin Fenner, Karthik, Elaine Wong, Danielle Robinson, Simon Vansintjan, Nate Goldman and Jo Barratt, who all put so much personal time and effort into bringing this all together. Oh, and did I mention the Comma Llama Alpaca? You, um, had to be there.

Making European Subsidy Data Open

- April 24, 2017 in OK Germany, Open Government Data, Open Spending

One month after releasing subsidystories.eu, a joint project of Open Knowledge Germany and Open Knowledge International, we have some great news to share. Due to the extensive outreach of our platform and the data quality report we published, new datasets have been sent directly to us by several administrations. We have recently added new data for Austria, the Netherlands, France and the United Kingdom. Furthermore, the first Romanian data recently arrived and should be available in the near future. Now that the platform is up and running, we want to explain how we actually worked on collecting and opening all the beneficiary data. Subsidystories.eu is a tool that enables the user to visualize, analyze and compare subsidy data across the European Union, thereby enhancing transparency and accountability in Europe. To make this happen we first had to collect the datasets from each EU member state and scrape, clean, map and then upload the data. Collecting the data was an incredibly frustrating process, since EU member states publish the beneficiary data in their own country-specific (and region-specific) portals, which had to be located and often translated.

A scraper’s nightmare: different websites and formats for every country

The variety in how data is published throughout the European Union is mind-boggling. Few countries publish information on all three concerned ESIF Funds (ERDF, ESF, CF) in one online portal, while most have separate websites distinguished by funds. Germany provides the most severe case of scatteredness: not only is the data published by its regions (Germany’s 16 federal states), but there are also different websites for distinct funds (ERDF vs. ESF), leading to a total of 27 German websites. This arguably made the German data collection just as tedious as collecting all the data for the entire rest of the EU. Once the distinct websites were located through online searches, they often needed to be translated to English to retrieve the data. As mentioned, the data was rarely available in open formats (counting csv, json or xls(x) as open formats) and we had to deal with a large number of PDFs (51) and webapps (15) out of a total of 122 files. The majority of PDF files were extracted using Tabula, which sometimes worked fine and for other files required substantial work with OpenRefine to clean misaligned data. About a quarter of the PDFs could not be scraped using tools, but required hand-tailored scripts by our developer.

Data Formats

However, PDFs were not our worst nightmare: that was reserved for webapps such as this French app illustrating their 2007-2013 ESIF projects. While the idea of depicting the beneficiary data on a map may seem smart, it often makes the data useless. These apps do not allow for any cross-project analysis and make it very difficult to retrieve the underlying information. For this particular case, our developer had to decompile the Flash application to locate the multiple datasets and scrape the data.

Open data: political reluctance or technical ignorance?

These websites often made us wonder what the public servants who planned them were thinking. They already put in substantial effort (and money) when creating such maps, so why didn’t they include a “download data” button? Was it an intentional decision to publish the data, but make it difficult to access? Or is the difference between closed and open data formats simply not understood well enough by public servants? Similarly, PDFs always have to be created from an original file, while simply uploading that original CSV or XLSX file could save everyone time and money. In our data quality report we recognise that the EU has made progress in this respect with its 2013 regulation mandating that beneficiary data be published in an open format. While publication in open data formats has increased since then, PDFs and webapps remain a tiring obstacle. The EU should ensure the member states’ compliance, because open spending data, and a thorough analysis thereof, can lead to substantial efficiency gains in distributing taxpayer money. This blog has been reposted from https://okfn.de/blog/2017/04/Making-EU-Data-Open/