You are browsing the archive for Open Data.

Data is a Team Sport: Government Priorities and Incentives

Dirk Slater - August 13, 2017 in Ania Calderon, Data Blog, data literacy, Event report, Fabriders, Government, Open Data, research, Tamara Puhovski, Team Sport, The Open Data Charter

Data is a Team Sport is our open research project exploring the data literacy ecosystem and how it is evolving in the wake of post-fact politics, fake news and data-driven confusion. We are producing a series of videos, blog posts and podcasts based on online conversations we are having with data literacy practitioners. To subscribe to the podcast series, copy and paste the following link into your podcast manager: http://feeds.soundcloud.com/users/soundcloud:users:311573348/sounds.rss or find us in the iTunes Store and Stitcher. The conversation in this episode focuses on the challenges of getting governments to prioritise data literacy both externally and internally, and on incentives for producing open data. It features:
  • Ania Calderon, Executive Director at the Open Data Charter, a collaboration between governments and organisations working to open up data based on a shared set of principles. For the past three years, she led the National Open Data Policy in Mexico, delivering a key presidential mandate. She established capacity building programs across more than 200 public institutions.
  • Tamara Puhovski, a sociologist, innovator, public policy junkie and open government consultant. She describes herself as a time traveller journeying back to 19th and 20th century public policy centers and trying to bring them back to the future.

Notes from the conversation:

Access to government-produced open data is critical for healthy, functioning democracies. It takes an ecosystem that includes a critical-thinking citizenry, knowledgeable civil servants, incentivised elected officials, and smart open data advocates. Everyone in the ecosystem needs to be focused on long-term goals.
  • Elected officials need incentives beyond monetary arguments, as budgetary gains can take a long time to come to fruition.
  • Governments’ capacity to produce open data is an issue that needs greater attention.
  • We need to get past just making arguments for open data and provide good, solid stories and examples of its benefits.

Resources mentioned in the conversation:

Also, not mentioned in the conversation, but be sure to check out Tamara’s work on Open Youth.

View the full online conversation:


An approach to building open databases

Paul Walsh - August 10, 2017 in Labs, Open Data

This post has been co-authored by Adam Kariv, Vitor Baptista, and Paul Walsh.
Open Knowledge International (OKI) recently coordinated a two-day work sprint as a way to touch base with partners in the Open Data for Tax Justice project. Our initial writeup of the sprint can be found here. Phase I of the project ended in February 2017 with the publication of What Do They Pay?, a white paper that outlines the need for a public database on the tax contributions and economic activities of multinational companies.

The overarching goal of the sprint was to start some work towards such a database, by replicating data collection processes we’ve used in other projects, and to provide a space for domain expert partners to potentially use this data for some exploratory investigative work. We had limited time and a limited budget, and we are pleased with the discussions and ideas that came out of the sprint.

One attendee, Tim Davies, criticised the approach we took in the technical stream of the sprint. The problem with that criticism is that it extrapolates from one stream of activity during a two-day event to posit an entire approach to a project. We think exploration and prototyping should be part of any healthy project, and that is exactly what we did with our technical work in the two-day sprint.

Reflecting on the discussion presents a good opportunity to look more generally at how we, as an organisation, bring technical capacity to projects such as Open Data for Tax Justice. Of course, we often bring much more than technical capacity to a project, and Open Data for Tax Justice is no different in that regard, being mostly a research project to date. In particular, we’ll take a look at the technical approach we used for the two-day sprint. While this is not the only approach to technical projects we employ at OKI, it has proven useful on projects driven by the creation of new databases.

An approach

Almost all projects that OKI either leads on, or participates in, have multiple partners. OKI generally participates in one of three capacities (sometimes, all three):
  • Technical design and implementation of open data platforms and apps.
  • Research and thought leadership on openness and data.
  • Dissemination and facilitating participation, often by bringing the “open data community” to interact with domain specific actors.
Only the first capacity is strictly technical, but each capacity does, more often than not, touch on technical issues around open data. Some projects have an important component around the creation of new databases targeting a particular domain. Open Data for Tax Justice is one such project, as are OpenTrials and the Subsidy Stories project, which is itself part of OpenSpending.

While most projects have partners, usually domain experts, this does not mean that collaboration is consistent or equally distributed over the project life cycle. There are many reasons for this, such as the strengths and weaknesses of our team and those of our partners, priorities identified in the field, and, of course, project scope and funding.

With this as the backdrop for the projects we engage in generally, we’ll focus for the rest of this post on the aspects where we bring technical capacity to a project. As a team (the Product Team at OKI), we are currently iterating on an approach to such projects, based on the following concepts:
  • Replication and reuse
  • Data provenance and reproducibility
  • Centralise data, decentralise views
  • Data wrangling before data standards
While not applicable to all projects, we’ve found this approach useful when contributing to projects that involve building a database to, ultimately, unlock the potential to use data towards social change.

Replication and reuse

We highly value the replication of processes and the reuse of tooling across projects. Replication and reuse enables us to reduce technical costs, focus more on the domain at hand, and share knowledge on common patterns across open data projects. In terms of technical capacity, the Product Team is becoming quite effective at this, with a strong body of processes and tooling ready for use. This also means that each project enables us to iterate on such processes and tooling, integrating new learnings. Many of these learnings come from interactions with partners and users, and others come from working with data.

In the recent Open Data for Tax Justice sprint, we invited various partners to share experiences working in this field and try a prototype we built to extract data from country-by-country reports to a central database. It was developed in about a week, thanks to the reuse of processes and tools from other projects and contexts. When our partners started looking into this database, they had questions that could only be answered by looking back to the original reports. They needed to check the footnotes and other context around the data, which weren’t available in the database yet. We’ve encountered similar use cases in both OpenBudgets.eu and OpenTrials, so we can build upon these experiences to iterate towards a reusable solution for the Open Data for Tax Justice project. By doing this enough times in different contexts, we’re able to solve common issues quickly, freeing more time to focus on the unique challenges each project brings.
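
To make the pattern concrete, here is a minimal sketch, in Python, of the kind of reusable "load a report into a central table" step described above. It is illustrative only, not the actual prototype: the file layout, column names and table are hypothetical.

    import csv
    import sqlite3

    def load_report(db_path, report_path):
        """Append rows from one country-by-country report (CSV) to a central
        table, keeping a pointer back to the source file so that figures can
        be checked against the original report later."""
        conn = sqlite3.connect(db_path)
        conn.execute(
            "CREATE TABLE IF NOT EXISTS cbc_reports "
            "(company TEXT, jurisdiction TEXT, tax_paid REAL, source_file TEXT)"
        )
        with open(report_path, newline="") as f:
            for row in csv.DictReader(f):
                conn.execute(
                    "INSERT INTO cbc_reports VALUES (?, ?, ?, ?)",
                    (row["company"], row["jurisdiction"],
                     float(row["tax_paid"]), report_path),
                )
        conn.commit()
        conn.close()

    # Usage (hypothetical file names):
    # load_report("cbcr.db", "reports/acme-2016.csv")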

Data provenance and reproducibility

We think that data provenance, and the reproducibility of views on data, is absolutely essential to building databases with a long and useful future. What exactly is data provenance? A useful definition from Wikipedia is: “… (d)ata provenance documents the inputs, entities, systems, and processes that influence data of interest, in effect providing a historical record of the data and its origins”. Depending on the way provenance is implemented in a project, it can also be a powerful tool for the reproducibility of the data.

Most work around open data at present does not consider data provenance and reproducibility as an essential aspect of working with open data. We think this is to the detriment of the ecosystem’s broader goals of seeing open data drive social change: the credible use of data from projects with no provenance or reproducibility built into the creation of databases is significantly diminished in our “post truth” era.

Our current approach builds data provenance and reproducibility right into the heart of building a database. There is a clear, documented record of every action performed on data, from the extraction of source data, through normalisation processes, to the creation of records in a database. The connection between source data and processed data is not lost, and, importantly, the entire data pipeline can be reproduced by others.

We acknowledge that a clear constraint of this approach, in its current form, is that it is necessarily more technical than, say, ad hoc extraction and manipulation with spreadsheets and other consumer tools used in manual data extraction processes. However, as such approaches make data provenance and reproducibility harder, because there is no history of the changes made or of where the data comes from, we are willing to accept this more technical approach and iterate on ways to reduce technical barriers.

We hope to see more actors in the open data ecosystem integrating provenance and reproducibility right into their data work. Without doing so, we greatly reduce the ability for open data to be used in an investigative capacity, and likewise we diminish the possibility of using the outputs of open data projects in the wider establishment of facts about the world. Recent work on beneficial ownership data takes a step in this direction, leveraging the PROV-DM standard to declare data provenance facts.
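
As an illustration of what “a clear, documented record of every action performed on data” can look like in practice, here is a minimal sketch. It is our own simplified example, not the project’s actual pipeline tooling; the step names and paths are hypothetical. Each processing step appends a provenance entry with content hashes, so the chain from source to output can be re-verified later.

    import hashlib
    import json
    from datetime import datetime, timezone

    def sha256_of(path):
        """Hash a file so later readers can verify they hold the same bytes."""
        with open(path, "rb") as f:
            return hashlib.sha256(f.read()).hexdigest()

    def record_step(log_path, step, input_path, output_path, parameters):
        """Append one provenance entry: which step ran, on what input,
        producing what output, and with which parameters."""
        entry = {
            "step": step,
            "ran_at": datetime.now(timezone.utc).isoformat(),
            "input": {"path": input_path, "sha256": sha256_of(input_path)},
            "output": {"path": output_path, "sha256": sha256_of(output_path)},
            "parameters": parameters,
        }
        with open(log_path, "a") as log:
            log.write(json.dumps(entry) + "\n")

    # Usage (hypothetical paths and step name):
    # record_step("provenance.jsonl", "normalise-currency",
    #             "raw/report.csv", "clean/report.csv", {"currency": "EUR"})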

Centralise data, decentralise views

In OpenSpending, OpenTrials, and our initial exploratory work on Open Data for Tax Justice, there is an overarching theme to how we have approached data work, user stories and use cases, and co-design with domain experts: “centralise data, decentralise views”.

Building a central database for open data in a given domain affords ways of interacting with that data that are extremely difficult, or impossible, when actively choosing to decentralise it. Centralised databases make investigative work that uses the data easier, and allow for the discovery, for example, of patterns across entities and time that can be very hard to find if data is decentralised. Additionally, with a strong approach to data provenance and reproducibility in place, the complete replication of a centralised database is relatively easily done, and very much encouraged. This somewhat mitigates a major concern with centralised databases, namely that they imply some type of “vendor lock-in”.

Views on data are better when decentralised. By “views on data” we refer to visualisations, apps, websites – any user-facing presentation of data. While having data centralised potentially enables richer views, data almost always needs to be presented with additional context, localised, framed in a particular narrative, or otherwise presented in unique ways that will never be best served from a central point. Further, decentralised usage of data provides a feedback mechanism for iteration on the central database: for example, providing commonly used contextual data, establishing clear use cases for enrichment and reconciliation of measures and dimensions in the data, and so on.

Data wrangling before data standards

As a team, we are interested in, engage with, and also author, open data standards. However, we are very wary of efforts to establish a data standard before working with large amounts of data that such a standard is supposed to represent. Data standards that are developed too early are bound to make untested assumptions about the world they seek to formalise (the data itself). There is a dilemma here of describing the world “as it is”, or, “as we would like it to be”. No doubt, a “standards first” approach is valid in some situations. Often, it seems, in the realm of policy. We do not consider such an approach flawed, but rather, one with its own pros and cons. We prefer to work with data, right from extraction and processing, through to user interaction, before working towards public standards, specifications, or any other type of formalisation of the data for a given domain. Our process generally follows this pattern:
  • Get to know available data and establish (with domain experts) initial use cases.
  • Attempt to map what we do not know (e.g.: data that is not yet publicly accessible), as this clearly impacts both usage of the data, and formalisation of a standard.
  • Start data work by prescribing the absolute minimum data specification to use the data (i.e.: meet some or all of the identified use cases).
  • Implement data infrastructure that makes it simple to ingest large amounts of data, and also to keep the data specification reactive to change.
  • Integrate data from a wide variety of sources, and, with partners and users, work on ways to improve participation / contribution of data.
  • Repeat the above steps towards a fairly stable specification for the data.
  • Consider extracting this specification into a data standard.
Throughout this entire process, there is a constant feedback loop with domain expert partners, as well as a range of users interested in the data.
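
By way of illustration, the “absolute minimum data specification” at the start of this process can be as small as a handful of typed fields, for example in the style of a Frictionless Data Table Schema. This is a sketch only: the field names below are hypothetical, and in practice they come out of the initial use cases agreed with domain experts.

    # A deliberately small, illustrative field specification (Table Schema style).
    MINIMAL_SPEC = {
        "fields": [
            {"name": "entity", "type": "string", "description": "Reporting entity"},
            {"name": "jurisdiction", "type": "string"},
            {"name": "year", "type": "year"},
            {"name": "amount", "type": "number"},
            {"name": "source_url", "type": "string", "format": "uri"},
        ],
        "missingValues": [""],
    }

Keeping the specification this small makes it cheap to revise as new data sources are integrated.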

Reflections

We want to be very clear that we do not think the above approach is the only way to work towards a database in a data-driven project. Design (project design, technical design, interaction design, and so on) emerges from context. Design is also a sequence of choices, and each choice has an opportunity cost based on the various constraints present in any activity.

In the projects we engage in around open databases, technology is a means to other, social ends. Collaboration around data is generally facilitated by technology, but we do not think the technological basis for this collaboration should be limited to existing consumer-facing tools, especially if such tools have hidden costs on the path to other important goals, like data provenance and reproducibility. Better tools and processes for collaboration will only emerge over time if we allow exploration and experimentation.

We think it is important to understand general approaches to working with open data, and how they may manifest within a single project, or across a range of projects. Project work is not static, and definitely not reducible to snapshots of activity within a wider project life cycle. Certain approaches emphasise different ends. We’ve tried above to highlight some pros and cons of our approach, especially around data provenance and reproducibility, and data standards.

In closing, we’d like to invite others interested in approaches to building open databases to engage in a broader discussion around these themes, as well as a discussion around the short-term and long-term goals of such projects. From our perspective, we think there could be a great deal of value for the ecosystem around open data generally – CSOs, NGOs, governments, domain experts, funders – via a proactive discussion or series of posts with a multitude of voices. Join the discussion here if this is of interest to you.

Open Data Index 4th edition – France: data on the transparency of public action is missing

pierre chrzanowski - July 3, 2017 in Open Data

On the occasion of the recent publications of the Open Data Index and the Open Data Barometer for the 2016-2017 period, we analyse below the main results for France.

For the 2016-2017 period, France ranks 3rd in the Open Data Barometer and 4th in the Open Data Index. These results confirm our country’s efforts on open data, and we welcome the work already accomplished. But our role is to look at where we can do better: despite these good results, our country is still far from being an example of open data in areas such as the fight against corruption and public integrity, which are essential to restoring the trust in public action sought by our new Government.

In France, one third of key datasets are available as open data, according to the 2016 Open Data Index.

Public spending data unavailable

Despite constant advocacy from us and from other associations such as Regards Citoyens and Transparency International, detailed public spending data (who spent, who received, when, for what amount, for what reason) is still not available as open data. Only budget executions aggregated by administration or mission are available today, and these consolidated figures say little about the real, detailed activity of an administration; they certainly do not allow taxpayers to track where their money goes and who is accountable for it. While the moralisation of public life once again seems to be a priority for our elected officials, we are still waiting for concrete commitments on the transparency of public spending, such as the release as open data of the spending data managed in the CHORUS integrated management system.

Public procurement data incomplete

The open data currently available on public procurement awards in France only covers contracts awarded by central government administrations. Contracts awarded by other bodies, including local authorities (apart from those subject to the European-level publication requirement), are still not available as open data. In total, in the absence of open procurement data covering all actors, more than half of the value of public procurement remains opaque in France: roughly 100 billion euros of spending, or around 5% of our country’s GDP.

But the situation is expected to change. Under a decree of March 2016 (Article 107), all public purchasers are now required to publish the essential data on their public contracts. They have until 1 October 2018 to comply with this provision, which will nevertheless cover only a limited amount of information on contract awards compared with international good practice. Indeed, in December 2016 France committed, alongside the United Kingdom, Colombia, Ukraine and Mexico, to implementing a number of measures to improve the performance and integrity of public procurement. Among these measures was the adoption of the Open Contracting Data Standard (OCDS), a standard launched in 2014 by the Open Contracting Partnership that aims to improve the efficiency of public procurement and to better detect fraud and corruption. The decree of 14 April 2017 on essential public procurement data unfortunately falls short of the main OCDS requirements. For example, it will not provide access to data on the contracts themselves.
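
By way of illustration, an OCDS release is a JSON document describing one event in a contracting process. The sketch below (written as a Python dict, heavily simplified and abridged, with hypothetical values) only hints at the kind of fields the standard covers; see the OCDS documentation for the authoritative schema.

    # Illustrative, abridged OCDS-style release (hypothetical values,
    # not a complete or validated record).
    release = {
        "ocid": "ocds-xxxxxx-000-00001",  # identifier of the contracting process
        "id": "000-00001-award-01",
        "date": "2017-04-14T00:00:00Z",
        "tag": ["award"],
        "buyer": {"name": "Example public authority"},
        "tender": {"id": "000-00001", "title": "Road maintenance services"},
        "awards": [{
            "id": "award-01",
            "suppliers": [{"name": "Example contractor"}],
            "value": {"amount": 150000, "currency": "EUR"},
        }],
    }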

The DGFiP refuses to open cadastre data

The public data service (service public de la donnée), created by the Law for a Digital Republic, was intended to make available the reference datasets with the greatest economic and social impact for the country. The cadastral map is one of these key datasets but, unlike most of the other datasets listed in the implementing decree, it has still not been made available as open data by its producer, the Direction Générale des Finances Publiques (DGFiP). In the era of the Digital Republic, open data access to this data is now a right for citizens, and we therefore expect the DGFiP to comply quickly by publishing the digitised cadastral map under an open licence and in an open format.

Only one third of key datasets available as open data

To sum up, it is clear that in France the datasets that matter most for fighting corruption and restoring trust in public action are the ones that are hardest to open. Other essential datasets also remain problematic. This is the case for the address register, legal texts and draft legislation, whose available licensing regimes include clauses that make them incompatible with standard open data licences. In total, according to the Open Data Index, only one third of key datasets are assessed as fully open.
Find the detailed results of the Open Data Index and the Open Data Barometer

Open Data Index and Open Data Barometer: learn more

The Open Data Index, led by Open Knowledge International, is an annual assessment of open data based, for this edition, on the analysis of 15 key datasets in 94 countries or territories. Data availability is assessed by volunteer contributors in each country, and the results are then checked and validated by experts selected by Open Knowledge International. Our local group, Open Knowledge France, takes part in this collective effort every year. The results presented here cover the 4th edition of the Open Data Index, which reflects the situation in 2016.

The Open Data Barometer, led by the Web Foundation, is also an annual assessment of open data based on the analysis of 15 key datasets, but it additionally takes into account a country’s readiness for open data (right of access to information, existence of a national open data initiative, etc.) as well as the impact of open data in a number of sectors, including public services, the fight against corruption, the economy and the environment. The Open Data Barometer relies on paid researchers to conduct the assessment and also invites governments to carry out a self-assessment. For this edition, 115 countries were studied. The Open Data Barometer likewise assesses the situation in 2016, for the 4th consecutive year.

The Open Data Index and the Open Data Barometer both use the Open Definition as the reference definition for open data and apply almost the same criteria to determine whether a dataset is open.
However, the two assessments differ on which datasets are considered essential and how they are defined.
List of the 15 datasets assessed by the Open Data Barometer:
  • Maps (political and topographic maps)
  • Land (cadastre)
  • Statistics (demographic and economic statistics)
  • Budget (planned budget)
  • Spending (detailed public spending)
  • Companies (company register)
  • Legislation (legal texts)
  • Transport (timetables)
  • Trade (international trade statistics)
  • Health (health service performance)
  • Education (school system performance)
  • Crime (crime statistics)
  • Environment (CO2 emissions, air pollution, deforestation, and water quality)
  • Elections (detailed results)
  • Contracts (details of public procurement contracts)
List of the 15 datasets assessed by the Open Data Index:
  • Budget (planned budget)
  • Spending (detailed public spending, at transaction level)
  • Procurement (procurement notices and awards)
  • Election results (detailed results)
  • Company register
  • Land ownership (cadastre)
  • Maps (geographic map)
  • Weather forecast (3-day forecast)
  • Administrative boundaries (geographic boundaries of the different administrative levels)
  • Addresses (geolocated address register)
  • National statistics (demographic and economic statistics)
  • Draft legislation (bills under discussion in parliament)
  • Legislation (laws in force)
  • Air quality (including particulate matter (PM) and carbon monoxide)
  • Water quality (nitrates, faecal coliforms, etc.)
To learn more about the Open Data Index assessment methodology: https://index.okfn.org/methodology/

To learn more about the Open Data Barometer assessment methodology: https://index.okfn.org/methodology/

Contact us: contact@okfn.fr

Hackathon at the Gran Sasso Science Institute! Sign up by 1 July!

Francesca De Chiara - June 23, 2017 in civic tech, Events, Open Data

On the occasion of the Festival della Partecipazione, the Gran Sasso Science Institute (GSSI) is organising a hackathon in L’Aquila on 7 and 8 July to develop projects for products, services or visual representations that are useful, sustainable and replicable, and capable of generating a significant impact on the ways we think about, live and share the reconstruction and the future […]

Open Government Partnership: going beyond the agenda

Francesca De Chiara - June 13, 2017 in Open Data

We are publishing a post that appears simultaneously in other outlets, produced jointly with members of Italian civil society engaged in the consultation and participation process launched by OGP Italia. As part of the process proposed by the Department of Public Administration (Minister Madia – Presidency of the Council of Ministers) to civil society organisations to define […]

Impact Series: Improving Data Collection Capacity in Non-Technical Organisations

David Selassie Opoku - June 5, 2017 in OD4D, Open Data

Open Knowledge International is a member of Open Data for Development (OD4D), a global network of leaders in the open data community, working together to develop open data solutions around the world. In this blog, David Opoku of Open Knowledge International talks about how the OD4D programme’s Africa Open Data Collaboration Fund and  Embedded Fellowships are helping build the capacity of civil society organisations (CSOs) in Africa to explore the challenges and opportunities of becoming alternative public data producers.

Nana Baah Gyan was an embedded fellow who worked with Advocates for Community Alternatives (ACA) in Ghana to help with their data needs.

Context 

Because governments in Africa often struggle to provide open data, civil society organisations (CSOs) have begun to emerge as alternative data producers. The value these CSOs bring includes familiarity with the local context or with the specific domain where data may be of benefit. In some cases, this new role for CSOs provides additional checks and verification for data that is already available; in others, it provides entire sets of data where none exist. CSOs now face the challenge of building their own skills to effectively produce public data that will benefit its users. For most CSOs in low-income areas, building this capacity can be a long, logistically intensive, and expensive process.

Figure 1: CSOs are evolving from traditional roles as just data intermediaries to include producers of data for public use.

Through the Open Data for Development (OD4D) program, Open Knowledge International (OKI) sought to learn more about what it takes to enable CSOs to become capable data collectors. Using the Africa Open Data Collaboration (AODC) Fund and the OD4D embedded fellowship programmes, we have been exploring the challenges and opportunities for CSO capacity development to collect relevant data for their work.

Our Solution

The AODC Fund provided funding ($15,000 USD) and technical support to the Women Environmental Programme (WEP) team in Abuja, Nigeria, which was working on a data collection project aimed at transparency and accountability in infrastructure and services for local communities. Through the AODC Fund, WEP was supported in learning how to design the entire data collection process, including recruiting and training the data collectors, selecting the best data collection tool, analysing and publishing the findings, and documenting the entire process.

Figure 2: Flowchart of a data collection process. Data collection usually requires several components or stages that make it challenging for non-technical CSOs to easily implement without the necessary skills and resources.

In addition, the embedded fellowship programme allowed us to place a data expert in the Advocates for Community Alternatives (ACA) team for 3 months to build their data collection skills. ACA, which works on land issues in Ghana, has been collecting data on various community members and their land. Their challenge was building an efficient system for data collection, analysis and use. The data expert has been working with them to design and test this system and train ACA staff members in using it.

Emerging Outcomes

Through this project, there has been an increased desire within both WEP and ACA to educate their staff members about open data and its value in advocacy work. Both organisations have learned the value of data and now understand the need to develop an organisational data strategy. This is coupled with an acknowledgement of the need to strengthen organisational infrastructure (such as better emailing systems, data storage, etc.) to support this work. The hope is that both organisations now have greater knowledge of the importance of data, and have gained new skills in applying it in practice. WEP, for instance, has since collected and published the dataset from their project and is now making use of the Kobo Toolbox, along with other newly acquired skills, in their new projects. ACA, on the other hand, is training more of its staff members with the Kobo Toolbox manual that was developed, and is exploring other channels to build internal data capacity.

Lessons

These two experiences have shed some more light on the growing needs of CSOs to build their data collection capacity. However, the extent of the process as depicted in Figure 1 shows that more resources need to be developed to enhance the learning and training of CSOs. A great example of a beneficial resource is the School of Data’s  Easy Guide to Mobile Data Collection. This resource has been crucial in providing a holistic view of data collection processes to interested CSOs. Another example is the development of tools such as the Kobo Toolbox, which has simplified a lot of the technical challenges that would have been present for non-technical and low-income data collectors.

Figure 3: CSO-led data collection projects should be collaborative efforts with other data stakeholders.

We are also learning that it is crucial to foster collaboration with other data stakeholders in a CSO-led data collection exercise. Such stakeholders could include academic institutions for methodology research and design, national statistics offices for data verification and authorisation, civic tech hubs for technical support and equipment, telecommunication companies for internet support, and other CSOs for their contextualised experience in data collection. Learn more about this project:

Open Knowledge Belgium is preparing for open Summer of code 2017

driesvr - May 31, 2017 in belgium, Civic Labs, Events, General, Open Belgium, Open Data, Open Knowledge, open Summer of code, oSoc17

In the last few months, the open community in Belgium has had the chance to gather multiple times. Open Knowledge Belgium organised a couple of events and activities which aimed to bring its passionate community together and facilitate the launch of new projects. Furthermore, as summertime is coming, it’s currently organising the seventh edition of its yearly open Summer of code. Let’s go chronologically through what’s going on at Open Knowledge Belgium.

Open Belgium 2017

As tradition goes, on the first Monday after International Open Data Day, Open Knowledge Belgium organises its Open Belgium conference on open knowledge and open data in Belgium.

Open Belgium was made possible by an incredible group of volunteers

This year’s community-driven gathering of open enthusiasts took place in Brussels for the first time and was a big success. More than 250 people with different backgrounds showed up to talk about the current state of and next steps towards more open knowledge and open data in Belgium.

All presentations, notes and visuals of Open Belgium are available on http://2017.openbelgium.be/presentations.

Launch of Civic Lab Brussels

It all started during a fruitful discussion with Open Knowledge Germany at Open Belgium. While talking about the 26 OK Labs in Germany, and in particular being intrigued by the air quality project of OK Lab Stuttgart, we asked ourselves: why not launch something similar in Brussels/Belgium?

Around the same time, some new open initiatives popped up from within our community, and several volunteers repeatedly expressed their interest in contributing to Open Knowledge’s mission of building a world in which knowledge creates power for the many, not the few.

Eventually, after a wonderful visit to BeCentral — the new digital hub above Brussels’ central station — all the pieces of the puzzle came together in the idea of a Civic Lab: bringing volunteers and open projects together every two weeks in an open space.

The goal of Civic Labs Brussels is two-fold: on the one hand, offering volunteers opportunities to contribute to civic projects they care about; on the other, providing the initiators of open projects with help and advice from fellow citizens.

Open, in the case of our Civic Lab, means (in line with the Open Definition, only slightly shorter) that anyone can freely contribute to and benefit from the project. No strings attached.

Civic Lab meetups are not only about putting open initiatives in the spotlight and hanging out with other civic innovators; they are also about getting things done and creating impact. The gatherings therefore always follow the same format: short introductory presentations (30 min) of both new and ongoing projects, followed by action (2 hours), during which attendees are free to contribute to the project of their choice or to propose new projects.

Open Summer of code 2017

Last but not least, Open Knowledge Belgium is preparing for the seventh edition of its annual open Summer of code. From 3rd until 27th July, 36 programming, design and communications students will be working under the guidance of experienced coaches on 10 different open innovation projects with real-life impact.

If you want to stay updated about open Summer of code and all other activities, please follow Open Knowledge Belgium on Twitter or subscribe to its newsletter.

Open data quality – the next shift in open data?

Open Knowledge International - May 31, 2017 in Data Quality, Global Open Data Index, GODI16, Open Data

This blog post is part of our Global Open Data Index blog series. It is a call to recalibrate our attention to the many different elements that contribute to the ‘good quality’ of open data, the trade-offs between them, and how they support data usability (see here some vital work by the World Wide Web Consortium). Focusing on these elements could help governments publish data that can be easily used. The blog post was jointly written by Danny Lämmerhirt and Mor Rubinstein.

Some years ago, open data was heralded as a way to unlock information to the public that would otherwise remain closed. In the pre-digital age, information was locked away, and an array of mechanisms was necessary to bridge the knowledge gap between institutions and people. So when the open data movement demanded “Openness By Default”, many data publishers followed the call by releasing vast amounts of data in its existing form to bridge that gap.

To date, it seems that opening this data has not reduced but rather shifted and multiplied the barriers to the use of data, as Open Knowledge International’s research around the Global Open Data Index (GODI) 2016/17 shows. Together with data experts and a network of volunteers, our team searched, accessed, and verified more than 1400 government datasets around the world. We found that data is often stored in many different places on the web, sometimes split across documents, or hidden many pages deep on a website. Often data comes in various access modalities. It can be presented in various forms and file formats, sometimes using uncommon signs or codes that are, in the worst case, only understandable to their producer.

As the Open Data Handbook states, these emerging open data infrastructures resemble the myth of the ‘Tower of Babel’: more information is produced, but it is encoded in different languages and forms, preventing data publishers and their publics from communicating with one another. What makes data usable under these circumstances? How can we close the information chain loop? The short answer: by providing ‘good quality’ open data.

Understanding data quality – from quality to qualities

The open data community needs to shift focus from mass data publication towards an understanding of good data quality. Yet there is no shared definition of what constitutes ‘good’ data quality. Research shows that there are many different interpretations and ways of measuring data quality, including data interpretability, data accuracy, timeliness of publication, reliability, trustworthiness, accessibility, discoverability, processability, and completeness. Since people use data for different purposes, certain data qualities matter more to one user group than to others. Some of these areas are covered by the Open Data Charter, but the Charter does not explicitly name them as ‘qualities’ which sum up to high quality.

Current quality indicators are not complete – and miss the opportunity to highlight quality trade-offs

Existing indicators also assess data quality very differently, potentially framing our language and thinking about data quality in opposite ways. For example, some indicators focus on the content of data portals (number of published datasets) or on access to data, while only a small fraction focus on datasets themselves: their content, structure, understandability, or processability. Even GODI and the Open Data Barometer from the World Wide Web Foundation do not share a common definition of data quality.
Arguably, the diversity of existing quality indicators prevents a targeted and strategic approach to improving data quality.

At the moment GODI sets out the following indicators for measuring data quality:
  • Completeness of dataset content
  • Accessibility (access-controlled or public access?)
  • Findability of data
  • Processability (machine-readability and amount of effort needed to use data)
  • Timely publication
This leaves out other qualities. We could ask, for instance, whether data is actually understandable by people: is there a description of what each part of the data content means (metadata)?

Improving quality by improving the way data is produced

Many data quality metrics are (rightfully so) user-focused. However, it is critical that governments, as data producers, better understand, monitor and improve the inherent quality of the data they produce. Measuring data quality can incentivise governments to design data for impact, by raising awareness of the quality issues that would otherwise make data files practically impossible to use. At Open Knowledge International, we target data producers and the quality issues of data files mostly via the Frictionless Data project. Notable projects include the Data Quality Spec, which defines some essential quality aspects for tabular data files. GoodTables provides structural and schema validation of government data, and the Data Quality Dashboard enables open data stakeholders to see data quality metrics for entire data collections “at a glance”, including the number of errors in a data file. These tools help to develop a more systematic assessment of the technical processability and usability of data.
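
As a rough sketch of what this looks like in practice, and assuming the goodtables Python library is installed, a validation run over a (hypothetical) spending file might look like the following; the report flags structural problems such as blank rows or duplicate headers, and schema errors if a Table Schema is supplied.

    from goodtables import validate

    # Validate a tabular file; the returned report describes each error
    # with its row and column so publishers can fix the source data.
    report = validate("spending.csv")
    print(report["valid"])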

A call for joint work towards better data quality

We are aware that achieving good data quality requires many solutions working together. Therefore, we would love to hear your feedback. What are your experiences with open data quality? Which quality issues hinder you from using open data? How do you define these data qualities? What could the GODI team improve? Please let us know by joining the conversation about GODI on our forum.

Open Data Index in Brazil launched! by FGV and Open Knowledge Brazil

Open Knowledge Brazil - May 25, 2017 in network, Open Data, Open Data Index

Open Knowledge Brazil and Fundação Getúlio Vargas (FGV), a higher education institution in Brazil, worked together to develop the Brazilian edition of the Open Data Index, which is used by governments as a tool to enhance public management, and to bring it even closer to Brazil’s reality.

About the Open Data Index

The Brazilian edition of the Open Data Index has been used as a tool to set priorities regarding transparency and open data policies, as well as a pressure mechanism for civil society to encourage governments to improve their performance by releasing essential datasets. The indicator is based on data availability and accessibility across 13 key categories, including government spending, election results, public procurement, pollution levels, water quality data, land ownership, and climate data, among others. Submissions are peer reviewed and verified by a local team of data experts and reviewers. Points are assigned based on the conclusions reached through this process.

OK Brazil and FGV Partnership 

Through a series of events held in partnership, Open Knowledge Brazil (OKBR) and FGV’s Department of Public Policy Analysis (DAPP) launched the Brazilian edition of the Open Data Index (ODI), a civil society initiative designed to assess the state of open government data worldwide. Three assessments were established for Brazil through a joint effort between the two institutions:
  1. the Open Data Index (ODI) for Brazil, at the national level;
  2. ODI São Paulo, at the municipal level; and
  3. ODI Rio de Janeiro, also at the municipal level.
The last two are part of a pioneering initiative, since these are the first regional ODIs in Brazil, in addition to the nationwide assessment. 
This partnership with OKBr and the development of the Open Data Index complement DAPP’s life-long efforts in the areas of political and budget transparency, featuring widely recognised tools such as the Budget Mosaic and Transparent Chamber. We believe that public debate can only be qualified through data transparency, social engagement and dialogue within network society –  Marco Aurelio Ruediger, director of DAPP

The two institutions are working to develop the indicator, used by governments across 122 countries as a tool to enhance public management, and to bring it even closer to Brazil’s reality. The goal is for data disclosure to promote institutional development by encouraging transparency within the foundations of government, achieved both through constant scrutiny by civil society and through improvements implemented by administrators in the quality of and access to information.
Among the practical results of this new effort for society is the possibility of using results to develop and monitor public policies regarding transparency and open data – Ariel Kogan, CEO of OKBR

Open Data Index for Brazil 

The Open Data Index for Brazil, launched on April 27 in Brasilia, revealed that the country is in 8th place in the world ranking, tied with the United States and Latvia, and leads among its neighbours in Latin America. In total, 15 dimensions related to themes such as public spending, the environment and legislation were analysed. However, the overall score of 64% indicates that there is still a lot of room for improvement. Only six dimensions of the index, or 40%, received the full score, that is, they were considered fully open: Public Budget, Electoral Results, National Maps, Socioeconomic Statistics, Laws in Force and Legislative Activity. At the same time, no public databases were found for three of the dimensions surveyed: Locations, Water Quality and Land Ownership.

Open Data Index for Cities – São Paulo

The ODI São Paulo, launched two days earlier, showed a similar result. In the overall assessment, the municipality performed well in the index, with 75% of the total score. Within the dimensions analysed, 7 of the 18 evaluated databases obtained the maximum score, meaning that 38% of the databases for the city were considered fully open. On the other hand, the Land Ownership dimension scored 0% due to the unavailability of data, and another four scored lower than 50% (including Business Register, Water Quality and Weather Forecast).

Open Data Index for Cities – Rio de Janeiro

The ODI Rio de Janeiro [report in Portuguese], released on May 4, showed a slightly different performance. The city of Rio de Janeiro had a high overall score, reaching 80%. The study indicates, however, that only five dimensions (Election Results, City Maps, Administrative Limits, Criminal Statistics and Public Schools) obtained an individual score of 100%, with only 27% of the databases considered fully open. Dataset incompleteness appears six times, i.e. certain information considered essential is not available. Access restrictions appear only in the Business Register dimension. The Land Ownership dimension is also considered critical, since no data is available for carrying out the ODI assessment. In summary, we believe this information can be useful for open data policy at both the municipal and federal levels, pointing the way to the replication of good practices and the correction of points of attention. The benefits of an open data policy are innumerable: greater management efficiency, an instrument for gathering results from public administration, the promotion of accountability and social control, the engagement of civil society with public management, and an improved public image, with the potential of becoming an international reference.