Steady but Slow – Open Data’s Progress in the Caribbean

Open Knowledge International - April 27, 2017 in Global Open Data Index, Open Data, Open Government Data, opendata

Over the last two years, the SlashRoots Foundation has supported the Caribbean's participation in Open Knowledge International's Global Open Data Index (GODI), an annual survey which measures the state of "open" government data across the world. We recently completed the 2016 survey submissions and were asked to share our initial reactions before the full GODI study is released in May.

In the Global Open Data Index, each country is assessed on the availability of "open data", as defined in the Open Knowledge Foundation's Open Data Definition, across key thematic areas on which governments are expected to publish information. These include: National Maps, National Laws, Government Budget, Government Spending, National Statistics, Administrative Boundaries, Procurement, Pollutant Emissions, Election Results, Weather Forecast, Water Quality, Locations, Draft Legislation, Company Register, and Land Ownership. For the 2016 survey, the Caribbean was represented by ten countries: Antigua & Barbuda, Barbados, the Bahamas, the Dominican Republic, Jamaica, Guyana, Trinidad and Tobago, St. Lucia, St. Kitts & Nevis, and St. Vincent & the Grenadines. As the Caribbean's Regional Coordinator, we manage and source survey submissions from citizens, open data enthusiasts, and government representatives; these submissions then undergo a quality review process led by global experts. This exercise resulted in 150 surveys for the region and provided an excellent snapshot of both how open data in the Caribbean is progressing and how the region ranks in a global context.

Unfortunately, progress in the Caribbean has been mixed, if not slow. While Caribbean governments were early adopters of Freedom of Information legislation, with seven countries having passed FOI laws (Antigua and Barbuda, Belize, the Dominican Republic, Guyana, Jamaica, St. Vincent and the Grenadines, and Trinidad and Tobago), the digital channels through which many citizens increasingly access government information remain underdeveloped. Furthermore, the publication of raw and baseline data, beyond references in press releases, remains a challenge across the region. For example, St. Kitts, which passed FOI legislation in 2006, had only two "open" datasets, Government Budget and Legislature, readily published online.

Comparatively, the governments of Puerto Rico, the Dominican Republic and Jamaica have invested in open data infrastructure and websites to improve the channels through which citizens access information. Impressively, the Dominican Republic's data portal contained 373 datasets from 71 participating Ministries, Departments and Agencies. However, keeping data portals and government websites up to date remains a challenge: Jamaica's open data portal, launched in 2016, has received only a handful of updates since its first publication, while St. Lucia's and Trinidad & Tobago's portals have seen no updates since their first month online.

Despite these shortcomings, Caribbean governments and civil society organisations continue to make important contributions to the global open data discourse that demonstrate the tangible benefits of open data adoption in the lives of Caribbean citizens. These range from research demonstrating the economic impact of open data to community-led initiatives helping to bridge the data gaps that constrain local government planning.
In December 2016, Jamaica became the fourth country in the region, after Guyana, the Dominican Republic and Trinidad & Tobago, to indicate its interest in joining the Open Government Partnership, a multilateral initiative of 73 member countries that aims to secure concrete commitments from governments to promote transparency, empower citizens, fight corruption, and harness new technologies to strengthen governance. Find out how the Caribbean ranks in the full GODI report, to be published on May 2nd.

The Elizabeths: Elemental Historians

Adam Green - April 26, 2017 in Uncategorized

CONJECTURES #4: Carla Nappi conjures a dreamscape from four archival fragments, four oblique references to women named "Elizabeth" who lived at the watershed of the 16th and 17th centuries.

OKI Agile: How to create and manage user stories

Tryggvi Björgvinsson - April 26, 2017 in agile, Our Work

This is the first in a series of blogs on how we are using the Agile methodology at Open Knowledge International. Originating from software development, the Agile manifesto describes a set of principles that prioritise agility in work processes: for example through continuous development, self-organised teams with frequent interactions, and quick responses to change (http://agilemanifesto.org). In this blog series we go into the different ways Agile can be used to work better in teams and to deliver projects more efficiently. This first blog is dedicated to user stories, a popular agile technique. User stories are a pretty nifty way of gathering requirements in an agile environment, where one of the key values is responding to change over following a plan. They are a good anchor for conversations that can then take place at the right time.

What is a user story?

A user story is a short sentence that encapsulates three things:
  1. Who?
  2. What?
  3. Why?
Notice that this does not include "How?" The "How?" is left to the team delivering the requirement. After all, the team consists of professionals; they know how to deliver the best solution. The most common way to encapsulate a user story is to use the template:
  • As a [WHO] I want [WHAT] so that [WHY]
Be careful not to sneak any Hows into that template. That usually happens in the What, so stay focussed! Words like by, using or with should be avoided like the plague because they usually result in a How. Basically, avoid anything that has to do with the actual implementation.
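As a rough illustration (this is not part of the original post, and the class below is invented for the example), the Who/What/Why of a user story can be captured in a tiny data structure that deliberately has no field for the How:

    from dataclasses import dataclass

    @dataclass
    class UserStory:
        """A minimal user story: who, what and why - deliberately no 'how'."""
        who: str   # the role asking for something
        what: str  # the capability they want
        why: str   # the value it gives them

        def __str__(self) -> str:
            return f"As a {self.who} I want {self.what} so that {self.why}"

    story = UserStory(
        who="government official",
        what="means of transportation",
        why="I can get from A to B quickly",
    )
    print(story)
    # As a government official I want means of transportation so that I can get from A to B quickly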

Bad user stories

    • As a government official I want a Range Rover so that I can get from A to B quickly
      • Problem: A Range Rover is an actual implementation. It might not be what is needed, even though it is what's believed to be desired.
    • As a visitor to a website I want to be able to read a landing page using my spiffy MacBook Air and have the content presented in the Lato typeface, size 14, and with good white space between paragraphs so that I can know what the website is all about
      • Problem: A whole lot! What about GNU/Linux and Windows users? What if there is a better typeface out there? What about the language of the content? The Why isn't really a why. The list goes on. Don't go into detail. It's bad practice and creates more problems than it solves.

    Good user stories

    • As a government official I want means of transportation so that I can get from A to B quickly
    • As a website visitor I want to know what the website is about so that I can see how it can help me

    Why shouldn’t we go into details?

    It's really quite simple: we expect the requirements to change, and we'd just waste a lot of time going into the details of something that might change or get thrown out. We're trying to be efficient while still giving the team an understanding of the broader picture. An extreme example: between the start of the project and the time the team is going to tackle a user story, the world might have moved to virtual governments that don't need transportation any more (technology moves fast).

    The team consists of experts, so they know what works best (if not, why are they tasked with delivering?). The customers are the domain experts, so they know best what is needed. In the website visitor example above, the team would know the best way of showing what a website is about (it could be a landing page), but the customer knows what they are going to offer through the website and how it helps people.

    We also value individuals and interactions over processes and tools. In an environment of ever-changing requirements, we want non-detailed stories which can, when the time comes, be the basis for a conversation about the actual implementation. The team familiarises itself with the requirement at the appropriate time. So when starting work on the transportation user story, the team might discuss with the customer and ask questions like:
    • “How fast is quickly?”,
    • “Are A and B in the same city, country, on Earth?”,
    • “Are there any policies we need to be aware of?” etc.

    Acceptance of user stories

    Surely the customer would still want to have a say in how things get implemented. That's where acceptance criteria come in. When the time comes, the customer creates a checklist for each user story in a joint meeting, based on discussion. That's the key thing: it comes out of a discussion. These criteria tell the team in a bit more detail what they need to fulfil to deliver the requirement (the user story). For the government official in need of transport this might be things like:
    • Main area of interest/focus is London area
    • Applicable to/usable in other cities as well
    • Allows preparations for a meeting while in transit
    • Very predictable so travel time can be planned in detail
    • Doesn’t create a distance between me and the people I serve
    Then the implementation team might even pick public transportation to solve this requirement. A Range Rover wasn't really needed in the end (although this would probably go against the "satisfy the customer" principle, but hey! I'm teaching you about user stories here! Stay focussed!).
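Purely as an illustration (the class names and checklist handling below are invented for this sketch, not something the post prescribes), acceptance criteria can be thought of as a checklist attached to the story and ticked off as the discussion and delivery progress:

    from dataclasses import dataclass, field

    @dataclass
    class Criterion:
        description: str
        met: bool = False

    @dataclass
    class Story:
        text: str
        criteria: list = field(default_factory=list)

        def accepted(self) -> bool:
            # The customer representative accepts the story only when every criterion is met
            return bool(self.criteria) and all(c.met for c in self.criteria)

    transport = Story(
        text="As a government official I want means of transportation "
             "so that I can get from A to B quickly",
        criteria=[
            Criterion("Main area of interest/focus is the London area"),
            Criterion("Applicable to/usable in other cities as well"),
            Criterion("Allows preparations for a meeting while in transit"),
            Criterion("Predictable enough that travel time can be planned in detail"),
            Criterion("Doesn't create a distance between me and the people I serve"),
        ],
    )
    print(transport.accepted())  # False until the checklist has been discussed and ticked off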

    How is this managed?

    One key thing we want to get out of user stories is not to scope the requirement in detail until it becomes clear that it's definitely going to be implemented. How then do you know what you'll be doing in the future? User stories can be of different sizes, from very coarse to very detailed. The very coarse ones don't even need to be written as user stories; they're often referred to as epics.

    Many break requirements into three stages: the releases, the projects, or whatever the team works on. Each of these can then be broken up into features, and each feature can be broken up into tasks. It's up to the team to decide when it's best to formulate these as user stories, and it really depends on the team and the project. Some might have epics as the big long-term project, break that up into user stories, and then break each user story up into tasks. Others might have a single product, with the releases (what you want to achieve in each release: "The geospatial release") at the top, then features as sentences (epics) underneath the release, and then transform those sentences into the user stories you work on. Whichever way you do it, this is the general guideline for granularity:
    • Coarsest: Long-term plans of what you’ll be doing
    • Mid-range: Delivery in a given time period (e.g. before deadlines)
    • Finest: What team will deliver in a day or two
    The reason the finest level is a day or two of work is to give the team a sense of progress and to avoid getting stuck at "I'm still doing the guildhall thing", which is demoralising and inefficient (and not really helpful for others who might be able to help). There is a notion of the requirements iceberg, or pyramid, which tries to visualise the three stages: the bottom stage holds the larger, coarser items, the mid-range is what you're delivering in a given time period, and the finest level is the smallest blocks of work. That finest level is what's going to be "above" the surface for the core team, and it's still just a fraction of the big picture.
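To make the three levels concrete, here is an invented example of how a backlog might be nested from epic to stories to tasks (nothing in the post prescribes this particular structure or tooling):

    # Invented example backlog showing the three levels of granularity
    backlog = {
        "epic": "The geospatial release",   # coarsest: the long-term plan
        "stories": [                        # mid-range: delivery in a given time period
            {
                "story": "As a website visitor I want to know what the website "
                         "is about so that I can see how it can help me",
                "tasks": [                  # finest: a day or two of work each
                    "Discuss the landing page content with the customer representative",
                    "Draft the landing page copy",
                    "Review the draft with the customer representative",
                ],
            },
        ],
    }

    for story in backlog["stories"]:
        print(backlog["epic"], "->", story["story"])
        for task in story["tasks"]:
            print("  task:", task)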


    When should who be involved?

    So the core team has to decide at what stage of the iceberg they want to write the user stories, and that depends on the project, the customer, and the customer's involvement. So we need to better understand "the team". The core team should always be present and work together. Who is in the core team then? If that's not clear, there's a story (or joke) about the pig and the chicken that can guide us: a pig and a chicken decided to open a restaurant. They were discussing what name to give the restaurant when the chicken proposed the name Ham & Eggs. The pig turned up its nose and said: "That's unfair, I'd be committed but you'd only be involved!"

    That's the critical distinction between the core team and others. The core team is the pigs. Everyone else who is only involved to make the project happen is a chicken. The pigs run the show; the chickens are there to make sure the pigs can deliver. Chickens come in various sizes and shapes: team managers (planning people), unit heads, project managers, biz-dev people, and even customers.

    The term customer is pretty vague. You usually don't have all your customers involved; usually you only have a single representative. For bespoke/custom development (work done at the request of someone else), that person is usually the contact person for the client you're working for. At other times the single customer representative is an internal person. That internal individual is sometimes referred to as the product owner (the term comes from Scrum) and is a dedicated role put in place when there is no single customer, e.g. when the product is being developed in-house. That person then represents all customers and has in-depth knowledge about all customers, or has access to a focus group or something similar. This individual representative is the contact point for the team, the one who's there to help the team deliver the right thing. More specifically, this individual:
    • Creates initial user stories (and drives creation of other user stories)
    • Helps the team prioritise requirements (user stories)
    • Accepts stories (or rejects) when the team delivers
    • Is available to answer any questions the team might have
    So the representative's role is to provide the implementers with enough domain knowledge to proceed and deliver the right thing. This individual should not have any say in how the core team will implement it. That's why the team was hired or tasked with delivering it: because they know how to do it. That's also why user stories do not focus on the How.

    The core team, the pigs, need to decide at which intersections of the iceberg they want this representative present (where discussions between the core team and the representative will happen): when they go from coarsest to mid-range, or from mid-range to finest. So in a weird sense, the core team decides when the customer representative decides what will be done. As a rule of thumb, the user stories feed into the stage above the intersection where the representative is present. So if the representative helps the team go from coarse to mid-range, the user stories are created for the mid-range stage. If the representative is there for mid-range to finest, the user stories are going to be very fine-grained.

    As a side note, because the chickens are there to make sure the pigs can deliver, they will always have to be available to answer questions. Many have picked up the standup activity from the Scrum process to discuss blockers, and in those cases it's important that everyone involved, both pigs and chickens, is there so the chickens can act quickly to unblock the pigs.

    Now go and have fun with user stories. They shouldn't be a burden. They should make your life easier… or at least help you talk to chickens.

    Texts in Mathias Enard’s Compass

    Adam Green - April 26, 2017 in Balzac, compass, Joseph-Charles Mardrus, Mathias Enard, orient, orientalism, Thousand And One Nights, Victor Hugo

    Collection of the major public domain texts featuring in the novel Compass by French writer Mathias Enard — including Balzac, Victor Hugo, and Joseph-Charles Mardrus.

    Join The Story Hunt – Uncover the EU!

    Bela Seeger - April 26, 2017 in Uncategorized

    'The Story Hunt: Uncover the EU' is a programme hosted by the Open Knowledge Foundation Germany's Datenschule, OpenBudgets, and SubsidyStories.eu teams. Together with journalists, analysts, non-profit organizations, developers and designers, we want to develop and apply the skills needed to hunt for stories in financial data. The Story Hunt is split into two parts: a series of workshops and an expedition weekend in Berlin. The workshops are tailored to aspiring data journalists and non-profit organizations that are interested in improving their data-literacy skills. They are run by our team and by trusted members of the civic tech community in Germany.
    They will culminate in the weekend expedition at the end of June, where, together with proven experts, we are going to dive into a massive database of the European Union's primary financial instrument, the ESI Funds.

    During the weekend, we will form interdisciplinary teams that collaboratively work on finding stories, leads, and data analyses around the EU and its money flows. This will offer an ideal opportunity to practice the acquired skills in a supportive environment.

    Find out more about the programme on the website!

    Interested? Join us!

    Index shows the state of open data and transparency in São Paulo

    Elza Maria Albuquerque - April 25, 2017 in Open Data, Featured, open data index, Open Data Index

    Sunny to rainy: Our statement on the Wetterdienst-Gesetz (German Weather Service Act)

    Arne Semsrott - April 24, 2017 in Uncategorized

    The federal government's draft bill amending the Wetterdienst-Gesetz not only offers scope for weather puns, it is also a chance for more open meteorological data. As expert witnesses in the Transport Committee's public hearing on the bill, we will address three areas this coming Wednesday that the draft does not yet regulate sufficiently: the missing obligation to publish the weather service's data in full, the inadequate form in which the data is provided, and free licences for the associated software. We are publishing our statement on the draft bill here in advance.

    To exploit the full potential of the data held by the weather service, the Deutscher Wetterdienst (DWD) must open up its entire taxpayer-funded holdings. These could then be used not only by start-ups in economic sectors such as agritech or autonomous driving, but also by independent developers and volunteers in other areas of society such as science, journalism or sea rescue. The hearing on Wednesday can be attended after prior registration with the Transport Committee. There will also be a livestream on Bundestag.de.
    The statement in full:

    Dear Sir or Madam,

    The Open Knowledge Foundation Deutschland is a non-profit association that advocates for open knowledge, open data, transparency and participation. The organisation's community consists of around 1,200 developers, designers and activists. Across Germany they build applications and visualisations, among other things with meteorological data, and use them both in a voluntary capacity for the common good and, in some cases, in start-ups. We therefore expressly welcome the federal government's goal of simplifying access to and use of meteorological data for citizens, public administration and the private sector.

    Abolishing fees is advisable on economic grounds alone: the effort involved in collecting fees has so far been out of all proportion to the revenue generated. Beyond that, providing the data and services openly does not only promote competition and innovation; it is also a means of making it easier for financially weaker actors such as scientists, volunteers, start-ups and small interest groups to work with the data. For many of the datasets this can even indirectly save lives, for example in connection with severe weather or marine weather. Internationally, many weather services have already recognised that providing data as open data is the modern standard. One consequence is that German developers, too, often do not draw directly on DWD data in their work but on meteorological data from the Global Forecast System (GFS) in the USA, because it can be obtained relatively easily and, above all, free of charge. The Norwegian weather service likewise offers open data online not only for Norway but for places all over the world.

    For the Deutscher Wetterdienst to actually achieve the goals of simplified access and simplified use, three central improvements to the current draft bill are advisable. They are necessary to exploit the full potential of the data held by the DWD.

    First, the bill should oblige the DWD to provide all meteorological data collected with taxpayer money. Publication of the data is possible under the present draft, but there is no guarantee that it will actually happen. Much DWD data is already publicly available, but not completely or not up to date. For example, the Global Data Set (GDS) lists only 80 selected weather stations; all 220 should be released. In addition, all measured data should be released without spatial or temporal restrictions, including, for instance, dew point temperature, ground temperature, radiation intensity and snow depth, as well as specialised road weather data and ice warning systems. In future, the DWD should publish ICON model data, radar data, satellite data, lightning detection and storm cell data, and data from balloon ascents and aviation reports live. At present, most of these are not provided at the full available resolution (ICON, radar data) or are not kept up to date. These data are not mentioned in the federal government's response to the Bundesrat's comments. It would therefore make sense, at least in the explanatory memorandum to item 2 (§ 4 Aufgaben) and item 3 (§ 6 Absatz 2a Vergütungen), to explicitly include an obligation to publish complete and current datasets.
    The data should be "open by default", as the federal government's draft Open Data Act provides for comparable data (Drucksache 18/11614). Only in justified exceptional cases should data not be published. A positive list of the data to be provided in any case would also be helpful, so that developers and start-ups can benefit from the opening of the weather service with greater certainty. The draft also leaves out the DWD's archive data, which is central especially with regard to climate change research. With a few exceptions, this data has so far only been usable internally at the DWD and is to a large extent not digitised. It would nevertheless be highly desirable to open this treasure trove of data to the public as well.

    Second, the published data is only of limited use if it continues to be provided as it is today. Using much of the data currently requires a separate login to an FTP server, where the data is usually only available as zip files and partly in specialised meteorological formats. This is no longer up to date, as the successful service "Open Weather Map" shows. To invest in the connectivity of the data, application programming interfaces (APIs) that external apps can access directly in real time must be provided on a regular basis. That way, data does not first have to be laboriously downloaded and processed but is immediately available for reuse. Only this makes the data usable for a large number of developers. As a rule, no APIs currently exist at the DWD even internally, so they would first have to be developed. To ensure this, the explanatory memorandum to item 3 (§ 6 Absatz 2a Vergütungen) should be supplemented after sentence 1 with the sentence: "Data with short update intervals shall additionally be made available to the public via application programming interfaces."

    Third, the bill should stipulate that the DWD's products are freely licensed within the meaning of the PSI Directive and the Informationsweiterverwendungsgesetz (IWG), so that they can not only be used but also reused. If software and other products are released under a free licence, this strengthens innovation and competition, which ultimately helps the DWD itself: further development of DWD products can, for example, lead to app extensions being built that the DWD can in turn adopt, or to security vulnerabilities in open-source software being closed by independent developers. This can increase data quality and establish a feedback channel with users, developers and companies. Software such as KLAM21 and MUKLIMO could, for example, be licensed under the European Union Public License (EUPL), and audiovisual products such as the severe-weather clips on YouTube under a free licence such as CC0. The data collected as part of the DWD's research under § 4 Absatz 2 should also be published according to open access principles.
    An addition to item 3 (§ 6 Absatz 2a) sentence 1 suggests itself here: "[…] the following services of the Deutscher Wetterdienst are free of charge under a free licence: […]". Improving the draft bill accordingly would help to better exploit the potential of the data within the meaning of the Digital Agenda, in economic sectors such as agritech and autonomous driving as well as in other areas of society such as science and in specialised fields such as sea rescue. The act could then also be drawn on within the framework of the National Action Plan of the Open Government Partnership.
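To illustrate the difference the statement is asking for (this sketch uses an invented endpoint; the DWD offered no such interface at the time, which is precisely the point of the statement), an application could query a JSON API directly instead of logging in to an FTP server and unpacking zip files:

    import json
    from urllib.request import urlopen

    # Invented endpoint for illustration only - not a real DWD service
    URL = "https://api.example-weather-service.de/v1/stations/10382/latest"

    with urlopen(URL, timeout=10) as response:
        observation = json.load(response)

    # The data is immediately reusable: no FTP login, zip archive or special meteorological format
    print(observation.get("temperature"), observation.get("wind_speed_kmh"))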

    Making European Subsidy Data Open

    Michael Peters - April 24, 2017 in OK Germany, Open Government Data, Open Spending

    One month after releasing subsidystories.eu, a joint project of Open Knowledge Germany and Open Knowledge International, we have some great news to share. Thanks to the extensive outreach of our platform and the data quality report we published, new datasets have been sent to us directly by several administrations. We have recently added new data for Austria, the Netherlands, France and the United Kingdom. Furthermore, the first Romanian data recently arrived and should be available in the near future. Now that the platform is up and running, we want to explain how we actually went about collecting and opening all the beneficiary data.

    Subsidystories.eu is a tool that enables the user to visualize, analyze and compare subsidy data across the European Union, thereby enhancing transparency and accountability in Europe. To make this happen, we first had to collect the datasets from each EU member state and then scrape, clean, map and upload the data. Collecting the data was an incredibly frustrating process, since EU member states publish the beneficiary data in their own country-specific (and region-specific) portals, which had to be located and often translated.

    A scraper's nightmare: different websites and formats for every country

    The variety in how data is published throughout the European Union is mind-boggling. Few countries publish information on all three concerned ESIF Funds (ERDF, ESF, CF) in one online portal, while most have separate websites distinguished by fund. Germany provides the most severe case of scatteredness: not only is the data published by its regions (Germany's 16 federal states), but different websites exist for distinct funds (ERDF vs. ESF), leading to a total of 27 German websites. This arguably made the German data collection just as tedious as collecting the data for the entire rest of the EU. Once the distinct websites were located through online searches, they often needed to be translated into English to retrieve the data. As mentioned, the data was rarely available in open formats (counting CSV, JSON or XLS(X) as open formats), and we had to deal with a large number of PDFs (51) and webapps (15) out of a total of 122 files. The majority of PDF files were extracted using Tabula, which worked fine for some files but required substantial work with OpenRefine to clean misaligned data for others. About a quarter of the PDFs could not be scraped using these tools and required hand-tailored scripts from our developer.

    [Chart: Data formats]
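For readers who want to reproduce this step, a minimal sketch of the Tabula-based extraction might look like the following, using the tabula-py wrapper and an invented file name (heavily misaligned tables still needed manual cleaning in OpenRefine afterwards):

    import tabula  # tabula-py, a Python wrapper around Tabula (requires Java)

    # Invented file name for illustration
    tables = tabula.read_pdf("beneficiaries_2014-2020.pdf", pages="all", multiple_tables=True)

    # Each extracted table comes back as a pandas DataFrame; write them out as open CSV files
    for i, table in enumerate(tables):
        table.to_csv(f"beneficiaries_{i}.csv", index=False)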
    However, PDFs were not our worst nightmare: that was reserved for webapps such as this French app illustrating their 2007-2013 ESIF projects. While the idea of depicting the beneficiary data on a map may seem smart, it often makes the data useless. These apps do not allow for any cross-project analysis and make it very difficult to retrieve the underlying information. For this particular case, our developer had to decompile the Flash to locate the multiple datasets and scrape the data.

    Open data: political reluctance or technical ignorance?

    These websites often made us wonder what the public servants who planned them were thinking. They already put in substantial effort (and money) to create such maps, so why didn't they include a "download data" button? Was it an intentional decision to publish the data but make it difficult to access? Or is the difference between closed and open data formats simply not understood well enough by public servants? Similarly, PDFs always have to be created from an original file, while simply uploading that original CSV or XLSX file could save everyone time and money.

    In our data quality report we recognise that the EU has made progress in this regard with its 2013 regulation mandating that beneficiary data be published in an open format. While publication in open data formats has increased since then, PDFs and webapps remain a tiring obstacle. The EU should ensure member states' compliance, because open spending data, and a thorough analysis thereof, can lead to substantial efficiency gains in distributing taxpayer money. This blog has been reposted from https://okfn.de/blog/2017/04/Making-EU-Data-Open/

    However, PDFs were not our worst nightmare, that was reserved for webapps such as this french app illustrating their 2007-2013 ESIF projects. While the idea of depicting the beneficiary data on a map may seem smart, it often makes the data useless. These apps do not allow for any cross project analysis and make it very difficult to retrieve the underlying information. For this particular case, our developer had to decompile the flash to locate the multiple dataset and scrape the data. Open data: political reluctance or technical ignorance? These websites often made us wonder, what the public servants that planned this were thinking? They already put in substantial effort (and money) when creating such maps, why didn’t they include a “download data” button. Was it an intentional decision to publish the data, but make difficult to access? Or is the difference between closed and open data formats simply not understood well enough by public servants? Similarly, PDFs always have to be created from an original file, while simply uploading that original CSV or XLSX file could save everyone time and money. In our data quality report we recognise that the EU has made progress on this behalf in their 2013 regulation mandating that beneficiary data be published in an open format. While publication in open data formats has increased henceforth, PDFs and webapps remain a tiring obstacle. The EU should assure the member states’ compliance, because open spending data and a thorough analysis thereof, can lead to substantial efficiency gains in distributing taxpayer money.