
FutureTDM: The Future of Text and Data Mining

- March 3, 2017 in FutureTDM, OKI Projects, text and data mining

Blog written by Freyja van den Boom (FutureTDM researcher) and Lieke Ploeger.

Since September 2015 Open Knowledge International has been working on finding new ways to improve the uptake of text and data mining in the EU, as part of the FutureTDM project. Text and data mining (TDM) is the process of extracting relevant information from large amounts of machine-readable data (such as scientific papers) and recombining it to unlock new knowledge and power innovation (see ‘Techniques, Tools & Technologies for TDM in Europe’). Project partners include libraries, publishers and universities, as well as the non-profit organisation ContentMine, which advocates for the right to mine content. Open Knowledge International leads the work on communication, mobilisation and networking and undertakes the research into best practices and methodologies.

A practical example explaining the use of TDM
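To make the definition of TDM above a little more concrete, here is a minimal sketch of the two steps it names: extracting information from machine-readable text and recombining it. The sample sentences, the term list and the function names are all invented for illustration; real TDM pipelines work on far larger corpora and use much more capable extraction methods.

```python
import re
from collections import Counter
from itertools import combinations

# Toy corpus: invented sentences standing in for machine-readable papers.
ABSTRACTS = [
    "Aspirin reduced fever in patients treated with ibuprofen.",
    "Ibuprofen and aspirin were compared for headache relief.",
    "Paracetamol was given when aspirin was not tolerated.",
]

# Hypothetical vocabulary of terms we want to find in the text.
TERMS = ["aspirin", "ibuprofen", "paracetamol"]
TERM_PATTERN = re.compile(r"\b(" + "|".join(TERMS) + r")\b", re.IGNORECASE)


def extract_terms(text):
    """Extraction step: return the set of known terms mentioned in a text."""
    return {match.lower() for match in TERM_PATTERN.findall(text)}


def cooccurrences(documents):
    """Recombination step: count how often two terms share a document."""
    counts = Counter()
    for doc in documents:
        for pair in combinations(sorted(extract_terms(doc)), 2):
            counts[pair] += 1
    return counts


pair_counts = cooccurrences(ABSTRACTS)
for pair, count in pair_counts.most_common():
    print(pair, count)
```

The co-occurrence counts are the "recombined" output: a pattern across documents that no single document states on its own.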

Because the use of TDM is significantly lower in Europe than in some countries in the Americas and Asia, FutureTDM actively engages with stakeholders in the EU such as researchers, developers, publishers and SMEs to help pinpoint why uptake is lower, raise awareness of TDM and develop solutions. This is especially important right now, because an exception for TDM under copyright law is being discussed at the European level. Such an exception would make copyright law less restrictive for TDM carried out under certain circumstances.

Throughout 2016 we organised Knowledge Cafés across Europe as an informal opportunity to gather feedback on text and data mining from researchers, developers, publishers, SMEs and any other stakeholder groups working in the field, and held stakeholder consultations with the various communities. In September 2016 we held the first of two workshops in Brussels to discuss the project’s findings, with many MEPs and policymakers present. In early 2017 a roundtable was organised at the Computers, Privacy and Data Protection (CPDP) conference in Brussels, where the impact of data protection regulations on the uptake of advanced data analysis technologies like TDM was discussed.

MEP Julia Reda discussing the upcoming copyright reform at the FutureTDM workshop

Below are some of the insights we have gained through our research so far, including the main barriers for different TDM stakeholder communities. In the upcoming months we will be publishing more of the results, along with proposed solutions for overcoming these barriers.

Education and skill
There is a need for more education on the benefits and practical use of TDM for researchers: industry, the publishing community and academia should work together to develop effective courses aimed at different levels, depending on the discipline and the type of research that is likely to use TDM. We are currently working on TDM education and are looking for feedback on what the learning outcomes should be. If you are interested in getting involved, contact us!

Legal and policy
There is a lack of clarity about the legal status of TDM practices and of the use of results gained through TDM. Barriers include uncertainty about the scope of copyright, database protection, and privacy and data protection regulations. See for example our guest blog here.
The current copyright reform discussions focus partly on a TDM exception, which could help provide more clarity. Under discussion is, for example, what data and what use fall under copyright, and whether there should be a distinction between commercial and non-commercial use. FutureTDM partners are monitoring these developments.

We have recently published the FutureTDM policy framework introducing high level principles that should be the foundation of every stakeholder action that aims to promote TDM. These high level principles are:
  • Awareness and Clarity: actions should improve certainty on the use of TDM technologies. Information and clear actions are crucial for a flourishing TDM environment in Europe.
  • TDM without Boundaries: insofar as appropriate, boundaries should be cleared to prevent and take away fragmentation in the TDM landscape.
  • Equitable Access: access to TDM tools and technologies, as well as to sources (such as datasets), is indispensable for a successful uptake of TDM, but usually comes at a price. While the broadest possible access to tools and data should be the aspiration, providers also have a legitimate interest in restricting access, for example to protect their investments or any privacy-related interest.

Technical and infrastructure
The main concern is access to, and the quality of, available data. There is confidence in the technological development of more reliable and easy-to-use tools and services, although the documentation and findability of relevant tools and services are reported as a barrier at the moment. Developing standards for data quality is seen as a useful but most likely unattainable solution, given the diversity in projects and requirements, which would make standards too complex for compliance.

Economy and Incentives
Barriers mentioned include the lack of a single European market, the problems of having multiple languages and a lack of enforcement for US companies.

Further research
The interviews and the case studies have provided evidence of and insight into the barriers that exist in Europe. To what extent these barriers can be solved, given the different interests of the stakeholders involved, remains a topic for further research within the FutureTDM project. We will continue to work on recommendations, guidelines and best practices for improving the uptake of TDM in Europe, focused on addressing the barriers presented by the main stakeholders.

All findings, which include policy recommendations, guidelines, case studies, best practices, practical tutorials, and help and how-to guides to increase TDM uptake, are shared through the platform at www.futuretdm.eu. The FutureTDM awareness sheets, for example, cover a range of factors that have an impact on TDM uptake and were created from our expert reports, expert interviews and discussions at our Knowledge Café events. The reports that have been completed so far are available from the Knowledge Library.

In the final six months of the FutureTDM project, there are many opportunities to find out more about the results and give your feedback on the situation around TDM in Europe. On 29 March, the second FutureTDM workshop will take place at the European Parliament in Brussels, where your input on TDM experiences on the ground is very welcome. With EU copyright reform now in progress, we are bringing together policy makers and stakeholder groups so that we can share FutureTDM’s findings and our first expert-driven policy recommendations that can help increase TDM uptake in the EU. To find out more and sign up, please check the event page.

We will showcase the final project results during the final FutureTDM symposium, organised in conjunction with the International Data Science Conference (12-13 June 2017, Salzburg, Austria).

Our animation explaining TDM and the importance of stakeholder engagement

Help measure your government’s openness: The Global Open Data Index 2016 is here!

- November 10, 2016 in Global Open Data Index, OKI Projects, Open Data, Open Government Data

We are happy to announce that the Global Open Data Index (GODI) 2016 is officially live, after months of hard work taking the community feedback and building it into the new methodology, and redesigning the whole survey, from the questions to the interface and how you interact with it. This is the fourth year that we will evaluate the quality and availability of national governments’ open data. We expect this edition of GODI to be the most comprehensive yet, with the most submitters ever. To do this, we need your participation! We will explain a couple of details so you can start submitting as soon as possible.

Timeline

This year we’ll accept submissions from today until the 15th of December. After that, the data will go through a review period starting in January. After the review stage, we will send the Index to governments for comment. This is done to check for errors that might have come up. The final decision regarding a dataset will be in the reviewer’s hands. After all of the reviews are completed and the datasets are evaluated by their respective governments, a country’s final score will be shown publicly on the Index website.

The Survey

The submission process is quite easy: just go to global.survey.okfn.org, find your country and click “add” on the dataset you want to evaluate. This year we will ask people to log in with Facebook or Google. This is only for authentication; we won’t store any of this information in the system and, if you want, you can choose to submit anonymously.

The Index is built from two units: places, which represent national governments or other official jurisdictions, and datasets. In this year’s Index, we have 15 datasets representing different governmental themes. Each dataset has a description followed by a list of characteristics. These characteristics describe the dataset and are meant to help you find the right dataset to evaluate. Each of these datasets is first evaluated by a contributor who responds to the questions in the survey.

If you have submitted to the Index previously, one of the first changes you’ll notice is that the survey has been completely redesigned by our great Sam Smith. Second, you’ll find a couple of questions where we ask how much you know about the dataset being evaluated and about open data. This information will help us get a clearer notion of who is submitting and how easy or difficult it is for people to find the data.

As we mentioned in previous blog posts, we changed the way we frame the questions. We tried to make them clearer and more straightforward. If you need to add any information to your answer, there is a comment section for each question, where you can tell us more about where you found the information or why you might not have found it. We encourage you to comment as much as you can; this extra info will be really valuable.

If you get lost

This is a collective effort. If you have a question about the Index, take a look at the FAQ, where you may find some answers. If you are trying to submit and something isn’t clear, you can always ask in our forum. If you want very specific info or need a quicker answer, you can always write to index@okfn.org and our team will get back to you with as many answers as we can. Thank you for submitting!

What is the Open Fiscal Data Package?

- October 20, 2016 in OKI Projects, Open Data, Open Government Data, Open Spending

This post looks at the Open Fiscal Data Package – an open standard for publishing fiscal data developed by Open Knowledge International, GIFT and the World Bank. In September of 2016, Mexico became the first country to officially endorse the OFDP, by publishing Federal Budget data in open formats using OpenSpending tools.

OpenSpending is one of Open Knowledge International’s core projects. It is a free and open platform for accessing information on government spending. OpenSpending supports civil society organisations by creating tools and standards so citizens can easily track and analyse public fiscal information globally.

The Open Fiscal Data Package (formerly Budget Data Package) is a simple specification for publishing fiscal data. The first iteration was developed between 2013 and 2014 in collaboration with multiple partners including the International Budget Partnership (IBP), Omidyar Network, Google.org, the Global Initiative for Fiscal Transparency (GIFT), the World Bank and others. The 0.3 version of the OFDP was released at the beginning of 2016, featuring a major revision in the structure and approach, establishing the foundation for all future work leading up to a future v1 release of the specification.

The OFDP is part of our work towards “Frictionless Fiscal Data” where users of fiscal information – from journalists to researchers to policy makers themselves – will be able to access and analyze government data on budgets and expenditures, reducing the time it takes to gather insights and drive positive social change. The Open Fiscal Data Package enables users to generate useful visualizations like the following one in only a few clicks:

Explore the visualization here.

Having a standard specification for fiscal data is essential to being able to scale this work, allowing tool-makers to automate:
  • Aggregations (e.g. how much did we spend on defence in 2014?)
  • Search (e.g. how much money did we give to IBM?)
  • Comparison (e.g. are we spending more or less than the country next door?)
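As a rough illustration of why a shared structure makes the three operations above easy to automate, here is a minimal Python sketch. The sample rows, the column names (`country`, `year`, `function`, `payee`, `amount`) and the helper functions are all assumptions made up for this example; they are not taken from the specification or from any OpenSpending tool.

```python
import csv
import io
from collections import defaultdict

# Invented sample rows; the column names are assumptions for this sketch.
SAMPLE_CSV = """country,year,function,payee,amount
A,2014,defence,IBM,100
A,2014,health,Acme,250
A,2015,defence,IBM,120
B,2014,defence,Acme,300
"""


def load_rows(text):
    """Parse CSV text into dicts and convert the amount column to a number."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for row in rows:
        row["amount"] = float(row["amount"])
    return rows


def matches(row, filters):
    """True if the row has the requested value for every filtered column."""
    return all(row[key] == value for key, value in filters.items())


def aggregate(rows, **filters):
    """Aggregation and search: sum the amounts of all matching rows."""
    return sum(row["amount"] for row in rows if matches(row, filters))


def compare(rows, key, **filters):
    """Comparison: total of matching rows, grouped by the given column."""
    totals = defaultdict(float)
    for row in rows:
        if matches(row, filters):
            totals[row[key]] += row["amount"]
    return dict(totals)


rows = load_rows(SAMPLE_CSV)
defence_2014 = aggregate(rows, country="A", function="defence", year="2014")
paid_to_ibm = aggregate(rows, payee="IBM")
by_country = compare(rows, "country", function="defence", year="2014")
print(defence_2014, paid_to_ibm, by_country)
```

The same three helpers work on any dataset that uses these column names, which is the point of agreeing on a standard: the tool does not need to be rewritten for each publisher.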
We have drawn on excellent related work from similar initiatives like the International Aid Transparency Initiative (IATI), the Open Contracting Partnership, and others, while aiming to keep the specification driven by new and existing tooling as much as possible. The specification took existing tools and platforms into account, so that adapting them is simpler and involves less friction. The Open Fiscal Data Package and its associated tooling were built to reduce the “friction” in accessing and using public fiscal information, making it much easier for governments to publish data and for users of the data, such as journalists, researchers and policy-makers, to access and analyse the information quickly and reliably.

What’s New in version 0.3?

The Open Fiscal Data Package is an extension of Tabular Data Package, which itself is an extension of Data Package, an emerging standard for packaging any type of data. Data Packages are formats for any kind of data based on existing practices for publishing open-source software. We extend this standard for fiscal data by mapping values from transaction line items in the on-disk dataset (in this case, a CSV file) to a conceptual representation of financial amounts, entities (e.g. payee/payor), classifications (e.g. COFOG), or government projects. Our approach to describing the logical model is based heavily on the terminology and approach of OLAP (Online Analytical Processing). This approach allows multi-dimensional analytical queries to be answered swiftly. Through a system of community feedback via GitHub issues, we have defined methods of modeling hierarchical budgets, the “direction” of a given transaction, as well as the fiscal periods for specific spending. In addition, we support both aggregated and transactional datasets, as well as budgets containing “status” information (e.g. “proposed”, “approved”, “adjusted”, and “executed”).
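The idea of mapping on-disk CSV columns to a conceptual model can be sketched in a few lines of Python. Everything below is hypothetical: the CSV column names, the sample values, and the shape of the `MAPPING` dictionary are invented for illustration and borrow only the OLAP terms "measure" and "dimension"; they are not copied from the specification, which should be consulted for the actual descriptor format.

```python
import csv
import io
from collections import defaultdict

# Invented on-disk dataset with publisher-specific column names.
RAW_CSV = """Ciclo,Funcion,Beneficiario,Monto
2016,Salud,Proveedor A,1500.50
2016,Educacion,Proveedor B,2000
2016,Salud,Proveedor C,499.50
"""

# Hypothetical mapping from physical columns to logical concepts.
# "measures" are numeric amounts; "dimensions" are what we slice by.
MAPPING = {
    "measures": {
        "amount": {"source": "Monto"},
    },
    "dimensions": {
        "fiscal_year": {"source": "Ciclo"},
        "classification": {"source": "Funcion"},
        "payee": {"source": "Beneficiario"},
    },
}


def to_logical(row, mapping):
    """Translate one CSV row into the logical (conceptual) representation."""
    record = {}
    for name, spec in mapping["measures"].items():
        record[name] = float(row[spec["source"]])
    for name, spec in mapping["dimensions"].items():
        record[name] = row[spec["source"]]
    return record


def roll_up(records, dimension, measure="amount"):
    """OLAP-style roll-up: total of a measure for each value of a dimension."""
    totals = defaultdict(float)
    for record in records:
        totals[record[dimension]] += record[measure]
    return dict(totals)


records = [to_logical(row, MAPPING) for row in csv.DictReader(io.StringIO(RAW_CSV))]
by_classification = roll_up(records, "classification")
print(by_classification)
```

Because the query code only refers to logical names such as `amount` and `classification`, a dataset with different column headers needs a different mapping but no change to the analysis code.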

Tools

We are committed to developing this standard in concert with the tooling to support it. OpenSpending Next, the next version of OpenSpending, works natively with the Open Fiscal Data Package.

The Future

Fiscal data comes in many forms, and we have sought to model a large variety of datasets in the simplest terms possible. In the future, we are looking to support a wider variety of data. In the next few months, the OpenSpending team will pilot the OFDP specification in a number of countries. The specification and the OpenSpending tools are free and available for any interested stakeholder to use. To find out more, get in touch with us on the discussion forum. To hear more about the Open Fiscal Data Package and OpenSpending tools, join us for the Google Hangout on October 25th, 4 pm Berlin time. More details can be found here.

Why civil society organisations are using OpenSpending to share fiscal data with the public

- September 15, 2016 in OKI Projects, Open Spending

OpenSpending is one of Open Knowledge International’s current projects. It is a free and open platform for citizens looking to track and analyse public fiscal information globally. While the OpenSpending team was busy revamping the platform over the last year, we have been fortunate to have a community of users actively involved in testing the new tools. Here we highlight the experiences of three partner civil society organisations that are collecting and structuring budget and spending data and using OpenSpending tools to present this data to the public. This post also gives an insight into the challenges these organisations faced in data collection and the solutions they employed to reduce data barriers.
Public Domain icons by David Merfield

Sinar Project in Malaysia: Open Spending Data in Constrained Environments

Sinar Project is an initiative that uses open source technology and applications to make important information accessible to the Malaysian people. Sinar has been working to engage disenfranchised communities in the budget process, in order to hold the government accountable for budgets that respond to the needs of citizens.

Over the course of 2016, the team at Sinar has been working to obtain and prepare over 100 datasets for upload to OpenSpending. So far, they have uploaded over 40 datasets to the platform. Among others, the team published the 2014 allocated budgets for public housing maintenance in Kota Damansara township. Data uploaded and visualized on OpenSpending was shared with the community’s leaders for review. This gave the community the opportunity to compare how the planned budget allocation matched up with how funds were actually spent. The community leaders identified potential misuse of funds in some budget lines and are continuing to conduct investigations and collect evidence to expose poor management of public finances in Kota Damansara. Data and visualizations are available on OpenSpending Viewer.

It wasn’t easy for the team to obtain such data. First, they had to file a Freedom of Information (FOI) request to the state-owned Selangor Housing and Property Agency. They also held meetings with the authorities to get an in-depth understanding of the data. Sinar continuously faces challenges in collecting budget data at all levels of government. For example, federal government budgets for previous years are not publicly available and there is no FOI law applicable to the federal government. There are roadblocks in data collection for state governments and for city councils as well.

“…to engage disenfranchised communities in the budget process…”

In spite of the roadblocks and the reluctance of authorities to collaborate, the team at Sinar has filed FOI requests to the Selangor state government and Petaling Jaya city council to get access to fiscal budgets. They have also filed FOI requests to the management company responsible for the Kota Damansara public housing, obtaining access to data on how MYR 5 million (USD 1.2 million) was allocated to repair railings for all housing blocks, and data on allocated budgets for public housing maintenance in 2014 and 2015. Moving forward, Sinar Project is planning to continue using OpenSpending to:
  1. Address budget priorities at all levels of government
  2. Visualize allocated budgets and compare to official government policies and implementation of government programmes
  3. Make use of evidence-based budget data and various survey results to hold decision makers at all levels accountable
  4. Advocate for transparency in open data, promote better access to government budget data, and push for better open data policies.

Metamorphosis Project in Macedonia: Revamp the current Follow the Money website

Metamorphosis Foundation is a civil society organization from Macedonia that has been active for more than 15 years. Several years ago they started collaborating with Open Knowledge International to implement the “Open Data Civil Society network” project, with the aim of improving the capacity of civil society organizations in the country. Moreover, they established School of Data Macedonia in order to promote an open agenda. In 2012, Metamorphosis Project in Macedonia developed their Follow the Money website to familiarise citizens with the fiscal policies of local authorities. However, although budget information was presented on the site, it lost popularity over time. In 2015, the School of Data fellow conducted in-depth user research to better understand why the site wasn’t being used and how it could be improved to better serve its potential user communities. Ultimately, the team at Metamorphosis decided to revamp the website.

“…improving the capacity of civil society organizations in the country.”

They focused on collecting, cleaning and preparing budget data from all 80 municipalities as well as the country’s central budget. Take a look at the planned Central Budget for 2016 made available on OpenSpending:
For the above visualization, click this link. Explore years 2010 to 2016 at this link.

As with Sinar Project, data collection was incredibly challenging. Budget data for most municipalities was “locked” in PDFs or not published at all. Instead of trying to get the data from the source, Metamorphosis partnered with other CSOs in the country that work closely with the municipalities and were willing to share the data they had already collected. Another issue they are facing is the lack of granularity of the published data, with official institutions unwilling to provide more detailed data.

Finally, while the central government budget was made available in machine-readable format, it only included the economic budget classification, which identifies the type of budget and expenditure incurred, for example salaries, goods and services, transfers and interest payments, or capital spending. Since the team needed the functional classification (expenditure according to the purposes and objectives for which it is intended) for the website, they had to scrape it from the website of the Ministry of Finance. The Follow the Money website uses the functional classification because the team found it the most useful way to display data to users.

In the next few months, the team will work to identify funds to launch the revamped version of the Macedonian Follow the Money website with embedded visualizations created on OpenSpending, and will continue updating their data on the platform.

AfroLeadership in Cameroon: Open Local Budgets

AfroLeadership is a civil society organization in Cameroon, founded in 2007 and committed to the promotion of open data and civic technologies for governance, transparency and citizen participation. For several years, AfroLeadership has been promoting the use of a financial management information system in local governments, in order to improve budget transparency, accountability and public participation in budgeting. The adoption of the Financial Management Information System by several councils aims at improving budget reliability, budget execution and the ratio of budget reports submitted to supreme audit institutions (SAI).

“…to bring budget information to citizens and CSOs in an accessible and open way…”

The Cameroon Open Local Budgets (COLB) project, launched in 2016, seeks to fight corruption, improve local accountability and ensure effective service delivery by collecting the approved budgets and accounts of all 374 councils in Cameroon and publishing them on OpenSpending. This project is a continuation of the organisation’s effort to bring budget information to citizens and CSOs in an accessible and open way, and to engage them in public and local affairs. The goal of the current OpenSpending Cameroon pilot phase is to upload 50 datasets of 2015 budget reports. For example, uploaded data on Cameroon’s Dschang council looks at functional expenses versus investment expenses, while a drill-down into these categories lets users explore the expenses for each budget category.

The AfroLeadership team also faces challenges in data collection. Even though the deadline for producing 2015 budget reports and accounts was at the end of May of this year, collecting these accounts has been more difficult than expected. The Audit Bench of the Supreme Court of Cameroon has stressed that less than 10% of budget reports are received at their desk each year. To address data collection challenges, AfroLeadership has organized information workshops to present to diverse stakeholders (Mayors, Supreme Audit Institutions, Civil Society Organisations, Journalists, etc.) the necessity of involving citizens in the budget cycle. AfroLeadership has also invited its institutional partner on this project, the national community driven development program (PNDP), to help collect approved 2015 budget reports and accounts. AfroLeadership is currently in touch with the Ministry of Finance to explore opportunities for improving budget report collection in local governments.

All these organizations have taken part in upload training sessions on OpenSpending, and now that the platform is available in Alpha, they are working to publish the data to the larger public through OpenSpending. To browse existing datasets and to upload your data, visit OpenSpending. For questions, the OpenSpending team is available via the OpenSpending discussion forum, on Gitter.im in the OpenSpending chat room, or on the OpenSpending issue tracker.

Progress report: OpenTrials – linking clinical trial data

- July 15, 2016 in Featured Project, OKI Projects, Open Trials, opentrials

Since last year Open Knowledge has been developing OpenTrials, an open, online database linking the publicly available data and documents on all clinical trials conducted – something that has been talked about for many years but never created. The project is funded by The Laura and John Arnold Foundation and directed by Dr. Ben Goldacre, an internationally known leader on clinical trial transparency. Having an open and freely re-usable database of the world’s clinical trial data will increase discoverability, facilitate research, identify inconsistent data, enable audits on the availability and completeness of this information, support advocacy for better data, and drive standards around open data in evidence-based medicine.

The project is currently in its first phase (which runs until March 2017), where the focus is on building and populating the first prototype of the OpenTrials database, as well as raising awareness of the project in the community and getting user involvement and feedback.

The progress that has been made so far was presented last month at the Evidence Live conference in Oxford, which brought together leaders from across the world of Evidence Based Medicine, including researchers, doctors, and the pharmaceutical industry. This was an excellent opportunity to demonstrate the project and speak both to researchers who want to use the platform and to people with a general enthusiasm for its impact on medicine. Around 40 people attended our talk, which explained why OpenTrials is an important infrastructure project for medicine, covered some of the technical aspects of the platform and details of what data we’ve imported so far, and ended with a quick demo. If you’re feeling impatient, here are the slides from the talk, or scroll down for a summary.

Ben Goldacre and Vitor Baptista present OpenTrials at Evidence Live 2016 (photo by benmeg / CC BY)

What we’ve imported into the OpenTrials database so far

  • 331,999 deduplicated trials, collected from nine clinical trial registries:
    • ANZCTR 11,645
    • ClinicalTrials.gov 205,422
    • EU CTR 35,159
    • GSK 4,131
    • ISRCTN 14,256
    • Pfizer 1,567
    • Takeda 1,142
    • UMIN 20,557
    • WHO ICTRP 298,688
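The per-registry figures above add up to more than the 331,999 deduplicated trials. A short calculation makes the gap visible. It assumes the per-registry numbers are record counts before deduplication, which the list does not state explicitly; the difference would then reflect trials that appear in more than one source (WHO ICTRP, for instance, aggregates records from other registries).

```python
# Per-registry record counts as listed above.
REGISTRY_RECORDS = {
    "ANZCTR": 11_645,
    "ClinicalTrials.gov": 205_422,
    "EU CTR": 35_159,
    "GSK": 4_131,
    "ISRCTN": 14_256,
    "Pfizer": 1_567,
    "Takeda": 1_142,
    "UMIN": 20_557,
    "WHO ICTRP": 298_688,
}

# Deduplicated total reported in the list above.
DEDUPLICATED_TRIALS = 331_999

# Assumption: the per-registry figures count records before deduplication.
total_records = sum(REGISTRY_RECORDS.values())
merged_away = total_records - DEDUPLICATED_TRIALS

print(f"registries:          {len(REGISTRY_RECORDS)}")
print(f"records collected:   {total_records:,}")
print(f"deduplicated trials: {DEDUPLICATED_TRIALS:,}")
print(f"records merged away: {merged_away:,}")
```

Under that assumption, roughly 44% of the collected records were duplicates of a trial already present from another source.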

Current functionality

  • Basic search (by keyword)
  • Searching for trials with publications
  • Uploading missing data/documents for a particular trial
  • Showing trials with discrepancies (e.g. target sample size)

What we’re importing next

Feedback and get involved

If you attended the talk and have any questions or feedback, please email us. And generally if you’re interested in contributing to OpenTrials, get in touch. Want to get early access to the data and be a user tester? Sign up and we’ll be in touch soon.