
Sloan Foundation Funds Frictionless Data for Reproducible Research

- July 12, 2018 in data infrastructures, Featured, Frictionless Data

We are excited to announce that Open Knowledge International has received a grant of $750,000 from The Alfred P. Sloan Foundation for our project “Frictionless Data for Reproducible Research”. The new funding from Sloan enables us to continue work over the next 3 years via enhanced dissemination and training activities, as well as further iteration on the software and specifications via a range of deep pilot projects with research partners.  
With Frictionless Data, we focus specifically on reducing friction around discoverability, structure, standardization and tooling. More generally, we address the technicalities around the preparation, validation and sharing of data, in ways that both enhance existing workflows and enable new ones, towards the express goal of minimizing the gap between data and insight. We do this by creating specifications and software that are primarily informed by reuse (of existing formats and standards), conceptual minimalism, and platform-agnostic interoperability.
Over the last two years, with support from Sloan and others, we have validated the usefulness of the Frictionless Data approach for the research community and found strong commonalities between our experiences of data work in the civic tech arena and the friction encountered in data-driven research. The pilots and case studies we conducted over this period have enabled us to improve our specifications and software, and to engage with a wider network of actors interested in data-driven research from fields as diverse as earth science, computational biology, archeology, and the digital humanities.
Building on work going on for nearly a decade, last September we launched v1 of the Frictionless Data specifications, and we have produced core software that implements those specifications across 7 programming languages. With the new grant we will iterate on this work, as well as run additional Tool Fund activities to facilitate deeper integration of the Frictionless Data approach in a range of tools and workflows that enable reproducible research.
A core point of friction in working with data is the discoverability of data. Having a curated collection of well-maintained datasets that are of high value to a given domain of inquiry is an important move towards increasing the quality of data-driven research. With this in mind, we will also be organising efforts to curate datasets that are of high value in the domains in which we work.
This high-value data will serve as a reference for how to package data with Frictionless Data specifications, and provide suitable material for producing domain-specific training materials and guides.
Finally, we will be focussing on researchers themselves and are planning a programme to recruit and train early career researchers to become trainers and evangelists of the tools in their field(s). This programme will draw lessons from years of experience running data literacy fellowships with School of Data and Panton Fellowships for Open Science. We hope to meet researchers where they are and work with them to demonstrate the effectiveness of our approach and how our tools can bring real value to their work.
Are you a researcher looking for better tooling to manage your data? Do you work at or represent an organization working on issues related to research and would like to work with us on complementary issues for which data packages are suited? Are you a developer and have an idea for something we can build together? Are you a student looking to learn more about data wrangling, managing research data, or open data in general? We’d love to hear from you. If you have any other questions or comments about this initiative, please visit this topic in our forum, use the hashtag #frictionlessdata or speak to the project team on the public gitter channel.
The Alfred P. Sloan Foundation is a philanthropic, not-for-profit grant-making institution based in New York City. Established in 1934 by Alfred Pritchard Sloan Jr., then-President and Chief Executive Officer of the General Motors Corporation, the Foundation makes grants in support of original research and education in science, technology, engineering, mathematics and economic performance.

Improving your data publishing workflow with the Frictionless Data Field Guide

- March 27, 2018 in data infrastructures, Data Quality, Frictionless Data

The Frictionless Data Field Guide provides step-by-step instructions for improving data publishing workflows. The field guide introduces new ways of working informed by the Frictionless Data suite of software that data publishers can use independently, or adapt into existing personal and organisational workflows.
Data quality and automation of data processing are essential in creating useful and effective data publication workflows. Speed of publication and lower costs of publication are two areas that are directly enhanced by having better tooling and workflows to address quality and automation. At Open Knowledge International, we think that it is important for everybody involved in the publication of data to have access to tools that help automate and improve the quality of data, so this field guide details open data publication approaches with a focus on user-facing tools for anyone interested in publishing data.
All of the Frictionless Data tools that are included in this field guide are built with open data publication workflows in mind, with a focus on tabular data, and there is a high degree of flexibility for extended use cases, handling different types of open data. The software featured in this field guide is all open source, maintained by Open Knowledge International under the Frictionless Data umbrella, and designed to be modular.
The preparation and delivery of the Frictionless Data Field Guide has been made possible by the Open Data Institute, who received funding from Innovate UK to build “data infrastructure, improve data literacy, stimulate data innovation and build trust in the use of data” under the pubtools programme. Feel free to engage the Frictionless Data team and community on Gitter.
The Frictionless Data project is a set of simple specifications to address common data description and data transport issues. The overall aim is to reduce friction in working with data and to do this by making it as easy as possible to transport data between different tools and platforms for further analysis. At the heart of Frictionless Data is the Data Package, which is a simple format for packaging data collections together with a schema and descriptive metadata. For over ten years, the Frictionless Data community has iterated extensively on tools and libraries that address various causes of friction in working with data, and this work culminated in the release of v1 specifications in September 2017.
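To make the idea of a Data Package more concrete, here is a minimal sketch of what a descriptor might look like, built with nothing but the Python standard library. The dataset name, file path and field names are hypothetical and chosen purely for illustration; the overall shape (a package with a list of resources, each carrying a path and a schema with typed fields) follows our reading of the v1 specifications, so treat it as a sketch rather than a definitive reference.

```python
import json


def build_descriptor():
    """Return a minimal Data Package descriptor as a plain dict.

    All names and paths below are hypothetical examples.
    """
    return {
        "name": "example-measurements",  # hypothetical package identifier
        "title": "Example measurements",  # descriptive metadata
        "resources": [
            {
                "name": "measurements",
                "path": "data/measurements.csv",  # hypothetical CSV file
                # The schema describes the columns of the tabular file.
                "schema": {
                    "fields": [
                        {"name": "site", "type": "string"},
                        {"name": "date", "type": "date"},
                        {"name": "value", "type": "number"},
                    ]
                },
            }
        ],
    }


descriptor = build_descriptor()

# A descriptor is conventionally stored as JSON next to the data it describes.
descriptor_json = json.dumps(descriptor, indent=2)
print(descriptor_json)
```

Because the descriptor is plain JSON, it can be written by hand, generated by a script like the one above, or produced by the Frictionless Data tools, and then read by any platform that understands the format.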

Who Will Shape the Future of the Data Society?

- October 5, 2016 in data infrastructures, Events, Featured, Featured Project, iodc16, Open Data, Open Government Data, Policy, research

This piece was originally posted on the blog of the International Open Data Conference 2016, which takes place in Madrid, 6-7th October 2016.
The contemporary world is held together by a vast and overlapping fabric of information systems. These information systems do not only tell us things about the world around us. They also play a central role in organising many different aspects of our lives. They are not only instruments of knowledge, but also engines of change. But what kind of change will they bring?
Contemporary data infrastructures are the result of hundreds of years of work and thought. In charting the development of these infrastructures we can learn about the rise and fall not only of the different methods, technologies and standards implicated in the making of data, but also about the articulation of different kinds of social, political, economic and cultural worlds: different kinds of “data worlds”.
Beyond the rows and columns of data tables, the development of data infrastructures tells tales of the emergence of the world economy and global institutions; different ways of classifying populations; different ways of managing finances and evaluating performance; different programmes to reform and restructure public institutions; and how all kinds of issues and concerns are rendered into quantitative portraits in relation to which progress can be charted – from gender equality to child mortality, biodiversity to broadband access, unemployment to urban ecology.
The transnational network assembled in Madrid for the International Open Data Conference has the opportunity to play a significant role in shaping the future of these data worlds. Many of those present have made huge contributions towards an agenda of opening up datasets and developing capacities to use them.
Thanks to these efforts there is now global momentum around open data amongst international organisations, national governments, local administrations and civil society groups – which will have an enduring impact on how data is made public.
Perhaps, around a decade after the first stirrings of interest in what we now know as “open data”, it is time to have a broader conversation around not only the opening up and use of datasets, but also the making of data infrastructures: of what issues are rendered into data and how, and the kinds of dynamics of collective life that these infrastructures give rise to. How might we increase public deliberation around the calibration and direction of these engines of change?
Anyone involved with the creation of official data will be well aware that this is not a trivial proposition, not least because of the huge amount of effort and expense that can be incurred in everything from developing standards, commissioning IT systems and organising consultation processes to running the social, technical and administrative systems which can be required to create and maintain even the smallest and simplest of datasets. Reshaping data worlds can be slow and painstaking work. But unless we instate processes to ensure alignment between data infrastructures and the concerns of their various publics, we risk sustaining systems which are at best disconnected from and at worst damaging towards those whom they are intended to benefit.
What might such social shaping of data infrastructures look like? Luckily there is no shortage of recent examples – from civil society groups campaigning for changes in existing information systems (such as advocacy around the UK’s company register), to cases of citizen and civil society data leading to changes in official data collection practices, to the emergence of new tools and methods to work with, challenge and articulate alternatives to official data.
Official data can also be augmented by “born digital” data derived from a variety of different platforms, sources and devices which can be creatively repurposed in the service of studying and securing progress around different issues. While there is a great deal of experimentation with data infrastructures “in the wild”, how might institutions learn from these initiatives in order to make public data infrastructures more responsive to their publics? How can we open up new spaces for participation and deliberation around official information systems at the same time as building on the processes and standards which have developed over decades to ensure the quality, integrity and comparability of official data? How might participatory design methods be applied to involve different publics in the making of public data? How might official data be layered with other “born digital” data sources to develop a richer picture around issues that matter? How do we develop the social, technical and methodological capacities required to enable more people to take part not just in using datasets, but also reshaping data worlds? Addressing these questions will be crucial to the development of a new phase of the open data movement – from the opening up of datasets to the opening up of data infrastructures. Public institutions may find they have not only new users, but new potential contributors and collaborators as the sites where public data is made begin to multiply and extend outside of the public sector – raising new issues and challenges related to the design, governance and political economics of public information systems. The development of new institutional processes, policies and practices to increase democratic engagement around data infrastructures may be more time consuming than some of the comparatively simpler steps that institutions can take to open up their datasets. 
But further work in this area is vital to secure progress on a wide range of issues – from tackling tax base erosion to tracking progress towards commitments made at the recent Paris climate negotiations. As a modest contribution to advancing research and practice around these issues, a new initiative called the Public Data Lab is forming to convene researchers, institutions and civil society groups with an interest in the making of data infrastructures, as well as the development of capacities that are required for more people to not only take part in the data society, but also to more meaningfully participate in shaping its future.

New Discussion Paper: “Democratising the Data Revolution”

- July 9, 2015 in Campaigning, civil society, data infrastructures, Data Journalism, Data Revolution, Featured, Open Data, Open Government Data, Open Knowledge, Policy, research

“New technologies are leading to an exponential increase in the volume and types of data available, creating unprecedented possibilities for informing and transforming society and protecting the environment. Governments, companies, researchers and citizen groups are in a ferment of experimentation, innovation and adaptation to the new world of data, a world in which data are bigger, faster and more detailed than ever before. This is the data revolution.” – UN Data Revolution Group, 2014
What will the “data revolution” do? What will it be about? What will it count? What kinds of risks and harms might it bring? Whom and what will it serve? And who will get to decide?
Today we are launching a new discussion paper on “Democratising the Data Revolution”, which is intended to advance thinking and action around civil society engagement with the data revolution. It looks beyond the disclosure of existing information, towards more ambitious and substantive forms of democratic engagement with data infrastructures.[1] It concludes with a series of questions about what practical steps institutions and civil society organisations might take to change what is measured and how, and how these measurements are put to work. You can download the full PDF report here, or continue to read on in this blog post.

What Counts?

How might civil society actors shape the data revolution? In particular, how might they go beyond the question of what data is disclosed towards looking at what is measured in the first place? To kickstart discussion around this topic, we will look at three kinds of intervention: changing existing forms of measurement, advocating new forms of measurement and undertaking new forms of measurement.
Changing Existing Forms of Measurement
Rather than just focusing on the transparency, disclosure and openness of public information, civil society groups can argue for changing what is measured with existing data infrastructures. One example of this is recent campaigning around company ownership in the UK. Advocacy groups wanted to unpick networks of corporate ownership and control in order to support their campaigning and investigations around tax avoidance, tax evasion and illicit financial flows. While the UK company register recorded information about “nominal ownership”, it did not include information about so-called “beneficial ownership”, or who ultimately benefits from the ownership and control of companies. Campaigners undertook an extensive programme of activities to advocate for changes and extensions to existing data infrastructures – including via legislation, software systems, and administrative protocols.[2]
Advocating New Forms of Measurement
As well as changing or recalibrating existing forms of measurement, campaigners and civil society organisations can make the case for the measurement of things which were not previously measured. For example, over the past several decades social and political campaigning has resulted in new indicators about many different issues – such as gender inequality, health, work, disability, pollution or education.[3] In such cases activists aimed to establish a given indicator as important and relevant for public institutions, decision makers, and broader publics – in order to, for example, inform policy development or resource allocation.
Undertaking New Forms of Measurement
Historically, many civil society organisations and advocacy groups have collected their own data to make the case for action on issues that they work on – from human rights abuses to endangered species. Recently there have been several data journalism projects which highlight gaps in what is officially counted. The Migrant Files is an open database containing information about over 29,000 people who died on their way to Europe since 2000, collated from publicly available sources. It was created by a network of journalists who were concerned that this data was not being systematically collected by European institutions. In a similar vein The Counted project from The Guardian records information about deaths in police custody in the US, explicitly in response to the lack of official data collection on this topic.

The Role of the Open Data Movement

The nascent open data movement has often focused on the release of pre-existing information about things which are already routinely measured by public institutions. Advocates have pushed for the release of datasets under open licenses in machine-readable formats to facilitate widespread re-use – whether to develop new applications and services, or to facilitate new forms of journalism and advocacy. Datasets are often published via data portals, of which there are now hundreds around the world at local, regional, national and supranational levels. As well as opening up new datasets, some public institutions have implemented mechanisms to gather input and feedback on open data release priorities, such as:
  • Advisory panels and user groups – e.g. the UK’s Open Data User Group (ODUG);
  • Dedicated staff – e.g. community management or “Chief Data Officer” positions;
  • User engagement channels – e.g. social media accounts, forums and mailing lists;
  • Data request mechanisms – e.g. Data.gov.uk’s dataset request service or the EU Open Data Portal’s “Suggest a Dataset” form;
  • Consultation processes – e.g. Open Government Partnership National Action Plans;
  • Solicitation for input around data standards – e.g. the US’s Federal Spending Transparency issue tracker on GitHub.
In principle these kinds of mechanisms could be used not just to inform priorities for the release of existing datasets – but also in order to facilitate engagement between institutions and civil society actors around what should be measured by the public sector and how. To use a metaphor, if data can be compared to photography, then might the open data movement play a role in intervening not just around access and circulation of snapshots taken by public institutions, but also around what is depicted and how it is shot?

Questions for Discussion

We would like to catalyse discussion and gather input about how to increase civil society engagement around the data revolution and questions about what should be measured and how. To this end, we invite advocacy groups, journalists, public institutions, data users, researchers and others to respond to the following questions.
What Can Civil Society Groups Do?
  • What can civil society organisations do to engage with the data revolution?
  • What role might the nascent open data movement play in mediating between civil society organisations and public institutions around what should be measured?
  • What opportunities does the data revolution present for civil society organisations?
  • What are the best examples of democratic interventions to change, advocate or create new forms of measurement (both present and past)?
  • What are the biggest obstacles to greater civil society engagement with the data revolution? How might these be addressed?
  • Which kinds of transnational challenges and issues (e.g. climate change, tax base erosion) are currently inadequately dealt with by national data infrastructures?
  • In what areas might new kinds of measurement make the biggest difference, and how?
  • What factors are most important in ensuring that data leads to action?
  • What might civil society groups do to flag potential risks and unwanted consequences of data infrastructures as well as their benefits?
What Can Public Institutions Do?
  • What can public institutions do to better understand the interests and priorities of civil society organisations around what should be measured?
  • Are there examples of where open data initiatives have facilitated significant changes to existing datasets, or the creation of new kinds of datasets?
  • Which kinds of mechanisms might be most effective in understanding and responding to the interests of civil society organisations around what is measured and how?
  • What are the biggest obstacles to public institutions responding more effectively to the data needs and interests of civil society groups? How might these be addressed?

How to Respond

We welcome responses on these and other topics via the channels below:

  1. In this context we understand data infrastructures as composites of technical, legal and social systems (e.g. software, laws, policies, practices, standards) involved in the creation and management of data. 
  2. See: Gray, J. & Davies, T. (2015) “Fighting Phantom Firms in the UK: From Opening Up Datasets to Reshaping Data Infrastructures?”. Working paper available at: http://ssrn.com/abstract=2610937
  3. See: Bruno, I., Didier, E., and Vitale, T. (eds) (2014) Statistics and Activism. Special issue of Partecipazione e conflitto. The Open Journal of Sociopolitical Studies. Available at: http://siba-ese.unisalento.it/index.php/paco/issue/view/1248