
Data Curator – share usable open data

- March 14, 2019 in Frictionless Data, tools

Data Curator is a simple desktop editor to help describe, validate, and share usable open data.

Open data producers are increasingly focusing on improving open data so it can be easily used to create insight and drive positive change. Open data is more likely to be used if data consumers can:

  • understand the structure and quality of the data
  • understand why and how the data was collected
  • look up the meaning of codes used in the data
  • access the data in an open machine-readable format
  • know how the data is licensed and how it can be reused

Data Curator enables open data producers to define all this information, and validate the data, prior to publishing it on the Internet. The data is published as a Tabular Data Package following the Frictionless Data specification. This allows open data consumers to read the data using Frictionless Data applications and software libraries.
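As a rough sketch of what that looks like on disk, a Tabular Data Package is a directory holding the CSV data plus a datapackage.json descriptor. The package name, file path, and fields below are illustrative, not taken from Data Curator; only the descriptor structure follows the Frictionless Data specification:

```python
import json

# A minimal Tabular Data Package descriptor (illustrative names and fields):
# one CSV resource described by a Table Schema, plus an open license.
descriptor = {
    "name": "bus-stops",
    "licenses": [{"name": "CC-BY-4.0",
                  "title": "Creative Commons Attribution 4.0"}],
    "resources": [{
        "name": "stops",
        "path": "stops.csv",
        "profile": "tabular-data-resource",
        "schema": {
            "fields": [
                {"name": "stop_id", "type": "integer",
                 "constraints": {"required": True, "unique": True}},
                {"name": "stop_name", "type": "string"},
                {"name": "website", "type": "string", "format": "uri"},
            ]
        },
    }],
}

# Written next to stops.csv, this one file is what turns a folder of CSVs
# into a data package that Frictionless Data tools can read.
print(json.dumps(descriptor, indent=2))
```

A consumer then only needs this single file to learn the column types, constraints, and license before touching the data itself.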

“We need to make it easy to manage data throughout its lifecycle and ensure it can be easily and reliably retrieved by people who want to reuse and repurpose it. We developed Data Curator to help publishers define certain characteristics to improve data and metadata quality” – Dallas Stower, Assistant Director-General, Digital Platforms and Data, Queensland Government – Project Sponsor

Data Curator allows you to create data from scratch or open an Excel or CSV file. Data Curator requires that each column of data is given a type (e.g. text, number). Data can be defined further using a format (e.g. text may be a URL or email). Constraints can be applied to data values (e.g. required, unique, minimum value, etc.). This definition process can be accelerated by using the Guess feature, which guesses the data types and formats for all columns.
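A naive sketch of what such a guessing step might do (an illustration of the idea only, not Data Curator's actual algorithm): try progressively stricter casts over a column's values and fall back to text.

```python
def guess_type(values):
    """Naively guess a Table Schema type for a column of string values."""
    def all_match(cast):
        # True if every value in the column survives the cast.
        for v in values:
            try:
                cast(v)
            except ValueError:
                return False
        return True

    if all_match(int):
        return "integer"
    if all_match(float):
        return "number"
    return "string"

column = ["3", "17", "42"]
print(guess_type(column))  # prints "integer"
```

A real implementation would also try dates, booleans, and formats such as URLs, but the shape of the check is the same.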

Data can be validated against the column type, format and constraints to identify and correct errors. If it’s not appropriate to correct the errors, they can be added to the provenance information to help people understand why and how the data was collected and determine if it is fit for their purpose.

Data Curator screenshot

Often a set of codes used in the data is defined in another table. Data Curator lets you validate data across tables. This is really useful if you want to share a set of standard codes across different datasets or organisations.
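The idea can be sketched in a few lines. The lookup table of status codes below is hypothetical, not an example shipped with Data Curator; it is similar in spirit to a Table Schema foreign key:

```python
# Cross-table check: every code used in the data must exist in a
# separate lookup table shared across datasets.
status_codes = {"A": "Active", "D": "Decommissioned"}  # illustrative lookup table

rows = [{"stop_id": 1, "status": "A"},
        {"stop_id": 2, "status": "X"}]  # "X" is not a known code

# Collect the ids of rows whose code is missing from the lookup table.
errors = [row["stop_id"] for row in rows if row["status"] not in status_codes]
print(errors)  # prints [2]
```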

Data Curator lets you save data as a comma, semicolon, or tab separated value file. After you’ve applied an open license to the data, you can export a data package containing the data, its description, and provenance information. The data package can then be published to the Internet. Some open data platforms support uploading, displaying, and downloading data packages. Open data consumers can then confidently access and use quality open data.
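Writing the same table with each of those three delimiters is a one-parameter change in most CSV tooling. A small sketch with Python's standard csv module (the rows are invented for illustration):

```python
import csv
import io

rows = [["stop_id", "stop_name"], ["1", "Central"]]

# Serialise the same rows as comma, semicolon, and tab separated values.
outputs = {}
for delim in [",", ";", "\t"]:
    buf = io.StringIO()
    csv.writer(buf, delimiter=delim).writerows(rows)
    outputs[delim] = buf.getvalue()

print(outputs[";"])
```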

Get Started

Download Data Curator for Windows or macOS.

Learn more about Data Curator and Frictionless Data.

Who made Data Curator?

Data Curator was made possible with funding and guidance from the Queensland Government.

The project was led by Stephen Gates from the ODI Australian Network. Software development was made possible by Gavin Kennedy and Matt Mulholland from the Queensland Cyber Infrastructure Foundation (QCIF).

Data Curator uses the Frictionless Data software libraries maintained by Open Knowledge International. Data Curator started life as Comma Chameleon, an experiment by the Open Data Institute.

Open Knowledge International announces Frictionless Data Tool Fund

- February 21, 2019 in Open Data, frictionless data, Data Package, Featured, Frictionless Data, mini-grant, Open Knowledge International

Open Knowledge International is launching the Frictionless Data Tool Fund, a mini-grant scheme offering US$5,000 to support individuals or organisations in developing an open source tool for reproducible research or science based on the Frictionless Data specifications and software. Applications are open until 30 April 2019.

The Tool Fund is part of Open Knowledge International’s Frictionless Data for Reproducible Research project. This project, funded by the Sloan Foundation, applies the frictionless data work to data-driven research disciplines in order to facilitate reproducible data workflows in research contexts. At its core, Frictionless Data is a set of specifications for data and metadata interoperability, accompanied by a collection of software libraries that implement those specifications and a range of best practices for data management. The core specification, the Data Package, is a simple and practical “container” for data and metadata.

With this announcement, we are looking for individuals or organisations of scientists, researchers, developers, or data wranglers to build on our existing open source tools and code to create new tools for reproducible research. The fund will accept submissions until the end of April 2019 for work to be completed by the end of the year. This builds on the success of the first Tool Fund in 2017, which funded the creation of libraries implementing the Frictionless Data specifications in several additional programming languages. For this year’s Tool Fund, we would like the community to work on projects that can make a difference for researchers and scientists. Applications can be submitted by filling in this form by 30 April 2019.

The Frictionless Data team will notify all applicants of whether or not they have been successful by the end of May. Successful applicants will then be invited to interviews before the final decision is made. Selection will be based on evidence of technical ability, and preference will be given to applicants who demonstrate an interest in the practical use of the Frictionless Data specifications. Preference will also be given to applicants who show an interest in working on and maintaining these tools going forward. For further questions about the fund, talk to Open Knowledge International directly on the forum or the Gitter chat, or send us an email.

Announcing the Frictionless Data Tool Fund

- February 18, 2019 in Frictionless Data

Warming up to csv,conf,v4

- February 1, 2019 in #CSVconf, Events, Frictionless Data

On May 8 and 9, 2019, the fourth version of csv,conf is set to take place at the Eliot Center in Portland, Oregon, United States. csv,conf is a community conference bringing together diverse groups to discuss data topics, and features stories about data sharing and data analysis from science, journalism, government, and open source. Over two days, attendees will have the opportunity to hear about ongoing work, share skills, exchange ideas (and stickers!) and kickstart collaborations. This year, our keynotes include Teon L. Brooks, a data scientist at Mozilla, and Kirstie Whitaker, a research fellow at the Alan Turing Institute, with more announcements to come soon. If you would like to share your work, submissions of session proposals for our 25-minute talk slots are open from now until end of day, February 9, 2019.

When csv,conf first launched in July 2014 as a conference for data makers everywhere, it adopted the comma-separated-values format in its branding metaphorically. However, as a data conference that brings together people from different disciplines and domains, conversations and anecdotes shared at csv,conf are not limited to the CSV file format.

We are keen on getting as many people as possible to csv,conf,v4, and the conference will award travel grants to subsidize travel and associated costs for interested parties who lack the resources and support to get them to Portland. To that end, we have set up our honor-system conference ticketing page on Eventbrite. We encourage you to get your conference tickets as soon as possible, keeping in mind that, as a non-profit and community-run conference, proceeds from ticket sales will help cover our catering and venue costs in addition to offering travel support for speakers and attendees where needed.

Additionally, Open Knowledge International will host a community event during the main csv,conf meeting where you can learn more about our Network and catch up with what the community has been doing. From the work on data literacy with School of Data, to the community involved in Open Data Day and initiatives on OpenGLAM, personal data and open education, we want to share with you the state of open knowledge in our Network. We will be announcing more details about our community event soon!

From the first three conferences held in the last four years, csv,conf has brought together over 500 participants from 30 countries. More than 300 talks spanning over 180 hours have been presented, packaged and shared on our YouTube channel. Many post-conference narratives and think pieces, as well as interdisciplinary collaborations, have also surfaced from previous conferences. This is only part of the story, and we can’t wait to see and hear from you in Portland in May, and are excited for all that awaits!

csv,conf,v4 is supported by the Sloan Foundation through OKI’s Frictionless Data for Reproducible Research grant, and the Frictionless Data team is part of the conference committee. We are happy to answer any questions you may have or offer clarifications if needed. Feel free to reach out to us.

The commallama at csv,conf,v3 will return this year!

Introducing our new Product Manager for Frictionless Data

- November 5, 2018 in Frictionless Data, Open Science

Earlier this year OKI announced new funding from The Alfred P. Sloan Foundation to explore “Frictionless Data for Reproducible Research”. Over the next three years we will be working closely with researchers to support the way they are using data with the Frictionless Data software and tools. The project is delighted to announce that Lilly Winfree has come on board as Product Manager to work with research communities on a series of focussed pilots in the research space and to help us develop focussed training and support for researchers. Data practices in scientific research are transforming as researchers face a reproducibility revolution: there is a growing push to make research data more open, leading to more transparent and reproducible science.

I’m really excited to join the team at OKI, whose mission of creating a world where knowledge creates power for the many, not the few, really resonates with me and my desire to make science more open. During my grad school years as a neuroscience researcher, I was often frustrated with “closed” practices (inaccessible data, poorly documented methods, paywalled articles) and I became an advocate for open science and open data. While investigating brain injury in fruit flies (yes, fruit fly brains are actually quite similar to human brains!), I taught myself coding to analyse and visualise my research data. After my PhD research, I worked on integrating open biological data with the Monarch Initiative, and delved into the open data licensing world with the Reusable Data Project. I am excited to take my passion for open data and join OKI to work on the Frictionless Data project, where I will get to go back to my scientific research roots and work with researchers to make their data more open, shareable, and reproducible.

Most people who use data know the frustrations of missing values, unknown variables, and confusing schemas (just to name a few). This “friction” in data can lead to massive amounts of time being spent on data cleaning, with little time left for analysis. The Frictionless Data for Reproducible Research project will build upon years of work at OKI focused on making data more structured, discoverable, and usable. At the core of Frictionless Data are the data preparation and validation stages, and the team has created specifications and tooling centered around these steps. For instance, the Data Package Creator packages tabular data with its machine-readable metadata, allowing users to understand the data structure, the meaning of values, how the data was created, and the license. Also, users can validate their data for structure and content with Goodtables, which reduces errors and increases data quality. By creating specifications and tooling and promoting best practices, we aim to make data more open and more easily shareable among people and between various tools.

For the next stage of the project, I will be working with organisations on pilots with researchers to reduce the friction in scientists’ data. I will be amassing a network of researchers interested in open data and open science, and giving trainings and workshops on using the Frictionless Data tools and specs. Importantly, I will work with researchers to integrate these tools and specs into their current workflows, to help shorten the time between experiment → data → analysis → insight. Ultimately, we are aiming to make science more open, efficient, and reproducible.

Are you a researcher interested in making your data more open? Do you work in a research-related organization and want to collaborate on a pilot? Are you an open source developer looking to build upon frictionless tools? We’d love to chat with you! We are eager to work with scientists from all disciplines. If you are interested, connect with the project team on the public Gitter channel, join our community chat, or email Lilly.

Lilly in the fruit fly lab


Frictionless Data and FAIR Research Principles

- August 14, 2018 in Data Package, Frictionless Data

In August 2018, Serah Rono will be running a Frictionless Data workshop in Copenhagen, convened by the Danish National Research Data Management Forum as part of the FAIR Across project. In October 2018, she will also run a Frictionless Data workshop at FORCE11 in Montreal, Canada. Ahead of the two workshops, and other events before the close of 2018, this blog post discusses how the Frictionless Data initiative aligns with FAIR research principles.

An integral part of evidence-based research is gathering and analysing data, which takes time and often requires skill and specialized tools to aid the process. Once the work is done, reproducibility requires that research reports be shared with the data and software from which insights are derived and conclusions are drawn, if that happens at all. Widely lauded as a key measure of research credibility, reproducibility also makes a bold demand for openness by default in research, which in turn fosters collaboration. FAIR (findability, accessibility, interoperability and reusability) research principles are central to the open access and open research movements.
“FAIR Guiding Principles precede implementation choices, and do not suggest any specific technology, standard, or implementation-solution; moreover, the Principles are not, themselves, a standard or a specification. They act as a guide to data publishers and stewards to assist them in evaluating whether their particular implementation choices are rendering their digital research artefacts Findable, Accessible, Interoperable, and Reusable.”

Wilkinson, M. D. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci. Data 3:160018, doi: 10.1038/sdata.2016.18 (2016)

Data Packages in Frictionless Data as an example of FAIRness

Our Frictionless Data project aims to make it effortless to transport high quality data among different tools & platforms for further analysis. The Data Package format is at the core of Frictionless Data, and it makes it possible to package data and attach contextual information to it before sharing it.

An example data package

Data packages are nothing without the descriptor file. This descriptor file is made available in a machine readable format, JSON, and holds metadata for your collection of resources, and a schema for your tabular data.


In Data Packages, pieces of information are called resources. Each resource is referred to by name and has a globally unique identifier, with the provision to reference remote resources by URLs. Resource names and identifiers are held alongside other metadata in the descriptor file.


Since metadata is held in the descriptor file, it can be accessed separately from the associated data. Where resources are available online – in an archive or data platform – sharing only the descriptor file is sufficient, and data provenance is guaranteed for all associated resources.


The descriptor file is saved as a JSON file, a machine-readable format that can be processed with great ease by many different tools during data analysis. The descriptor file uses accessible and shared language, and has provision to add descriptions, and information on sources and contributors for each resource, which makes it possible to link to other existing metadata and guarantee data provenance. It is also very extensible, and can be expanded to accommodate additional information as needed.
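To illustrate that separation, the metadata can be read and inspected entirely on its own, with no access to the data files. The descriptor fragment below is a made-up example, not the real Country Codes descriptor:

```python
import json

# An illustrative descriptor fragment: sources, license, and one remote
# resource, all readable without downloading the data itself.
descriptor_json = """{
  "name": "country-codes",
  "sources": [{"title": "ISO 3166"}],
  "resources": [
    {"name": "codes", "path": "https://example.org/codes.csv"}
  ]
}"""

meta = json.loads(descriptor_json)
names = [r["name"] for r in meta["resources"]]
print(meta["name"], names)
```

Because every resource carries a name and a path, tools can list, cite, or fetch individual resources directly from this metadata.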


Part of the metadata held in a data package includes licensing and author information, and there is a requirement to link back to original sources, thus ensuring data provenance. This serves as a great guide for users interested in your resources. Where licensing allows resources to be archived on different platforms, this means that regardless of where users access the data from, they will be able to trace it back to its original sources as needed. For example, all countries of the world have unique codes attached to them. See how the Country Codes data package is represented on two different platforms: GitHub and DataHub.

With thanks to the Sloan Foundation for the new Frictionless Data for Reproducible Research grant, we will be running deep-dive workshops to expound on these concepts and identify areas for improvement and collaboration in open access and open research. We have exciting opportunities in store, which we will announce in our community channels over time.

Bonus readings

Here are some of the ways researchers have adopted Frictionless Data software in different domains over the last two years:
  • The Cell Migration and Standardisation Organisation (CMSO) uses Frictionless Data specs to package cell migration data and load it into Pandas for data analysis and creation of visualizations. Read more.
  • We collaborated with Data Management for TEDDINET project (DM4T) on a proof-of-concept pilot in which we used Frictionless Data software to address some of the data management challenges faced by DM4T. Read more.
  • Open Power System Data uses Frictionless Data specifications to make energy data available for analysis and modeling. Read more.
  • We collaborated with Pacific Northwest National Laboratory – Active Data Biology and explored use of Frictionless Data software to generate schema for tabular data and check validity of metadata stored as part of a biological application on GitHub. Read more.
  • We collaborated with the UK Data service and used Frictionless Data software to assess and report on data quality, and made a case for generating visualisations with ensuing data and metadata. Read more.
Our team is also scheduled to run Frictionless Data workshops in the coming months:
  • In Copenhagen, convened by the Danish National Research Data Management Forum as part of the FAIR Across project, in August 2018.
  • In Montreal, Canada, at FORCE11 between October 10 and 12, 2018. See the full program here and sign up here to attend the Frictionless Data workshop.

Sloan Foundation Funds Frictionless Data for Reproducible Research

- July 12, 2018 in data infrastructures, Featured, Frictionless Data

We are excited to announce that Open Knowledge International has received a grant of $750,000 from The Alfred P. Sloan Foundation for our project “Frictionless Data for Reproducible Research”. The new funding from Sloan enables us to continue work over the next 3 years via enhanced dissemination and training activities, as well as further iteration on the software and specifications via a range of deep pilot projects with research partners.  
In Frictionless Data, we focus specifically on reducing friction around the discoverability, structure, standardization and tooling of data: more generally, the technicalities around preparing, validating and sharing data in ways that both enhance existing workflows and enable new ones, towards the express goal of minimizing the gap between data and insight. We do this by creating specifications and software that are primarily informed by reuse (of existing formats and standards), conceptual minimalism, and platform-agnostic interoperability.

Over the last two years, with support from Sloan and others, we have validated the utility and usefulness of the Frictionless Data approach for the research community, and found strong commonalities between our experiences of data work in the civic tech arena and the friction encountered in data-driven research. The pilots and case studies we conducted over this period have enabled us to improve our specifications and software, and to engage with a wider network of actors interested in data-driven research from fields as diverse as earth science, computational biology, archeology, and the digital humanities.

Building on work going on for nearly a decade, last September we launched v1 of the Frictionless Data specifications, and we have produced core software that implements those specifications across 7 programming languages. With the new grant we will iterate on this work, as well as run additional Tool Fund activities to facilitate deeper integration of the Frictionless Data approach in a range of tools and workflows that enable reproducible research.

A core point of friction in working with data is discoverability. Having a curated collection of well-maintained datasets that are of high value to a given domain of inquiry is an important move towards increasing the quality of data-driven research. With this in mind, we will also be organising efforts to curate datasets that are of high value in the domains we work in. This high-value data will serve as a reference for how to package data with the Frictionless Data specifications, and provide suitable material for producing domain-specific training materials and guides.

Finally, we will be focussing on researchers themselves and are planning a programme to recruit and train early career researchers to become trainers and evangelists of the tools in their field(s). This programme will draw lessons from years of experience running data literacy fellowships with School of Data and Panton Fellowships for Open Science. We hope to meet researchers where they are and work with them to demonstrate the effectiveness of our approach and how our tools can bring real value to their work.

Are you a researcher looking for better tooling to manage your data? Do you work at or represent an organization working on issues related to research and would like to work with us on complementary issues for which data packages are suited? Are you a developer with an idea for something we can build together? Are you a student looking to learn more about data wrangling, managing research data, or open data in general? We’d love to hear from you. If you have any other questions or comments about this initiative, please visit this topic in our forum, use the hashtag #frictionlessdata, or speak to the project team on the public Gitter channel.

The Alfred P. Sloan Foundation is a philanthropic, not-for-profit grant-making institution based in New York City. Established in 1934 by Alfred Pritchard Sloan Jr., then-President and Chief Executive Officer of the General Motors Corporation, the Foundation makes grants in support of original research and education in science, technology, engineering, mathematics and economic performance.

Improving your data publishing workflow with the Frictionless Data Field Guide

- March 27, 2018 in data infrastructures, Data Quality, Frictionless Data

The Frictionless Data Field Guide provides step-by-step instructions for improving data publishing workflows. The field guide introduces new ways of working, informed by the Frictionless Data suite of software, that data publishers can use independently or adapt into existing personal and organisational workflows.

Data quality and automation of data processing are essential in creating useful and effective data publication workflows. Speed of publication and lowering the costs of publication are two areas that are directly enhanced by having better tooling and workflows to address quality and automation. At Open Knowledge International, we think that it is important for everybody involved in the publication of data to have access to tools that help automate and improve the quality of data, so this field guide details open data publication approaches with a focus on user-facing tools for anyone interested in publishing data. All of the Frictionless Data tools included in this field guide are built with open data publication workflows in mind, with a focus on tabular data, and there is a high degree of flexibility for extended use cases handling different types of open data. The software featured in this field guide is all open source, maintained by Open Knowledge International under the Frictionless Data umbrella, and designed to be modular.

The preparation and delivery of the Frictionless Data Field Guide have been made possible by the Open Data Institute, who received funding from Innovate UK to build “data infrastructure, improve data literacy, stimulate data innovation and build trust in the use of data” under the pubtools programme. Feel free to engage the Frictionless Data team and community on Gitter.

The Frictionless Data project is a set of simple specifications to address common data description and data transport issues. The overall aim is to reduce friction in working with data, and to do this by making it as easy as possible to transport data between different tools and platforms for further analysis. At the heart of Frictionless Data is the Data Package, a simple format for packaging data collections together with a schema and descriptive metadata. For over ten years, the Frictionless Data community has iterated extensively on tools and libraries that address various causes of friction in working with data, and this work culminated in the release of the v1 specifications in September 2017.

Open Belgium 2018: “Open Communities – Smart Society”

- February 14, 2018 in Frictionless Data, OK Belgium, Open Belgium

The next edition of Open Belgium, a community-driven conference organised by Open Knowledge Belgium, is almost here! In less than 4 weeks, 300 industry, research, government and citizen stakeholders will gather and discuss current trends around Open Knowledge and Open Data in Belgium. Open Belgium is the ideal place to get an update on local, national and global open initiatives, as well as to share skills, expertise and ideas with like-minded data enthusiasts. It is an event where IT experts, local authorities, Open Data hackers, researchers and private companies have the chance to catch up on what is new in the field of Open Knowledge in Belgium and beyond. It’s a day where data publishers sit next to users, citizen developers and communities to network and to openly discuss the next steps in Open Knowledge and Open Data. To make sure that you will get the best out of a full day of talks, workshops, panels, discussions and, not to forget, networking opportunities, we will post daily blog posts about all that is going to happen on the 12th of March. Check out the full programme here.

From Open Knowledge International, Serah Rono (Developer Advocate) and Vitor Baptista (Engineering Lead) will host the hackathon session “Using Frictionless Data software to turn data into insight”. OKI’s Frictionless Data initiative is about making it effortless to transport quality data among different tools and platforms for further analysis. In this session, they will introduce the Open Belgium community to software that streamlines the data workflow process, and make a case for data quality. You will learn how to add metadata and create schemas for your data, validate datasets, and be part of a vibrant open source, open data community.

Do you want to be part of the open community? Attend talks from excellent speakers? Meet other open experts and interested peers? Find inspiration for your projects? Or just keep the discussion going on #OpenBelgium? Be sure to join us on the 12th of March in Louvain-la-Neuve: there are still tickets left here.

Validation for Open Data Portals: a Frictionless Data Case Study

- December 18, 2017 in case study, ckan, Data Quality, Frictionless Data, goodtables

The Frictionless Data project is about making it effortless to transport high quality data among different tools and platforms for further analysis. We are doing this by developing a set of software, specifications, and best practices for publishing data. The heart of Frictionless Data is the Data Package specification, a containerization format for any kind of data based on existing practices for publishing open-source software. Through its pilots, Frictionless Data is working directly with organisations to solve real problems managing data. The University of Pittsburgh’s Center for Urban and Social Research is one such organisation.

One of the main goals of the Frictionless Data project is to help improve data quality by providing easy-to-integrate libraries and services for data validation. We have integrated data validation seamlessly with different backends like GitHub and Amazon S3 via the online service, but we also wanted to explore closer integrations with other platforms. An obvious choice for that are Open Data portals. They are still one of the main forms of dissemination of Open Data, especially for governments and other organizations. They provide a single entry point to data relating to a particular region or thematic area, and give users tools to discover and access different datasets. On the backend, publishers also have tools available for the validation and publication of datasets.

Data quality varies widely across different portals, reflecting the publication processes and requirements of the hosting organizations. In general, it is difficult for users to assess the quality of the data, and there is a lack of descriptors for the actual data fields. At the publisher level, while strong emphasis has been put on metadata standards and interoperability, publishers don’t generally have the same help or guidance when dealing with data quality or description. We believe that data quality in Open Data portals can have a central place on both these fronts, user-centric and publisher-centric, and we started this pilot to showcase a possible implementation.

To field-test our implementation we chose the Western Pennsylvania Regional Data Center (WPRDC), managed by the University of Pittsburgh Center for Urban and Social Research. WPRDC is a great example of a well-managed Open Data portal, where datasets are actively maintained and the portal itself is just one component of a wider Open Data strategy. It also provides a good variety of publishers, including public sector agencies, academic institutions, and nonprofit organizations. The portal software that we are using for this pilot is CKAN, the world-leading open source software for Open Data portals (source). Open Knowledge International initially fostered the CKAN project and is now a member of the CKAN Association.

We created ckanext-validation, a CKAN extension that provides a low-level API and readily available features for data validation and reporting that can be added to any CKAN instance. This is powered by goodtables, a library developed by Open Knowledge International to support the validation of tabular datasets. The ckanext-validation extension allows users to perform data validation against any tabular resource, such as CSV or Excel files. This generates a report that is stored against a particular resource, describing issues found with the data, both at the structural level, such as missing headers and blank rows, and at the data schema level, such as wrong data types and out-of-range values.

In the coming days, read the technical details about this pilot study, our learnings, and the areas we have identified for further work on the Frictionless Data website.
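To make the two levels of checking concrete, here is a toy validator in the same spirit. goodtables itself is far more thorough; the schema and table below are invented purely for illustration:

```python
def validate(rows, schema):
    """Toy two-level validator: structural checks (blank/ragged rows)
    and schema checks (wrong types, out-of-range values)."""
    errors = []
    header = rows[0]
    for i, row in enumerate(rows[1:], start=2):  # 1-based rows, after header
        # Structural level: blank rows and rows with the wrong width.
        if not any(cell.strip() for cell in row):
            errors.append((i, "blank-row"))
            continue
        if len(row) != len(header):
            errors.append((i, "ragged-row"))
            continue
        # Schema level: type and constraint checks per field.
        for cell, field in zip(row, schema):
            if field["type"] == "integer":
                try:
                    value = int(cell)
                except ValueError:
                    errors.append((i, "type-error"))
                    continue
                minimum = field.get("minimum")
                if minimum is not None and value < minimum:
                    errors.append((i, "out-of-range"))
    return errors

schema = [{"name": "id", "type": "integer", "minimum": 1},
          {"name": "name", "type": "string"}]
table = [["id", "name"],
         ["1", "Alice"],
         ["", ""],          # blank row
         ["0", "Bob"],      # below the minimum
         ["x", "Carol"]]    # not an integer
print(validate(table, schema))
```

The report-style output, a list of (row, error-code) pairs, mirrors the shape of the validation reports the extension stores against each resource.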