
Thank you for joining the Frictionless Data Hackathon

- October 21, 2021 in Frictionless Data, Open Knowledge

Last week over 20 people from around the world joined the Frictionless Data team for the world’s first Frictionless Data Hackathon. Find out what happened, and make sure you sign up for the next one.
Watch the video here.

What’s this about? The team at Open Knowledge Foundation have lots of experience running and attending hackathons. We know how powerful they can be for creating new, functioning software and useful innovations in a short space of time. This is why the team at Frictionless Data were so excited to launch the first Frictionless Data Hackathon on 7–8 October 2021. Over 20 people from around the world signed up for the event. During two full days, the participants worked on four projects, all with very different outcomes. For example:
  • Covid Tracker aimed to test Livemark – the latest Frictionless tool – with real live data, providing an example of all its functionalities. Check out the project GitHub repository to learn more.
  • Frictionless Tutorial created new tutorials using the Python Frictionless Framework (see the tutorial here).
  • Frictionless Community Insight focused on building a new Livemark website to tell the story of the Frictionless community – who we are, where we are from, what we do and what we care about (see the draft website here).
  • DPCKAN, proposed by a team working on the data portal of the state of Minas Gerais in Brazil, set out to develop a tool for publishing and updating datasets described with Frictionless standards in a CKAN instance. Check out the GitHub repository here.
The prize for the best project, voted by the participants, went to the DPCKAN team. Well done André, Andrés, Carolina, Daniel, Francisco and Gabriel!
    “I feel pretty happy after this Frictionless hackathon experience. We’ve grown more in two days than would have been possible in a month. The knowledge and experience exchange was remarkable,” said the winning team.
Find out more
You can learn more about the Frictionless Data Hackathon here and watch the project presentations here. Learn more about Frictionless Data on our website. Ask us a question, or join the Frictionless Data community here.

Open Science in action! Making it easier for science researchers to share their data

- August 16, 2021 in Frictionless Data

Frictionless Data and Dryad join forces to make it easier for scientists to upload their research data to the Dryad repository.

What’s this about? What happens to scientific data after it is created? Is it shared with other researchers? Or is it hidden away on a private hard drive? This question is at the heart of the Open Science movement – which aims to make research more accessible and usable by everyone, so that advances in scientific understanding happen faster.
A great way to share research data is to upload it to a repository, but simply uploading data is not enough. Sometimes uploaded data is not high quality: it may contain errors, or have bits missing. Another problem is metadata – is there enough descriptive information that other researchers can also use the data? By collaborating with Dryad, the Frictionless team aimed to fix these problems – making it easier for science researchers to share their data, and driving scientific innovation faster! Find out more about this collaboration by visiting the Frictionless blog or contact the Frictionless team here.

What is Dryad? Dryad is a community-led data repository that accepts research data from any field of science. The data is curated to ensure its quality and to make sure it has comprehensive metadata that allows reuse by other researchers. Visit the Dryad website here.

What did we achieve? The outcome of this collaboration is a revamped upload page for the Dryad application.
Researchers uploading tabular data (CSV, XLS, XLSX) under 25MB will have their files automatically validated using the Frictionless tool. These checks are based on the built-in validation of the Frictionless Framework (read the validation guide here), and include checking for data errors such as blank cells, missing headers, or incorrectly formatted data. The Frictionless report will help guide researchers on which issues should be resolved, allowing them to edit and re-upload files before submitting their dataset for curation and publication. If there’s a problem, yo, we’ll solve it!
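To make these checks concrete, here is a toy, standard-library-only sketch of blank-cell and blank-header detection over a CSV string. This is only an illustration of the idea, not the Frictionless Framework’s actual implementation, which covers many more error types (type mismatches, encoding problems, and so on):

```python
import csv
import io

def check_table(csv_text):
    """Toy validator: flags blank header names, ragged rows, and blank cells.
    Illustrative only -- the real Frictionless Framework does far more."""
    errors = []
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ["empty file"]
    header = rows[0]
    # A header cell with no name is a "blank header" error
    for i, name in enumerate(header, start=1):
        if not name.strip():
            errors.append(f"blank header in column {i}")
    # Data rows start at physical row 2
    for r, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            errors.append(f"row {r}: expected {len(header)} cells, got {len(row)}")
        for c, cell in enumerate(row, start=1):
            if not cell.strip():
                errors.append(f"row {r}, column {c}: blank cell")
    return errors
```

A clean file yields an empty error list; a file with a blank cell yields a targeted message the submitter can act on, which is the essence of the report described above.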
    Do you want to better apply open scientific methods to your research programme? Are you interested in learning how Frictionless Data can make it easier for your team to share data with people around the world?
Get in touch with the Frictionless Data team here. Also, check out other organisations that have incorporated Frictionless Data into their systems to improve their data productivity. This work was funded by the Sloan Foundation as part of the Frictionless Data for Reproducible Research project.

Unveiling the new Frictionless Data documentation portal

- April 14, 2021 in Frictionless Data

Have you used Frictionless Data documentation in the past and been confused or wanted more examples? Are you a brand new Frictionless Data user looking to get started? We invite you all to visit our new and improved documentation portal.

Thanks to a fund that the Open Knowledge Foundation was awarded by the Open Data Institute, we have completely reworked the guides of our Frictionless Data Framework website according to the suggestions from a cohort of users gathered during several feedback sessions throughout February and March.

We cannot stress enough how precious those feedback sessions have been to us. They were an excellent opportunity to connect with our users and reflect together on how to make all our guides more useful for current and future users. The enthusiasm and engagement that the community showed for the process was great to see, and reminded us that the link with the community should be at the core of open source projects. We were amazed by the amount of extremely useful input that we got. While we are still digesting some of the suggestions and working out how best to implement them, we have made many changes to make the documentation a smoother, Frictionless experience.

So what’s new?

A common theme from the feedback sessions was that it was sometimes difficult for novice users to understand the whole potential of the Frictionless specifications. To help make this clearer, we added a more detailed explanation, user examples and user stories to our Introduction. We also added some extra installation tips and a troubleshooting section to our Quick Start guide. The users also suggested several code changes, like more realistic code examples, better explanations of functions, and the ability to run code examples in both the command line and Python. This last suggestion was prompted because most of the guides use a mix of command-line and Python syntax, which was confusing to our users. We have clarified that by adding a switch to the code snippets that allows users to work with pure Python syntax or pure command-line syntax (when possible), as you can see here. We also put together an FAQ section based on questions that were often asked on our Discord chat. If you have suggestions for other common questions to add, let us know!

The documentation revamping process also included the publication of new tutorials. We worked on two new Frictionless tutorials, which are published under the Notebooks link in the navigation menu. While working on those, we got inspired by the feedback sessions and realised that it made sense to give our community the possibility to contribute to the project with some real life examples of Frictionless Data use. The user selection process has started and we hope to get the new tutorials online by the end of the month, so stay tuned!

What’s next?

Our commitment to continually improving our documentation does not end with this project! Do you have suggestions for changes you would like to see in our documentation? Please reach out to us or open a pull request. Everyone is welcome to contribute! Learn how to do it here.

Thanks, thanks, thanks!

Once again, we are very grateful to the Open Data Institute for giving us the chance to focus on this documentation in order to improve it. We cannot thank enough all our users who took part in the feedback sessions. Your contributions were precious.

More about Frictionless Data

Frictionless Data is a set of specifications for data and metadata interoperability, accompanied by a collection of software libraries that implement these specifications, and a range of best practices for data management. The project is funded by the Sloan Foundation.

How we are improving the quality and interoperability of Frictionless Data

- February 25, 2021 in Frictionless Data

As we announced in January, the Open Knowledge Foundation has been awarded funds by the Open Data Institute to improve the quality and interoperability of Frictionless Data. We are halfway through the process of reviewing our documentation and adding new features to Frictionless Data, and wanted to give a status update showing how this work is improving the overall Frictionless experience. We have already run four feedback sessions and have been delighted to meet 16 users from very diverse backgrounds and with different levels of expertise in using Frictionless Data, some of whom we knew and some not. In spite of the variety of users, it was very interesting to see a widespread consensus on the way the documentation can be improved (have a look here). We are very grateful to all the Frictionless Data users who took part in our sessions – they helped us see all of our guides with fresh eyes. It was very important for us to do this review together with the Frictionless Data community because they (together with those to come) are the ones who will benefit from it, so they are best placed to flag issues and propose changes. Every comment is being carefully reviewed at the moment and the new documentation will soon be released. What are the next steps?
  • We are going to have 8 to 12 more users giving us feedback in the coming month. 
  • We are also adding a FAQ section based on the questions we got from our users in the past.
If you have any feedback and/or improvement suggestions, please let us know on our Discord channel or on Twitter.

More about Frictionless Data

Frictionless Data is a set of specifications for data and metadata interoperability, accompanied by a collection of software libraries that implement these specifications, and a range of best practices for data management. The project is funded by the Sloan Foundation.

Partnering with ODI to improve Frictionless Data

- January 12, 2021 in Frictionless Data, funding, News, Open Data

As part of the Open Data Institute’s fund to develop open source tools for data institutions, the Open Knowledge Foundation (OKF) has been awarded funds to improve the quality and interoperability of Frictionless Data. In light of our effort to make data open and accessible, we are thrilled to announce that we will be partnering with the Open Data Institute (ODI) to improve our existing documentation and add new features to Frictionless Data, creating a better user experience for all. To achieve this, we will be working with a cohort of users from our active and engaged community to create better documentation that fits their needs. Our main goal is to make it easier for current and future users to understand and use the Frictionless Data tools and data libraries to their fullest potential. We know how frustrating it can be to try to use existing code (or learn new code) that has incomplete documentation, and we don’t want that to be a barrier for our users anymore. This is why we are very grateful to the ODI for granting us the opportunity to improve upon our existing documentation.

So, what will be changing?

  • We will have a new project overview section, to help our users understand how to use Frictionless Data for their specific needs.
  • We will improve the existing documentation, to make sure even brand new users can quickly understand everything.
  • We will add tutorials, to showcase real user experiences and provide user-friendly examples.
  • We will add an FAQ section.

And when will all of that be ready?

Very soon! By the beginning of April everything will be online, so stay tuned (and frictionless)!

Call for user feedback

Feedback from our community is crucial to us, and part of this grant will be used to fund an evaluation of the existing documentation by our users, in the form of user feedback sessions. Are you using our Frictionless Data tools or our Python data library? Then we want to hear from you! We are currently looking for novice and intermediate users to help us review our documentation, in order to make it more useful for you and all our future users. For every user session you take part in, you will be given £50 for your time and feedback. Are you interested? Then fill in this form.

More about Frictionless Data

Frictionless Data is a set of specifications for data and metadata interoperability, accompanied by a collection of software libraries that implement these specifications, and a range of best practices for data management. The project is funded by the Sloan Foundation.

Dryad and Frictionless Data collaboration

- November 18, 2020 in Frictionless Data

By Tracy Teal; originally posted on the Dryad blog: Guided by our commitment to make research data publishing more seamless and also more re-usable, we are thrilled to partner with the Open Knowledge Foundation and the Frictionless Data team to enhance our submission processes. By integrating the Frictionless Data toolkit, Dryad will be able to directly provide feedback to authors on the structure of uploaded tabular files. This will also allow automated file-level metadata to be created at upload and made available for download for published datasets. We are excited to get moving on this project and, with support from the Sloan Foundation, Open Knowledge Foundation has just announced a job opening to contribute to this work. Please check out the posting and circulate it to any developers who may be interested in building out this functionality with us.

Announcing the New Frictionless Framework

- October 8, 2020 in Frictionless Data

By Evgeny Karev & Lilly Winfree

Frictionless Framework

We are excited to announce our new high-level Python framework, frictionless-py. Frictionless-py was created to simplify the overall user experience of working with Frictionless Data in Python. It provides several high-level improvements in addition to many low-level fixes. Read more details below, or watch this intro video by Frictionless developer Evgeny.

Why did we write new Python code?

Frictionless Data has been in development for almost a decade, with global users and projects spanning domains from science to government to finance. However, our main Python libraries (datapackage-py, goodtables-py, tableschema-py, tabulator-py) were originally built with some inconsistencies that have confused users over the years. We had started redoing our documentation for our existing code, and realized we had a larger issue on our hands – mainly that the disparate Python libraries had overlapping functionalities and we were not able to clearly articulate how they all fit together to form a bigger picture. We realized that overall, the existing user experience was not where we wanted it to be. Evgeny, the Frictionless Data technical lead developer, had been thinking about ways to improve the Python code for a while, and the outcome of that work is frictionless-py.

What happens to the old Python code (datapackage-py, goodtables-py, tableschema-py, tabulator-py)? How does this affect current users?

Datapackage-py (see details), tableschema-py (see details) and tabulator-py (see details) still exist, will not be altered, and will be maintained. If your project is using this code, these changes are not breaking and there is no action you need to take at this point. However, we will be focusing new development on frictionless-py, and encourage you to consider starting to experiment with frictionless-py during the last months of 2020 and to migrate to it starting from 2021 (here is our migration guide). The one important thing to note is that goodtables-py has been subsumed by frictionless-py (since version 3 of Goodtables). We will continue to bug-fix goodtables@2.x in this branch, and it is also still available on PyPI as before. Please note that the frictionless@3.x API is not yet stable, as we are continuing to work on it. We will release frictionless@4.x by the end of 2020 as the first SemVer/stable version.

What does frictionless-py do?

Frictionless-py has four main functions for working with data: describe, extract, validate, and transform. These are inspired by typical data analysis and data management methods.

Describe your data: You can infer, edit and save metadata for your data tables. This is a first step towards ensuring data quality and usability. Frictionless metadata includes general information about your data, like a textual description, as well as field types and other tabular data details.

Extract your data: You can read your data using a unified tabular interface. Data quality and consistency are guaranteed by a schema. Frictionless supports various file protocols like HTTP, FTP, and S3, and data formats like CSV, XLS, JSON, SQL, and others.

Validate your data: You can validate data tables, resources, and datasets. Frictionless generates a unified validation report and supports many options to customize the validation process.

Transform your data: You can clean, reshape, and transfer your data tables and datasets. Frictionless provides a pipeline capability and a lower-level interface to work with the data.

Additional features:
  • Powerful Python framework
  • Convenient command-line interface
  • Low memory consumption for data of any size
  • Reasonable performance on big data
  • Support for compressed files
  • Custom checks and formats
  • Fully pluggable architecture
  • The included API server
  • More than 1000 tests
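To give a feel for the describe step above: it inspects sample values from a table and infers a schema. The following is a deliberately simplified, standard-library-only sketch of that kind of type inference – not frictionless-py’s actual code, which handles many more types, formats and edge cases:

```python
import csv
import io

def infer_schema(csv_text, sample_size=100):
    """Toy version of 'describe': guess each column's type by checking
    whether every sampled value parses as an integer or a number,
    falling back to string. Illustrative only."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    columns = {name: [] for name in header}
    for i, row in enumerate(reader):
        if i >= sample_size:  # only look at a sample of rows
            break
        for name, value in zip(header, row):
            columns[name].append(value)

    def guess(values):
        def all_parse(cast):
            try:
                for v in values:
                    cast(v)
                return True
            except ValueError:
                return False
        if all_parse(int):
            return "integer"
        if all_parse(float):
            return "number"
        return "string"

    return {"fields": [{"name": n, "type": guess(vs)} for n, vs in columns.items()]}
```

The inferred schema can then be edited and saved, and later used by the extract and validate steps to guarantee consistency.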

How can users get started?

We recommend that you begin by reading the Getting Started Guide and the Introduction Guide. We also have in-depth documentation for Describing Data, Extracting Data, Validating Data, and Transforming Data.

How can you give us feedback?

What do you think? Let us know your thoughts, suggestions, or issues by joining us in our community chat on Discord or by opening an issue in the frictionless-py repo.


Where’s the documentation?

Are you a new user? Start with the Getting Started and Introduction guides. Are you an existing user? Start with the Migration Guide. The full list of documentation can be found here.

What’s the difference between datapackage and frictionless?

In general, frictionless is our new generation software, while tabulator/tableschema/datapackage/goodtables is our previous generation software. Frictionless has a lot of improvements over them. Please see this issue for the full answer and a code example.

I’ve spotted a bug – where do I report it?

Let us know by opening an issue in the frictionless-py repo. For tabulator/tableschema/datapackage issues, please use the corresponding issue tracker and we will triage it for you. Thanks!

I have a question – where do I get help?

You can ask us questions in our Discord chat, and someone from the main developer team or from the community will help you; an invitation link is here. We also have a Twitter account (@frictionlessd8a) and community calls where you can come meet the team and ask questions.

I want to help – how do I contribute?

Amazing, thank you! We always welcome community contributions. Start with the contributing guides here and here, and you can also reach out to Evgeny (@roll) or Lilly (@lwinfree) on GitHub if you need help.


An update from the 2020 Frictionless Tool Fund grantees

- September 30, 2020 in Frictionless Data

We are excited to share project updates from our 2020 Frictionless Data Tool Fund! Our five grantees are about halfway through their projects and have written the updates below to share with the community. These grants were awarded to projects using Frictionless Data to improve reproducible data workflows in various research contexts. Read on to find out what they have been working on and ways that you can contribute!

Carles Pina Estany: Schema Collaboration

The goal of the schema-collaboration Tool Fund project is to create an online platform that enables data managers and researchers to collaborate on describing their data by writing Frictionless Data Package schemas. The basics can be seen and tested on the online instance of the platform: the data manager can create a package, assign data packages to researchers, add comments, and send a link to the researchers, who will use datapackage-ui to edit the package and save it, making it available to the data manager. The next steps are to add extra fields to datapackage-ui and to work on the integration between schema-collaboration and datapackage-ui to make maintenance easier. Carles also plans to output the data package as a PDF to help data managers and researchers spot errors. Progress can be followed through the project wiki, and feedback is welcome through GitHub issues. Read more about Carles’ project here.

Simon Tyrrell: Frictionless Data for Wheat

As part of the Designing Future Wheat project, Simon and team have repositories containing a wide variety of heterogeneous data, and they are trying to standardise how these datasets and their associated metadata are exposed. The first of their portals stores its data in an iRODS repository. They have recently completed the additions to their web module, eirods-dav, which uses the files, folders and metadata stored within this repository to automatically generate Data Packages for the datasets. The next step is to expand the data that is added to the Data Packages and, similarly, to automatically expose tabular data as Tabular Data Packages. Any feedback or queries are very welcome via the eirods-dav GitHub repository. Read more about Simon’s project here.

Stephen Eglen: Analysis of spontaneous activity patterns in developing neural circuits using Frictionless Data tools

Stephen and Alexander have been busy over the summer integrating the Frictionless tools into a workflow for analysing electrophysiological datasets. They have written converters to read in their ASCII- and HDF5-based data and convert them to Frictionless containers. Along the way, they have given helpful feedback to the team about the core packages. They have settled on the Python interface as the most feature-rich implementation to work with. Alexander has now completed his analysis of the data, and they are currently working on a manuscript to highlight their research findings. Read more about Stephen’s project here.

Asura Enkhbayar: Metrics in Context

How much do we know about the measurement tools used to create scholarly metrics? While data models and standards are neither new nor uncommon in the scholarly space, “Metrics in Context” is all about the very apparatuses we use to capture the scholarly activity embedded in those metrics. In order to confidently use citations and altmetrics in research assessment or in hiring and promotion decisions, we need to be able to provide standardized descriptions of the digital infrastructure and acts of capturing involved. Asura is currently refining the conceptual model for scholarly events in the digital space, in order to account for various types of activities (both traditional and alternative scholarly metrics). After a review of the existing digital landscape of scholarly infrastructure projects, he will dive into the implementation using Frictionless. You can find more details on the open roadmap on GitHub – feel free to submit questions and comments as issues! Read more about Asura’s project here.

Nikhil Vats: Adding Data Package Specifications to InterMine’s im-tables

Nikhil is working with InterMine to add Data Package specifications to im-tables (a library for querying biological data) so that users can export metadata along with query results. Right now, the metadata contains field names, their description links, types, paths, class description links and primary key(s). Nikhil is currently figuring out ways to get links for data sources, attribute descriptions and class descriptions from their FAIR terms (or description links). Next steps for the project include building the frontend for this feature in im-tables and adding the rest of the required information about the data, like the result file format (CSV, TSV, etc.), to the datapackage.json (metadata) file. You can contribute to this project by opening an issue here or by reaching out to the team. Read more about Nikhil’s project here.
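For readers unfamiliar with the format: a Data Package’s metadata lives in a datapackage.json file alongside the data, describing each resource and its schema. A minimal, hypothetical example is sketched below (the field names are invented for illustration, not InterMine’s actual output):

```json
{
  "name": "example-query-results",
  "resources": [
    {
      "name": "results",
      "path": "results.csv",
      "format": "csv",
      "schema": {
        "fields": [
          {"name": "gene_id", "type": "string", "description": "primary identifier"},
          {"name": "length", "type": "integer"},
          {"name": "organism", "type": "string"}
        ],
        "primaryKey": "gene_id"
      }
    }
  ]
}
```

Extra information such as description links and file formats, as described above, would be added as further properties on the resource and field objects.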

Goodtables: Expediting the data submission and submitter feedback process

- September 16, 2020 in Frictionless Data

by Adam Shepherd, Amber York, Danie Kinkade, and Lilly Winfree. This post, originally published on the BCO-DMO blog, describes the second part of our Frictionless Data Pilot collaboration.

Earlier this year, the Biological and Chemical Oceanography Data Management Office (BCO-DMO) completed a pilot project with the Open Knowledge Foundation (OKF) to streamline the data curation processes for oceanographic datasets using Frictionless Data Pipelines (FDP). The goal of this pilot was to construct reproducible workflows that transform the original data submitted to the office into archive-quality, FAIR-compliant versions. FDP lets a user define an ordered series of processing steps to perform on some data, and the project developed new processing steps specific to the needs of these oceanographic datasets. These ordered processing steps are saved into a configuration file that can then be used any time the archived version of the dataset must be reproduced.

The primary value of these configuration files is that they capture, and make transparent, the curation process at BCO-DMO. Subsequently, we found additional value internally by using FDP in three other areas. First, the pipelines made the curation process across our data managers much more consistent compared with the ad-hoc data processing scripts they individually produced before FDP. Second, we found that data managers saved time because they could reuse pre-existing pipelines to process newer versions submitted for pre-existing datasets. Finally, the configuration files helped us keep track of which processes were used, in case a bug or error was ever found in the processing code. This project exceeded our goal of using FDP on at least 80% of data submissions to BCO-DMO – we now use it almost 100% of the time.
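To illustrate what such a pipeline configuration can look like: datapackage-pipelines (the tooling behind FDP) reads an ordered list of processing steps from a pipeline-spec.yaml file. The sketch below uses standard-library processors with invented file and field names; BCO-DMO’s dataset-specific custom steps are not shown:

```yaml
# pipeline-spec.yaml -- illustrative only; paths and field names are
# invented, and BCO-DMO's custom oceanographic processors are omitted.
oceanographic-dataset-v1:
  pipeline:
    - run: load                 # read the originally submitted file
      parameters:
        from: submissions/original.csv
        name: ctd_profiles
    - run: set_types            # enforce column types during processing
      parameters:
        types:
          depth_m:
            type: number
    - run: dump_to_path         # write the archive-quality data package
      parameters:
        out-path: archive/ctd_profiles
```

Because the steps are declarative and ordered, re-running the file reproduces the archived dataset exactly, which is what makes the curation process transparent and repeatable.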
As a major deliverable from BCO-DMO’s recent NSF award, the office planned to refactor its entire data infrastructure using techniques that would allow BCO-DMO to respond more rapidly to technological change. Using Frictionless Data as a backbone for data transport is a large piece of that transformation. Continuing to work with OKF, both groups sought to continue the collaboration by focusing on how to improve the data submission process at BCO-DMO.
Goodtables detects a duplication error

Goodtables noticed a duplicate row in an uploaded tabular data file.

Part of what makes BCO-DMO a successful data curation office is our hands-on work helping researchers achieve compliance with the Sample and Data Policy of the NSF’s Ocean Sciences division. Yet a steady and constant queue of data submissions means that it can take some weeks before our data managers can thoroughly review data submissions and provide necessary feedback to submitters. In response, BCO-DMO has been creating a lightweight web application for submitting data, while ensuring such a tool preserves the easy submission experience that presently exists.

Working with OKF, we wanted to expedite the data review process by providing data submitters with as much immediate feedback as possible, using Frictionless Data’s Goodtables project. Through a data submission platform, researchers would be able to upload data to BCO-DMO and, if tabular, get immediate feedback from Goodtables about whether it was correctly formatted or whether any other quality issues existed. With these reports at their disposal, submitters could update their submissions without having to wait for a BCO-DMO data manager to review them. For small and minor changes this saves the submitter the headache of having to wait for simple feedback. The goal is to catch submitters at a time when they are focused on this data submission, so that they don’t have to return weeks later and reconstitute their headspace around these data again. We catch them when their head is in the game.

Goodtables provides us a framework to branch out beyond simple tabular validation by developing data profiles. These profiles would let a submitter specify the type of data they are submitting. Is the data a bottle or CTD file? Does it contain latitude, longitude, time or depth observations? These questions, optional for submitters to answer, would provide even further validation steps to give improved feedback immediately.
For example, specifying that a file contains latitude or longitude columns could detect whether all values fall within valid bounds. Or that a depth column contains values above the surface. Or that the column pertaining to the time of an observation has inconsistent formatting across some of the rows. BCO-DMO can expand on this platform to continue to add new and better quality checks that submitters can use.
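A bounds check like the longitude example is simple to express. As an illustrative sketch (not Goodtables’ actual implementation), a minimal standalone version using only the Python standard library might look like this:

```python
import csv
import io

def check_longitude(csv_text, column="longitude", low=-180.0, high=180.0):
    """Toy version of a Goodtables-style custom check: report rows whose
    longitude value is missing, non-numeric, or out of bounds."""
    reader = csv.DictReader(io.StringIO(csv_text))
    problems = []
    for n, row in enumerate(reader, start=2):  # physical row 1 is the header
        raw = (row.get(column) or "").strip()
        try:
            value = float(raw)
        except ValueError:
            problems.append((n, f"not a number: {raw!r}"))
            continue
        if not (low <= value <= high):
            problems.append((n, f"out of range: {value}"))
    return problems
```

The same pattern generalises to depth values above the surface or inconsistent time formats: each profile bundles a set of such checks for a particular kind of submission.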
Goodtables detects incorrect longitudes

Goodtables noticed a longitude that is outside the range of -180 to 180. This happened because BCO-DMO recommends using decimal degrees between -180 and 180, and defined a Goodtables check for longitude fields.

Frictionless Data Monthly Virtual Hangout – 27 August

- August 24, 2020 in Frictionless Data

Join the Frictionless Data group for a virtual hangout on 27 August! These monthly hangouts are a casual opportunity to meet other Frictionless Data users and the main contributor team, ask questions, and learn about recent developments. We will spend extra time during this call discussing the newly-released Python code frictionless-py, and would love to hear any feedback! The hangout is scheduled for 27 August 2020 at 5pm BST / 4pm UTC. This will be a 1-hour meeting where community members come together to discuss key topics in the data community. If you would like to attend the hangout, you can sign up for the event using this form. We hope to see you there! PS – you can follow the Frictionless Data project on Twitter (@frictionlessd8a), and we also have an events calendar.