
How we are improving the quality and interoperability of Frictionless Data

- February 25, 2021 in Frictionless Data

As we announced in January, the Open Knowledge Foundation has been awarded funds from the Open Data Institute to improve the quality and interoperability of Frictionless Data. We are halfway through the process of reviewing our documentation and adding new features, and wanted to give a status update showing how this work is improving the overall Frictionless experience. We have already run four feedback sessions and were delighted to meet 16 users from very diverse backgrounds and with different levels of expertise in using Frictionless Data, some of whom we knew and some not. In spite of this variety, it was very interesting to see widespread consensus on how the documentation can be improved (have a look here). We are very grateful to all the Frictionless Data users who took part in our sessions – they helped us see all of our guides with fresh eyes. It was very important for us to do this review together with the Frictionless Data community because they (together with those to come) are the ones who will benefit from it, so they are best placed to flag issues and propose changes. Every comment is being carefully reviewed at the moment and the new documentation will soon be released. What are the next steps?
  • We are going to have 8 to 12 more users giving us feedback in the coming month. 
  • We are also adding a FAQ section based on the questions we got from our users in the past.
If you have any feedback and/or improvement suggestions, please let us know on our Discord channel or on Twitter.

More about Frictionless Data

Frictionless Data is a set of specifications for data and metadata interoperability, accompanied by a collection of software libraries that implement these specifications, and a range of best practices for data management. The project is funded by the Sloan Foundation.

Partnering with ODI to improve Frictionless Data

- January 12, 2021 in Frictionless Data, funding, News, Open Data

In the framework of the Open Data Institute’s fund to develop open source tools for data institutions, the Open Knowledge Foundation (OKF) has been awarded funds to improve the quality and interoperability of Frictionless Data. In light of our effort to make data open and accessible, we are thrilled to announce that we will be partnering with the Open Data Institute (ODI) to improve our existing documentation and add new features to Frictionless Data to create a better user experience for all. To achieve this, we will be working with a cohort of users from our active and engaged community to create better documentation that fits their needs. Our main goal is to make it easier for current and future users to understand and use the Frictionless Data tools and data libraries to their fullest potential. We know how frustrating it can be to try to use existing code (or learn new code) that has incomplete documentation, and we don’t want that to be a barrier for our users anymore. This is why we are very grateful to the ODI for granting us the opportunity to improve upon our existing documentation.

So, what will be changing?

  • We will have a new project overview section, to help our users understand how to use Frictionless Data for their specific needs.
  • We will improve the existing documentation, to make sure even brand new users can quickly understand everything.
  • We will have Tutorials, to showcase real user experiences and provide user-friendly examples.
  • We will add a FAQ section.

And when will all of that be ready?

Very soon! By the beginning of April everything will be online, so stay tuned (and frictionless)!

Call for user feedback

Feedback from our community is crucial to us, and part of this grant will be used to fund an evaluation of the existing documentation by our users in the format of user feedback sessions. Are you using our Frictionless Data tools or our Python data library? Then we want to hear from you! We are currently looking for novice and intermediate users to help us review our documentation, in order to make it more useful for you and all our future users. For every user session you take part in, you will be given £50 for your time and feedback. Are you interested? Then fill in this form.

More about Frictionless Data

Frictionless Data is a set of specifications for data and metadata interoperability, accompanied by a collection of software libraries that implement these specifications, and a range of best practices for data management. The project is funded by the Sloan Foundation.

Dryad and Frictionless Data collaboration

- November 18, 2020 in Frictionless Data

By Tracy Teal; originally posted on the Dryad blog: Guided by our commitment to make research data publishing more seamless and also re-usable, we are thrilled to partner with the Open Knowledge Foundation and the Frictionless Data team to enhance our submission processes. By integrating the Frictionless Data toolkit, Dryad will be able to give authors direct feedback on the structure of uploaded tabular files. This will also allow automated file-level metadata to be created at upload and made available for download for published datasets. We are excited to get moving on this project, and with support from the Sloan Foundation, the Open Knowledge Foundation has just announced a job opening to contribute to this work. Please check out the posting and circulate it to any developers who may be interested in building out this functionality with us:

Announcing the New Frictionless Framework

- October 8, 2020 in Frictionless Data

By Evgeny Karev & Lilly Winfree

Frictionless Framework

We are excited to announce our new high-level Python framework, frictionless-py. Frictionless-py was created to simplify the overall user experience of working with Frictionless Data in Python. It provides several high-level improvements in addition to many low-level fixes. Read more details below, or watch this intro video by Frictionless developer Evgeny:

Why did we write new Python code?

Frictionless Data has been in development for almost a decade, with global users and projects spanning domains from science to government to finance. However, our main Python libraries (datapackage-py, goodtables-py, tableschema-py, tabulator-py) were originally built with some inconsistencies that have confused users over the years. We had started redoing our documentation for our existing code, and realized we had a larger issue on our hands – mainly that the disparate Python libraries had overlapping functionalities and we were not able to clearly articulate how they all fit together to form a bigger picture. We realized that overall, the existing user experience was not where we wanted it to be. Evgeny, the Frictionless Data technical lead developer, had been thinking about ways to improve the Python code for a while, and the outcome of that work is frictionless-py.

What happens to the old Python code (datapackage-py, goodtables-py, tableschema-py, tabulator-py)? How does this affect current users?

Datapackage-py (see details), tableschema-py (see details), and tabulator-py (see details) still exist, will not be altered, and will be maintained. If your project uses this code, these changes are not breaking and there is no action you need to take at this point. However, we will be focusing new development on frictionless-py, and encourage you to start experimenting with frictionless-py during the last months of 2020 and to migrate to it from 2021 (here is our migration guide). The one important thing to note is that goodtables-py has been subsumed by frictionless-py (since version 3 of Goodtables). We will continue to bug-fix goodtables@2.x in this branch, and it is also still available on PyPI as before. Please note that the frictionless@3.x API is not stable, as we are continuing to work on it at the moment. We will release frictionless@4.x by the end of 2020 as the first SemVer/stable version.

What does frictionless-py do?

Frictionless-py has four main functions for working with data: describe, extract, validate, and transform. These are inspired by typical data analysis and data management methods.
  • Describe your data: You can infer, edit and save metadata of your data tables. This is a first step for ensuring data quality and usability. Frictionless metadata includes general information about your data, like a textual description, as well as field types and other tabular data details.
  • Extract your data: You can read your data using a unified tabular interface. Data quality and consistency are guaranteed by a schema. Frictionless supports various file protocols like HTTP, FTP, and S3 and data formats like CSV, XLS, JSON, SQL, and others.
  • Validate your data: You can validate data tables, resources, and datasets. Frictionless generates a unified validation report and supports many options to customize the validation process.
  • Transform your data: You can clean, reshape, and transfer your data tables and datasets. Frictionless provides a pipeline capability and a lower-level interface to work with the data.
Additional features:
  • Powerful Python framework
  • Convenient command-line interface
  • Low memory consumption for data of any size
  • Reasonable performance on big data
  • Support for compressed files
  • Custom checks and formats
  • Fully pluggable architecture
  • An included API server
  • More than 1,000 tests

How can users get started?

We recommend that you begin by reading the Getting Started Guide and the Introduction Guide. We also have in-depth documentation for Describing Data, Extracting Data, Validating Data, and Transforming Data.

How can you give us feedback?

What do you think? Let us know your thoughts, suggestions, or issues by joining us in our community chat on Discord or by opening an issue in the frictionless-py repo:


Where’s the documentation?

Are you a new user? Start here: Getting Started & Introduction Guide
Are you an existing user? Start here: Migration Guide
The full list of documentation can be found here:

What’s the difference between datapackage and frictionless?

In general, frictionless is our new generation software while tabulator/tableschema/datapackage/goodtables is our previous generation software. Frictionless has a lot of improvements over them. Please see this issue for the full answer and a code example:

I’ve spotted a bug – where do I report it?

Let us know by opening an issue in the frictionless-py repo: For tabulator/tableschema/datapackage issues, please use the corresponding issue tracker and we will triage it for you. Thanks!

I have a question – where do I get help?

You can ask us questions in our Discord chat and someone from the main developer team or from the community will help you. Here is an invitation link: We also have a Twitter account (@frictionlessd8a) and community calls where you can come meet the team and ask questions:

I want to help – how do I contribute?

Amazing, thank you! We always welcome community contributions. Start here and here, and you can also reach out to Evgeny (@roll) or Lilly (@lwinfree) on GitHub if you need help.


An update from the 2020 Frictionless Tool Fund grantees

- September 30, 2020 in Frictionless Data

We are excited to share project updates from our 2020 Frictionless Data Tool Fund! Our five grantees are about half-way through their projects and have written updates below to share with the community. These grants have been awarded to projects using Frictionless Data to improve reproducible data workflows in various research contexts. Read on to find out what they have been working on and ways that you can contribute!

Carles Pina Estany: Schema Collaboration

The goal of the schema-collaboration tool fund is to create an online platform that enables data managers and researchers to collaborate on describing their data by writing Frictionless data package schemas. The basics can be seen and tested on the online instance of the platform: the data manager can create a package, assign data packages to researchers, add comments, and send a link to the researchers, who will use datapackage-ui to edit the package and save it, making it available for the data manager. The next steps are to add extra fields to datapackage-ui and to work on the integration between schema-collaboration and datapackage-ui to make maintenance easier. Carles also plans to output the data package as a PDF to help data managers and researchers spot errors. Progress can be followed through the project Wiki, and feedback is welcome through GitHub issues. Read more about Carles’ project here:

Simon Tyrrell: Frictionless Data for Wheat

As part of the Designing Future Wheat project, Simon and team have repositories containing a wide variety of heterogeneous data, and they are trying to standardise how these datasets and their associated metadata are exposed. The first of their portals stores its data in an iRODS repository. They have recently completed additions to their web module, eirods-dav, which uses the files, folders and metadata stored within this repository to automatically generate Data Packages for the datasets. The next step is to look at expanding the data that is added to the Data Packages and, similarly, to automatically expose tabular data as Tabular Data Packages. The eirods-dav GitHub repository is at and any feedback or queries are very welcome. Read more about Simon’s project here:

Stephen Eglen: Analysis of spontaneous activity patterns in developing neural circuits using Frictionless Data tools

Stephen and Alexander have been busy over the summer integrating the Frictionless tools into a workflow for analysing electrophysiological datasets. They have written converters to read in their ASCII- and HDF5-based data and convert them to Frictionless containers. Along the way, they have given helpful feedback to the team about the core packages. They have settled on the Python interface as the most feature-rich implementation to work with. Alexander has now completed his analysis of the data, and they are currently working on a manuscript to highlight their research findings. Read more about Stephen’s project here:

Asura Enkhbayar: Metrics in Context

How much do we know about the measurement tools used to create scholarly metrics? While data models and standards are neither new nor uncommon to the scholarly space, “Metrics in Context” is all about the very apparatuses we use to capture the scholarly activity embedded in those metrics. In order to confidently use citations and altmetrics in research assessment or hiring and promotion decisions, we need to be able to provide standardized descriptions of the involved digital infrastructure and acts of capturing. Asura is currently refining the conceptual model for scholarly events in the digital space in order to be able to account for various types of activities (both traditional and alternative scholarly metrics). After a review of the existing digital landscape of scholarly infrastructure projects, he will dive into the implementation using Frictionless. You can find more details on the open roadmap on Github and feel free to submit questions and comments as issues! Read more about Asura’s project here:  

Nikhil Vats: Adding Data Package Specifications to InterMine’s im-tables

Nikhil is working with InterMine to add data package specifications to im-tables (a library to query biological data) so that users can export metadata along with query results. Right now, the metadata contains field names, their description links, types, paths, class description links and primary key(s). Nikhil is currently figuring out ways to get links for data sources, attribute descriptions and class descriptions from their FAIR terms (or description links). Next steps for the project include building the frontend for this feature in im-tables and adding the rest of the required information, like the result file format (CSV, TSV, etc.), to the datapackage.json (metadata) file. You can contribute to this project by opening an issue here or reaching out at Read more about Nikhil’s project here:

Goodtables: Expediting the data submission and submitter feedback process

- September 16, 2020 in Frictionless Data

by Adam Shepherd, Amber York, Danie Kinkade, and Lilly Winfree

This post, originally published on the BCO-DMO blog, describes the second part of our Frictionless Data Pilot collaboration.

Earlier this year, the Biological and Chemical Oceanography Data Management Office (BCO-DMO) completed a pilot project with the Open Knowledge Foundation (OKF) to streamline the data curation processes for oceanographic datasets using Frictionless Data Pipelines (FDP). The goal of this pilot was to construct reproducible workflows that transformed the original data submitted to the office into archive-quality, FAIR-compliant versions. FDP lets a user define an ordered set of processing steps to perform on some data, and the project developed new processing steps specific to the needs of these oceanographic datasets. These ordered processing steps are saved into a configuration file that can then be used any time the archived version of the dataset must be reproduced. The primary value of these configuration files is that they capture the curation process at BCO-DMO and make it transparent. Subsequently, we found additional value internally by using FDP in three other areas. First, it made the curation process across our data managers much more consistent than the ad-hoc data processing scripts they individually produced before FDP. Second, we found that data managers saved time because they could reuse pre-existing pipelines to process newer versions submitted for existing datasets. Finally, the configuration files helped us keep track of what processes were used in case a bug or error was ever found in the processing code. This project exceeded our goal of using FDP on at least 80% of data submissions to BCO-DMO; we now use it almost 100% of the time.
As a major deliverable from BCO-DMO’s recent NSF award, the office planned to refactor its entire data infrastructure using techniques that would allow BCO-DMO to respond more rapidly to technological change. Using Frictionless Data as a backbone for data transport is a large piece of that transformation. Continuing to work with OKF, both groups sought to extend our collaboration by focusing on how to improve the data submission process at BCO-DMO.
Goodtables detects a duplication error

Goodtables noticed a duplicate row in an uploaded tabular data file.
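The duplicate-row check illustrated above can be sketched in plain Python (a simplified stand-in for the real Goodtables check; the column names are illustrative):

```python
import csv
from io import StringIO

def find_duplicate_rows(csv_text):
    """Return the 1-based line numbers of rows that exactly repeat an earlier row."""
    seen = set()
    duplicates = []
    for lineno, row in enumerate(csv.reader(StringIO(csv_text)), start=1):
        key = tuple(row)
        if key in seen:
            duplicates.append(lineno)
        else:
            seen.add(key)
    return duplicates

# Line 4 repeats line 2, so it is reported as a duplicate
sample = "station,depth_m\nA1,10\nA2,20\nA1,10\n"
print(find_duplicate_rows(sample))  # -> [4]
```

Goodtables runs checks like this automatically over every uploaded tabular file and collects the findings into a single report for the submitter.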

Part of what makes BCO-DMO a successful data curation office is our hands-on work helping researchers achieve compliance with the NSF’s Sample and Data Policy from its Ocean Sciences division. Yet a steady and constant queue of data submissions means that it can take some weeks before our data managers can thoroughly review data submissions and provide necessary feedback to submitters. In response, BCO-DMO has been creating a lightweight web application for submitting data, while ensuring such a tool preserves the easy submission experience that presently exists. Working with OKF, we wanted to expedite the data review process by providing data submitters with as much immediate feedback as possible using Frictionless Data’s Goodtables project. Through a data submission platform, researchers would be able to upload data to BCO-DMO and, if tabular, get immediate feedback from Goodtables about whether it was correctly formatted or had any other quality issues. With these reports at their disposal, submitters could update their submissions without having to wait for a BCO-DMO data manager to review them. For small and minor changes this saves the submitter the headache of having to wait for simple feedback. The goal is to catch submitters at a time when they are focused on this data submission, so that they don’t have to return weeks later and reconstitute their headspace around these data again. We catch them when their head is in the game. Goodtables provides us a framework to branch out beyond simple tabular validation by developing data profiles. These profiles would let a submitter specify the type of data they are submitting. Is the data a bottle or CTD file? Does it contain latitude, longitude, time or depth observations? These questions, optional for submitters to answer, would provide even further validation steps and improved immediate feedback.
For example, specifying that a file contains latitude or longitude columns could detect whether all values fall within valid bounds. Or that a depth column contains values above the surface. Or that the column pertaining to the time of an observation has inconsistent formatting across some of the rows. BCO-DMO can expand on this platform to continue to add new and better quality checks that submitters can use.
Goodtables detects incorrect longitudes

Goodtables noticed a longitude that is outside the range of -180 to 180. This happened because BCO-DMO recommends using the decimal degrees format between -180 and 180, and defined a Goodtables check for longitude fields.
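A bounds check like this can also be expressed declaratively in a Table Schema, which Goodtables validates tabular files against. A minimal sketch (the field name is illustrative):

```json
{
  "fields": [
    {
      "name": "longitude",
      "type": "number",
      "constraints": {
        "minimum": -180,
        "maximum": 180
      }
    }
  ]
}
```

Any row whose longitude falls outside these constraints is then flagged in the validation report, without any custom validation code.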

Frictionless Data Monthly Virtual Hangout – 27 August

- August 24, 2020 in Frictionless Data

Join the Frictionless Data group for a virtual hangout on 27 August! These monthly hangouts are a casual opportunity to meet other Frictionless Data users and the main contributor team, ask questions, and learn about recent developments. We will spend extra time during this call discussing the newly released Python framework frictionless-py and would love to hear any feedback! The hangout is scheduled for 27th August 2020 at 5pm BST / 4pm UTC. This will be a 1-hour meeting where community members come together to discuss key topics in the data community. If you would like to attend the hangout, you can sign up for the event using this form. We hope to see you there! PS – you can follow the Frictionless Data project on Twitter at and we also have an events calendar at

Clarifying the semantics of data matrices and results tables: a Frictionless Data Pilot

- July 21, 2020 in Frictionless Data, Genomics, pilot

As part of the Frictionless Data for Reproducible Research project, funded by the Sloan Foundation, we have started a Pilot collaboration with the Data Readiness Group at the Department of Engineering Science of the University of Oxford; the group will be represented by Dr. Philippe Rocca-Serra, an Associate Member of Faculty. This Pilot will focus on removing the friction in reported scientific experimental results by applying the Data Package specifications. Written with Dr. Philippe Rocca-Serra. Publishing of scientific experimental results is frequently done in ad-hoc ways that are seldom consistent. For example, results are often deposited as idiosyncratic sets of Excel files or tabular files that contain very little structure or description, making them difficult to use, understand and integrate. Interpreting such tables requires human expertise, which is both costly and slow, and leads to low reuse. Ambiguous tables of results can lead researchers to rerun analysis or computation over the raw data before they understand the published tables. This current approach is broken, does not fit users’ data mining workflows, and limits meta-analysis. A better procedure for organizing and structuring information would reduce unnecessary use of computational resources, which is where the Frictionless Data project comes into play. This Pilot collaboration aims to help researchers publish their results in a more structured, reusable way. In this Pilot, we will use (and possibly extend) Frictionless tabular data packages to devise both generic and specialized templates. These templates can be used to unambiguously report experimental results. Our short-term goal for this work is to develop a set of Frictionless Data Packages for targeted use cases where the impact is high.
We will focus first on creating templates for statistical comparison results, such as differential analysis, enrichment analysis, high-throughput screens, and univariate comparisons, in genomics research by using the STATO ontology within tabular data packages. Our longer-term goal is for these templates to be incorporated into publishing systems to allow for clearer reporting of results, more knowledge extraction, and more reproducible science. For instance, we anticipate that this work will allow for increased consistency of table structure in publications, as well as increased data reuse owing to predictable syntax and layout. We also hope this work will ease the creation of linked data graphs from tables of results due to clarified semantics. An additional goal is to create code that is compatible with R’s ggplot2 library, which would allow for easy generation of data analysis plots. To this end, we plan on working with R developers in the future to create a package that will generate Frictionless Data compliant data packages. This work has recently begun, and will continue throughout the year. We have already met with some challenges, such as working out ways to transform, or normalize, data and ways to incorporate RDF linked data (you can read our related conversations in GitHub). We are also working on how to define a ‘generic’ table layout definition which is broad enough to be reused in as wide a range of situations as possible. If you are interested in staying up to date on this work, we encourage you to check out these GitHub repositories: and Additionally, we will (virtually) be at the eLife Sprint in September to work on closely related work, which you can read about here: Throughout this Pilot, we are planning on reaching out to the community to test these ideas and get feedback. Please contact us on GitHub or in Discord if you are interested in contributing.

Reflecting on the first cohort of Frictionless Data Reproducible Research fellows

- June 9, 2020 in Frictionless Data

It is truly bittersweet to say that we are at the end of the first cohort of the Frictionless Data Reproducible Research fellows. Over the past nine months, I have had the pleasure of working with Monica Granados, Selene Yang, Daniel Ouso and Lily Zhao during the fellows programme. Combining their diverse backgrounds (from government data to mapping data, from post-PhD to graduate student), they have spent many hours together learning how to advocate for open science and how to use the Frictionless Data code and tools to make their data more reproducible. Together, they have also written several blogposts, presented a talk and given a workshop. And they did all of this during a global pandemic! I feel lucky to have worked with them, and will be eagerly watching their contributions to the open science space. Each fellow wrote a final blogpost reflecting on their time with the programme. You can read the originals here, and I have also republished them below:

Lily Zhao: Reflecting on my time as a fellow

As one of the inaugural Reproducible Research Fellows of Frictionless Data, I am eager to share my experience of working with Sele, Ouso and Monica under the leadership of Lilly Winfree this year. I could not have asked for a better group of individuals to work remotely with. Sele, Ouso, Monica and I spent the last nine months discussing common issues in research reproducibility and delving into the philosophy behind open data science. Together we learned to apply Frictionless Data tools to our own data and mastered techniques for streamlining the reproducibility of our own research process. Lilly was an excellent mentor throughout the program and was always there to help with any issues we ran into. This was also one of my first experiences working entirely remotely on a team across many time zones. Through the use of Google Hangouts, Zoom and Slack, the entire process was easier than I ever thought it could be. It is wonderful that through technology we are able to collaborate across the world more easily than ever before. We were also able to give multiple presentations together. Monica and I were joint speakers as part of csv,conf, where we talked about our experience as fellows and our experience using Frictionless Data tools. With so many people on the Zoom call, it really felt like we were part of a large community. The four of us also led a hands-on workshop introducing the Data Package Creator and Goodtables web interface tools. This was especially fun for me because we used a subset of my French Polynesia interview data as practice data for all workshop participants. Many of the questions asked by participants mirrored questions the four of us had already worked through together, so it was great to be able to share what we had learned with others.
I look forward to sharing these tools and the philosophy of open data science throughout my career and am very grateful to the Open Knowledge Foundation for this amazing learning opportunity. If you would like to learn more about my experience in the Frictionless Data Fellows program please feel free to reach out to me personally! Monica, Sele, Lilly, Ouso and I on our most recent conference call :)

Monica Granados: Gimme Dat Data (in a validated Data Package)

As a scientist I collect a lot of data. Especially about animals that live in the water – fish, mussels, crayfish. This data is not only useful to me but it can be used by others to improve the power in their studies, or to increase geographic range or phylogenetic diversity, for example. Prior to the Frictionless Data for Reproducible Research Fellowship, I had my data on GitHub along with a script that would use rcurl to pull the data from the repository. While the repository was accompanied by a README, the file didn’t have much information other than the manuscript which included the data. This structure facilitated reproducibility but not reusability. Conceivably, if you wanted to use my data for your own experiments you could have contextualised the data by using the relevant manuscript, but it still would have been a challenge without any metadata, not to mention any potential structural errors you could have encountered that I didn’t catch when I uploaded the data. It was through the introduction of the Frictionless tools, however, that I realised there was more I could do to make my science even more transparent, reproducible and reusable. The fellowship syllabus was structured in such a way that by learning about the tools we learned what the tools were facilitating – better data sharing. The fellows would learn how to use a tool through a self-guided lesson and then answer questions on Slack which asked us to interrogate why the tool was built the way it was. These lessons were also supported by calls with the full cohort of fellows where we discussed what we had learned, problems we were encountering as we used the tools with our own data, and papers on open science. The fellowship culminated with a workshop, delivered by all four fellows and attended by over 40 participants, and a presentation at csv,conf.
Now when I share data as a data package, I know I have validated my tabular data for structural errors and that the file contains metadata that contextualises the data. Having the opportunity to be a part of the inaugural cohort has been a wonderful experience. I learned new tools and information that I will take and share for the rest of my career, but I also gained new colleagues and open science friends in my fellow fellows.

Daniel Ouso: Better Data, one resource at a time – my fellowship experience

Getting into the Frictionless Data fellowship

My background is largely in molecular biology, particularly infection diagnostics targeting arthropod viruses, bacteria and protozoa. My bioinformatics experience is comparatively short, but it is the direction in which I am passionate about building my research career. I first heard about Frictionless Data from the African Carpentries instructors' mailing list, where Anelda had shared the inaugural fellowship call. I caught it just in the nick of time, right at the submission deadline! By the way, you can watch for the annual calls and other interesting news by following @frictionlessd8a. The call for the second cohort was open from late April and closed in June; the fellowship starts in September.


After a brief email correspondence, Lilly arranged a first meeting to usher me into the fellowship. I was introduced to Jo Barrat, who patiently took me through the logistical preliminaries. I was really looking forward to getting started. The onboarding let me get acquainted with the rest of the fellows, awesome people. I was excited!


Overall, the world is searching for and promoting better ways to work with data, whether that is collecting data, making it accessible, finding novel ways to analyse high-throughput data, building dedicated workflows to publish data alongside conventional scientific publishing, moving and working with data across frameworks, or merely storage and security. All of these, among other factors, provide avenues to interrogate data thoroughly and in multiple ways, promoting improved data usefulness; something that has arguably been under-appreciated in the past. Frictionless Data, through its Progressive Data Toolkit, with the help of organisations like the Open Knowledge Foundation and funding from the Sloan Foundation, is dedicated to alleviating the hindrances to these efforts. Empowering people is a core part of the #BetterData dream.

The fellowship

An aspect of any research is the collection of data, which is used to test the hypotheses under study. The importance of data, good data for that matter, to research is therefore unquestionable. Approaches to data analysis may differ from field to field, yet there are conventional principles that do not discriminate between fields; these are the targets of Frictionless Data. I jumped at the opportunity to learn ways to ramp up the efficiency of my data workflow, with a touch of research openness and reproducibility. The journey took off with a meticulous roadmap, which I found very helpful, and seems to end with this: sharing my experience. In between, exciting things happened.

In case one came in a little rusty on basic Python/R, that was catered for early on, though you didn't exactly need either to use the tools; literally ZERO programming skills are a prerequisite. There was a plethora of resources, and help from the other fellows, not to mention from the ever-welcoming Lilly. The core sections of the fellowship were prefaced by grasping basic components such as JSON, the data interchange format the specifications are written in. Then came the core tools and their specifications. The Data Package Creator tool is impressively emphatic about capturing metadata, a backbone theme for reproducibility. I initially found the Table Schema and Schema specifications confusing. The other fellows and I have previously written about the Data Package Creator and GoodTables, the tools for creating and validating data packages respectively. These tools are very progressive, continually incorporating feedback from the community, including the fellows, to improve the user experience, so don't be surprised to find a few changes since the fellows' blogs. In fact, a new entrant, which I only learned of recently, is the DataHub tool, described as "a useful solution for sharing datasets, and discovering high-quality datasets that others have produced". I am yet to check it out.
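To make those pieces concrete, here is a minimal sketch of the kind of datapackage.json descriptor the Data Package Creator produces, with a Table Schema describing one tabular resource. The dataset name, file path and fields are illustrative assumptions, not taken from any fellow's project:

```json
{
  "name": "crayfish-survey",
  "title": "Crayfish survey counts",
  "licenses": [{"name": "CC-BY-4.0"}],
  "resources": [
    {
      "name": "counts",
      "path": "counts.csv",
      "profile": "tabular-data-resource",
      "schema": {
        "fields": [
          {"name": "site", "type": "string", "description": "Sampling site ID"},
          {"name": "date", "type": "date", "description": "Survey date"},
          {"name": "count", "type": "integer", "description": "Individuals observed"}
        ]
      }
    }
  ]
}
```

The descriptor travels alongside the CSV file, so the metadata and the schema stay with the data wherever it is shared.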
Besides the main focus of the fellowship, I learned a lot about organisational skills and tools, such as GitHub Projects, Toggl for time tracking, and remote working in general. I was introduced to new communities and initiatives such as PREreview, where I participated in open research reviewing for the first time. The fellows were awesome to work with, and Lilly Winfree provided the best mentorship. Sometimes problems are foreseen and contingencies planned; other times unforeseen surprises rear their heads into our otherwise "perfect" plan. Guess what? You nailed it: COVID-19. Such surprises require adaptability akin to that of the fictional El Profesor in Money Heist. Since we could not organise the in-person seminar and/or workshops as part of the fellowship, we collectively opted for a virtual workshop. It went amazingly well.

What next

Acquired knowledge and skills become more useful when implemented. My goal is to apply them at every opportune opening and to keep learning other complementary tools. Yet there is also this about knowledge: it is meant to be spread. I hope to make up for the suspended social sessions and to keep engaging with @frictionlessd8a to continue advocating for open and reproducible research.


Tools that need minimal to no coding experience go a long way towards supporting the adoption of good data hygiene practices, all the more so in places where coding expertise is scarce. The Frictionless Data tools will surely grease your workflows regardless of your coding proficiency, especially for tabular data. This is especially needed given the deluge of data persistently churned out from various sources. Frictionless Data is for everyone working with data: researchers, data scientists and data engineers. The ultimate goal is to work with data in an open and reproducible way, consistent with modern scientific research practice. A concerted approach is also key, and I am glad to have represented Africa in the fellowship. Do not hesitate to reach out if you think I can be of use to your cause.

Sele Yang: Let's keep reproducing knowledge!

A great journey comes to an end for the first cohort of the Frictionless Data for Reproducible Research Fellowship. It was a process of tremendous and valuable learning that was only possible thanks to the collaborative work of everyone who took part. At the beginning, I remember the great fear (which in some ways still persists, though more faintly) of not having the technical skills required to carry out my project, but little by little I got to know my fellow cohort members and felt supported by them; with enormous patience they took me by the hand so I would not get lost along the way. Thanks to this team, I walked the beaches of Mo'orea with Lily's data, and I learned about research approaches outside my own field from Ouso and Mónica. I came to appreciate the great work researchers do to defend more open, equitable and accessible knowledge. Although our shared journey ends here, I can say that despite the COVID-19 crisis, which forced us to change many of our plans during the programme, we managed to come together, even if only virtually; not just among ourselves, but also with a large audience for our workshop on the programme's tools and methodologies. It was a great activity for reinforcing the importance of sharing knowledge and making it more accessible, all the more so in times of crisis. I thank the Open Knowledge Foundation for running this programme, and I invite everyone to explore the material we produced during these months of work. I finish this learning process with an even stronger conviction of how necessary collaborative processes that seek to open up and democratise science and knowledge are, especially in times when collaboration and the shared pursuit of learning will make us stronger as a society.

Join the Frictionless Data workshop – 20 May

- April 28, 2020 in Frictionless Data

  Join us on 20 May at 4pm UK/10am CDT for a Frictionless Data workshop led by the Reproducible Research Fellows! This 1.5 hour long workshop will cover an introduction to the open source Frictionless Data tools. Participants will learn about data wrangling, including how to document metadata, package data into a datapackage, write a schema to describe data, and validate data. The workshop is suitable for beginners and those looking to learn more about using Frictionless Data.
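To give a flavour of what "write a schema to describe data, and validate data" means in practice, here is a small stdlib-only Python sketch of the kind of structural check the Frictionless validation tools automate. The schema dictionary loosely mirrors a Table Schema, but this is not the real tools' API; the field names and data are made up for illustration.

```python
import csv
import io

# A hand-rolled illustration of schema-based validation: check that a
# CSV's header matches the schema and that each cell parses as its
# declared type. Frictionless tools do this (and much more) for you.
SCHEMA = {
    "fields": [
        {"name": "site", "type": "string"},
        {"name": "count", "type": "integer"},
    ]
}

def check_table(csv_text, schema):
    """Return a list of (row_number, message) problems found."""
    problems = []
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    expected = [f["name"] for f in schema["fields"]]
    if header != expected:
        problems.append((1, f"header {header} != expected {expected}"))
    for row_num, row in enumerate(reader, start=2):
        if len(row) != len(expected):
            problems.append((row_num, f"expected {len(expected)} cells, got {len(row)}"))
            continue
        for value, field in zip(row, schema["fields"]):
            if field["type"] == "integer":
                try:
                    int(value)
                except ValueError:
                    problems.append((row_num, f"'{value}' is not an integer"))
    return problems

good = "site,count\nreef_a,12\nreef_b,7\n"
bad = "site,count\nreef_a,twelve\nreef_b\n"

print(check_table(good, SCHEMA))  # []
print(check_table(bad, SCHEMA))   # two problems: a bad integer and a short row
```

The point of the workshop tools is that you never have to write checks like this by hand: once the schema lives in a data package, validation is a single command.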

Everyone is welcome to join, but you must register to attend using this link

The Fellows Programme is part of the Frictionless Data for Reproducible Research project overseen by the Open Knowledge Foundation. This project, funded by the Sloan Foundation, applies our work in Frictionless Data to data-driven research disciplines, in order to facilitate data workflows in research contexts. At its core, Frictionless Data is a set of specifications for data and metadata interoperability, accompanied by a collection of software libraries that implement these specifications, and a range of best practices for data management. The core specification, the Data Package, is a simple and practical “container” for data and metadata. This workshop will be led by the members of the First Cohort of the Fellows Programme: Lily Zhao, Daniel Ouso, Monica Granados, and Selene Yang. You can read more about their work during this programme here: Additionally, applications are now open for the Second Cohort of Fellows. Read more about applying here: