
Frictionless Data for Reproducible Research Call for Pilot Collaborations with Scientists

- January 20, 2020 in Frictionless Data

Have you ever looked back at a graph of fluorescence change in neurons or gene expression data in C. elegans from years ago and wondered how exactly you got that result? Would you have enough findable notes at hand to repeat that experiment? Do you have a quick, repeatable method for preparing your data to be published with your manuscripts (as required by many journals and funders)? If these questions give you pause, we are interested in helping you!

For many data users, getting insight from data is not always a straightforward process. Data is often hard to find, archived in difficult-to-use formats, poorly structured, or incomplete. These issues create friction and make it difficult to use, publish, and share data. The Frictionless Data initiative aims to reduce friction in working with data, with a goal to make it effortless to transport data among different tools and platforms for further analysis.
The Frictionless Data for Reproducible Research project, part of the Open Knowledge Foundation and funded by the Sloan Foundation, is focused on helping researchers and the research community resolve data workflow issues.

Over the last several years, Frictionless Data has produced specifications, software, and best practices that address identified needs for improving data-driven research, such as generalized, standard metadata formats, interoperable data, and open-source tooling for data validation. For researchers, Frictionless Data tools, specifications, and software can be used to:
    • Improve the quality of your dataset
    • Quickly find and fix errors in your data
    • Package your data together with the contextual information others need to understand it, in one container, before you share it
    • Write a schema – a blueprint that tells others how your data is structured, and what type of content is to be expected in it
    • Facilitate data reuse by creating machine-readable metadata
    • Make your data more interoperable so you can import it into various tools like Excel, R, or Python
    • Publish your data to repositories more easily
    • Explore our open source repositories on GitHub
    • Read more about how to get started with our Field Guide tutorials
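As an illustration of the schema and container ideas above, here is a minimal sketch of a Data Package descriptor built with the Python standard library. The dataset, field names, and types are hypothetical, not from a real project:

```python
import json

# A minimal Table Schema sketch for a hypothetical two-column CSV
# (field names and types are illustrative).
schema = {
    "fields": [
        {"name": "specimen_id", "type": "integer", "constraints": {"required": True}},
        {"name": "fluorescence", "type": "number"},
        {"name": "recorded_on", "type": "date"},
    ],
    "primaryKey": "specimen_id",
}

# A minimal Data Package descriptor that points a resource at the schema.
descriptor = {
    "name": "example-dataset",
    "resources": [
        {"name": "measurements", "path": "measurements.csv", "schema": schema}
    ],
}

print(json.dumps(descriptor, indent=2))
```

A file like this, saved as datapackage.json alongside the CSV, is what tells both people and machines how the data is structured.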
  Importantly, these tools can be used on their own, or adapted into your own personal and organisational workflows. For instance, neuroscientists can use Frictionless Data tooling and specs to keep track of imaging metadata from the microscope, through analysis software, to publication; to optimize an ephys data workflow from voltage recording, to tabular data, to analyzed graph; or to make data more easily shareable for smoother publishing with a research article.
We want to learn about your multifaceted workflow and help make your data more interoperable among the various formats and tools you use.
We are looking for researchers and research-related groups to join Pilots, and are particularly keen to work with: scientists creating data, data managers in research groups, statisticians and data scientists, data wranglers maintaining databases, publishers, and librarians helping researchers manage their data or teaching data best practices. The primary goal of this work is to collaborate with scientists and scientific data to enact exemplar data practice, supported by Frictionless Data specifications and software, to deliver on the promise of data-driven, reproducible research. We will work with you, integrating with your current tools and methodologies, to enhance your workflows and increase the efficiency and accuracy of your data-driven research.

Want to know more? Through our past Pilots, we worked directly with organisations to solve real problems managing data:
  • In an ongoing Pilot with the Biological and Chemical Oceanography Data Management Office (BCO-DMO), we helped BCO-DMO develop a data management UI, called Laminar, which incorporates Frictionless Data Package Pipelines on the backend. BCO-DMO’s data managers are now able to receive data in various formats, import the data into Laminar, perform several pipeline processes, and then host the clean, transformed data for other scientists to (re)use. The next steps in the Pilot are to incorporate Goodtables into the Laminar pipeline to validate the data as it is processed. This will help ensure data quality and will also improve the processing experience for the data managers.
  • In a Pilot with the University of Cambridge, we worked with Stephen Eglen to capture complete metadata about retinal ganglion cells in a data package. This metadata included the type of ganglion cell, the species, the radius of the soma, citations, and raw images. 
  • Collaborating with the Cell Migration Standard Organization (CMSO), we investigated the standardization of cell tracking data. CMSO used the Tabular Data Package to make it easy to import their data into a Pandas dataframe (in Python) to allow for dynamic data visualization and analysis.
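As planned for the Laminar pipeline, Goodtables-style validation catches structural problems before data is published. A minimal, stdlib-only sketch of that kind of check follows; the CSV content and error messages are invented, and the real Goodtables library performs far richer structural and schema checks:

```python
import csv
import io

# Toy validation pass in the spirit of Goodtables' structural checks:
# duplicate headers, blank rows, and ragged rows. The data is invented.
raw = """station,depth_m,temp_c
A1,10,18.2
A1,10
B4,25,16.9
"""

reader = csv.reader(io.StringIO(raw))
header = next(reader)
errors = []
if len(set(header)) != len(header):
    errors.append("duplicate header")
for lineno, row in enumerate(reader, start=2):
    if not any(cell.strip() for cell in row):
        errors.append(f"blank row at line {lineno}")
    elif len(row) != len(header):
        errors.append(f"row length mismatch at line {lineno}")

print(errors)  # the short row on line 3 is flagged
```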
To find out more about Frictionless Data, visit frictionlessdata.io or email the team at frictionlessdata@okfn.org.

Frictionless Data Tool Fund update: Shelby Switzer and Greg Bloom, Open Referral

- January 15, 2020 in Data Package, Frictionless Data, Open Knowledge

This blogpost is part of a series showcasing projects developed during the 2019 Frictionless Data Tool Fund. The 2019 Frictionless Data Tool Fund provided four mini-grants of $5,000 to support individuals or organisations in developing an open tool for reproducible research built using the Frictionless Data specifications and software. This fund is part of the Frictionless Data for Reproducible Research project, which is funded by the Sloan Foundation. This project applies our work in Frictionless Data to data-driven research disciplines, in order to facilitate reproducible data workflows in research contexts.

Open Referral creates standards for health, human, and social services data – the data found in community resource directories used to help find resources for people in need. In many organisations, this data lives in a multitude of formats, from handwritten notes to Excel files on a laptop to Microsoft SQL databases in the cloud. For community resource directories to be maximally useful to the public, this disparate data must be converted into an interoperable format. Many organisations have decided to use Open Referral’s Human Services Data Specification (HSDS) as that format. However, to accurately represent this data, HSDS uses multiple linked tables, which can be challenging to work with. To make this process easier, Greg Bloom and Shelby Switzer from Open Referral decided to implement datapackage bundling of their CSV files through the Frictionless Data Tool Fund. In order to accurately represent the relationships between organisations, the services they provide, and the locations where they are offered, HSDS makes sense of disparate data by linking multiple CSV files together with foreign keys.
Open Referral used Frictionless Data’s datapackage to specify the tables’ contents and relationships in a single machine-readable file, so that this standardised format could transport HSDS-compliant data in a way that all of the teams who work with this data can use: CSVs of linked data.

In the Tool Fund, Open Referral worked on their HSDS Transformer tool, which enables a group or person to transform data into an HSDS-compliant data package, so that it can then be combined with other data or used in any number of applications. The HSDS Transformer is a Ruby library that can be used during the extract, transform, load (ETL) workflow of raw community resource data. This library extracts the community resource data, transforms it into HSDS-compliant CSVs, and generates a datapackage.json that describes the data output. The Transformer can also output the datapackage as a zip file, called HSDS Zip, enabling systems to send and receive a single compressed file rather than multiple files. The Transformer can be spun up in a Docker container – and once it’s live, the API can deliver a payload that includes links to the source data and to the configuration file that maps the source data to HSDS fields. The Transformer then grabs the source data and uses the configuration file to transform the data and return a zip file of the HSDS-compliant datapackage.
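To picture how one datapackage.json can describe linked tables, here is a hedged Python sketch. The resource names echo HSDS-style tables, but the paths and fields are invented for illustration rather than taken from the actual HSDS schema:

```python
import json

# Sketch of a descriptor linking two HSDS-style tables with a foreign key.
# Paths and field lists are illustrative, not the real HSDS table definitions.
descriptor = {
    "name": "hsds-example",
    "resources": [
        {
            "name": "organizations",
            "path": "organizations.csv",
            "schema": {
                "fields": [{"name": "id", "type": "string"},
                           {"name": "name", "type": "string"}],
                "primaryKey": "id",
            },
        },
        {
            "name": "services",
            "path": "services.csv",
            "schema": {
                "fields": [{"name": "id", "type": "string"},
                           {"name": "organization_id", "type": "string"}],
                "primaryKey": "id",
                # Each service row must point at an existing organization.
                "foreignKeys": [{
                    "fields": "organization_id",
                    "reference": {"resource": "organizations", "fields": "id"},
                }],
            },
        },
    ],
}

with open("datapackage.json", "w") as f:
    json.dump(descriptor, f, indent=2)
```

Tools that understand the Data Package format can then validate that every organization_id in services.csv resolves to a row in organizations.csv.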

Example of a demo app consuming the API generated from the HSDS Zip

The Open Referral team has also been working on projects related to the HSDS Transformer and HSDS Zip. For example, the HSDS Validator checks that a given datapackage of community service data is HSDS-compliant. Additionally, they have used these tools in the field with a project in Miami. For this project, the HSDS Transformer was used to transform data from a Microsoft SQL Server into an HSDS Zip. Then that zipped datapackage was used to populate a Human Services Data API with a generated developer portal and OpenAPI Specification. Further, as part of this work, the team also contributed to the original source code for the datapackage-rb Ruby gem. They added a new feature to infer a datapackage.json schema from a given set of CSVs, so that you can generate the json file automatically from your dataset.

Greg and Shelby are eager for the Open Referral community to use these new tools and provide feedback. To use these tools currently, users should either be a Ruby developer who can use the gem as part of another Ruby project, or be familiar enough with Docker and HTTP APIs to start a Docker container and make an HTTP request to it. You can use the HSDS Transformer as a Ruby gem in another project or as a standalone API. In the future, the project might expand to include hosting the HSDS Transformer as a cloud service that anyone can use to transform their data, eliminating many of these technical requirements.

Interested in using these new tools? Open Referral wants to hear your feedback. For example, would it be useful to develop an extract-transform-load API, hosted in the cloud, that enables recurring transformation of nonstandardised human service directory data sources into an HSDS-compliant datapackage? You can reach them via their GitHub repos.

Further reading: openreferral.org
Repository: https://github.com/openreferral/hsds-transformer
HSDS Transformer: https://openreferral.github.io/hsds-transformer/
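The schema-inference idea contributed to datapackage-rb can be approximated in a few lines. This is a simplified, stdlib-only Python analogy: the type-guessing heuristic and the sample CSV are invented, and the real gem infers far more than integer-versus-string:

```python
import csv
import io
import json

# Toy descriptor inference: guess a Table Schema from a CSV's header and rows.
# The sample data and the two-type heuristic are purely illustrative.
sample = "id,name,capacity\n1,Food Pantry,120\n2,Shelter,45\n"

def guess_type(values):
    """Return 'integer' if every value parses as an int, else 'string'."""
    try:
        for v in values:
            int(v)
        return "integer"
    except ValueError:
        return "string"

rows = list(csv.reader(io.StringIO(sample)))
header, data = rows[0], rows[1:]
fields = [{"name": h, "type": guess_type([r[i] for r in data])}
          for i, h in enumerate(header)]
descriptor = {"resources": [{"path": "sample.csv",
                             "schema": {"fields": fields}}]}

print(json.dumps(descriptor, indent=2))
```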

Neuroscience Experiments System Frictionless Tool

- December 16, 2019 in Frictionless Data, Open Knowledge

This blog is part of a series showcasing projects developed during the 2019 Frictionless Data Tool Fund.  The 2019 Frictionless Data Tool Fund provided four mini-grants of $5,000 to support individuals or organisations in developing an open tool for reproducible research built using the Frictionless Data specifications and software. This fund is part of the Frictionless Data for Reproducible Research project, which is funded by the Sloan Foundation. This project applies our work in Frictionless Data to data-driven research disciplines, in order to facilitate reproducible data workflows in research contexts.  

NES logo

Neuroscience Experiments System Frictionless Data Incorporation, by the Technology Transfer team of the Research, Innovation and Dissemination Center for Neuromathematics.

  The Research, Innovation and Dissemination Center for Neuromathematics (RIDC NeuroMat) is a research center established in 2013 by the São Paulo Research Foundation (FAPESP) at the University of São Paulo, in Brazil. A core mission of NeuroMat is the development of open-source computational tools to aid in scientific dissemination and advance open knowledge and open science. To this end, the team has created the Neuroscience Experiments System (NES), an open-source tool to assist neuroscience research laboratories in routine procedures for data collection. To more effectively understand the function and treatment of brain pathologies, NES aids in recording data and metadata from various experiments, including clinical data, electrophysiological data, and fundamental provenance information. NES then stores that data in a structured way, allowing researchers to seek and share data and metadata from those neuroscience experiments. For the 2019 Tool Fund, the NES team, particularly João Alexandre Peschanski, Cassiano dos Santos and Carlos Eduardo Ribas, proposed to adapt their existing export component to conform to the Frictionless Data specifications.

Public databases are seen as crucial by many members of the neuroscience community as a means of moving science forward. However, simply opening up data is not enough; it should be created in a way that can be easily shared and used. For example, data and metadata should be readable by both researchers and machines, yet they typically are not. When the NES team learned about Frictionless Data, they were interested in implementing the specifications to help make the data and metadata in NES machine readable. For them, the advantage of the Frictionless Data approach was being able to standardize data opening and sharing within the neuroscience community.
Before the Tool Fund, NES had an export component that set up a file with folders and documents with information on an entire experiment (including data collected from participants, device metadata, questionnaires, etc.), but they wanted to improve this export to be more structured and open. By implementing the Frictionless Data specifications, the resulting export component includes the Data Package (datapackage.json) and the folders/files inside the archive, with a root folder called data. With this new “frictionless” export component, researchers can transport and share their exported data with other researchers in a recognized open standard format (the Data Package), facilitating the understanding of that exported data. They have also incorporated Goodtables into the unit tests to check data structure.

The RIDC NeuroMat team’s expectation is that many researchers, particularly neuroscientists and experimentalists, will have an interest in using the freely available NES tool. With the anonymization of sensitive information, the data collected using NES can be made publicly available through the NeuroMat Open Database, allowing any researcher to reproduce the experiment or simply use the data in a different study. In addition to storing collected experimental data and being a tool for guiding and documenting all the steps involved in a neuroscience experiment, NES integrates, via a REST API, with the Neuroscience Experiment Database, another NeuroMat project, where NES users can send their experiments to become publicly available for other researchers to reproduce them or to use as inspiration for further experiments.
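The layout of such an export archive – a datapackage.json at the root and the files under a data folder – can be sketched with the Python standard library. The file names and contents below are invented, not taken from a real NES export:

```python
import json
import zipfile

# Sketch of a "frictionless" export archive layout like the one NES produces:
# datapackage.json at the root, resources under a data/ folder.
# Resource names and the CSV content are illustrative.
descriptor = {
    "name": "experiment-export",
    "resources": [{"name": "participants",
                   "path": "data/participants.csv"}],
}

with zipfile.ZipFile("export.zip", "w") as zf:
    zf.writestr("datapackage.json", json.dumps(descriptor, indent=2))
    zf.writestr("data/participants.csv", "participant_id,age\np01,34\n")

# Inspect the archive layout.
with zipfile.ZipFile("export.zip") as zf:
    print(sorted(zf.namelist()))
```

Any Data Package-aware tool receiving this single zip can locate the descriptor at the root and resolve every resource path relative to it.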
Screenshot of the export of an experiment: NES export
Screenshot of the export of data on participants
Picture of a hypothetical export file tree of type Per Experiment after the Frictionless Data implementation: NES data

Further reading:
Repository: https://github.com/neuromat/nes
User manual: https://nes.readthedocs.io/en/latest/
NeuroMat blog: https://neuromat.numec.prp.usp.br/
Post on NES at the NeuroMat blog: https://neuromat.numec.prp.usp.br/content/a-pathway-to-reproducible-science-the-neuroscience-experiments-system/

Announcing Frictionless Data Joint Stewardship

- December 12, 2019 in Frictionless Data, Open Knowledge

We are pleased to announce joint stewardship of Frictionless Data between the Open Knowledge Foundation and Datopian. While this collaboration already exists informally, we are solidifying how we are leading together on future Frictionless Data projects and goals.

What does this mean for users of Frictionless Data software and specifications?

First, you will continue to see a consistent level of activity and support from the Open Knowledge Foundation, with a particular focus on the application of Frictionless Data for reproducible research, as part of our three-year project funded by the Sloan Foundation. This also includes specific contributions in the development of the Frictionless Data specifications under the leadership of Rufus Pollock, Datopian President and Frictionless Data creator, and Paul Walsh, Datopian CEO and long-time contributor to the specifications and software.

Second, there will be increased activity in software development around the specifications, with a larger team across both organisations contributing to key codebases such as Good Tables, the various integrations with backend storage systems such as Elasticsearch, BigQuery, and PostgreSQL, and data science tooling such as Pandas. Additionally, based on their CKAN commercial services work and co-stewardship of the CKAN project, Datopian look forward to providing more integrations of Frictionless Data with CKAN, building on existing work done at the Open Knowledge Foundation.

Our first joint project is redesigning the Frictionless Data website. Our goal is to make the project more understandable, usable, and user-focused. At this point, we are actively seeking user input, and are requesting interviews to help inform the new design. Have you used our website and are interested in having your opinion heard? Please get in touch to give us your ideas and feedback on the site. Focusing on user needs is a top goal for this project.
Ultimately, we are focused on leading the project openly and transparently, and are excited by the opportunities that clarification of the leadership of the project will provide. We want to emphasize that the Frictionless Data project is community focused, meaning that we really value the input and participation of our community of users. We encourage you to reach out to us on Discuss, in Gitter, or open issues in GitHub with your ideas or problems.

Frictionless DarwinCore Tool by André Heughebaert

- December 9, 2019 in Frictionless Data, Open Knowledge, Open Research, Open Science, Open Software, Technical

This blog is part of a series showcasing projects developed during the 2019 Frictionless Data Tool Fund. The 2019 Frictionless Data Tool Fund provided four mini-grants of $5,000 to support individuals or organisations in developing an open tool for reproducible research built using the Frictionless Data specifications and software. This fund is part of the Frictionless Data for Reproducible Research project, which is funded by the Sloan Foundation. This project applies our work in Frictionless Data to data-driven research disciplines, in order to facilitate reproducible data workflows in research contexts.

Frictionless DarwinCore, developed by André Heughebaert

  André Heughebaert is an open biodiversity data advocate in his work and his free time. He is an IT Software Engineer at the Belgian Biodiversity Platform and is also the Belgian GBIF (Global Biodiversity Information Facility) Node manager. In these roles, he has worked with the Darwin Core standard and open biodiversity data on a daily basis. This work inspired him to apply for the Tool Fund, where he has developed a tool to convert DarwinCore Archives into Frictionless Data Packages.

The DarwinCore Archive (DwCA) is a standardised container for biodiversity data and metadata largely used amongst the GBIF community, which consists of more than 1,500 institutions around the world. The DwCA is used to publish biodiversity data about observations, collection specimens, species checklists and sampling events. However, this domain-specific standard has some limitations, mainly the star schema (core table + extensions), rules that are sometimes too permissive, and a lack of controlled vocabularies for certain terms. These limitations encouraged André to investigate emerging open data standards. In 2016, he discovered Frictionless Data and published his first data package, on historical data from the 1815 Napoleonic Campaign in Belgium. He was then encouraged to create a tool that would, in part, build a bridge between these two open data ecosystems.

As a result, the Frictionless DarwinCore tool converts DwCA into Frictionless Data Packages, and also gives access to the vast Frictionless Data software ecosystem, enabling constraints validation and support of a fully relational data schema. Technically speaking, the tool is implemented as a Python library, and is exposed as a Command Line Interface.
The tool automatically converts:
    • the DwCA data schema into datapackage.json
    • EML metadata into a human-readable markdown README file
    • data files, when necessary (that is, when default values are described)
The resulting zip file complies with both the DarwinCore and Frictionless specifications.

André hopes that bridging the two standards will give an excellent opportunity for the GBIF community to provide open biodiversity data to a wider audience. He says this is also a good opportunity to discover the Frictionless Data specifications and assess their applicability to the biodiversity domain. In fact, on 9th October 2019, André presented the tool at a GBIF Global Nodes meeting, where it was received by the nodes managers community as exploratory and pioneering work. While the command line interface offers a simple user interface for non-programmers, others might prefer the more flexible and sophisticated Python API. André encourages anyone working with DarwinCore data, including all data publishers and data users of the GBIF network, to try out the new tool.
“I’m quite optimistic that the project will feed the necessary reflection on the evolution of our biodiversity standards and data flows.”

To get started, installation of the tool is done through a single pip install command (full directions can be found in the project README). Central to the tool is a table of DarwinCore terms linking a Data Package type, format and constraints to every DwC term. The tool can be used as a CLI directly from your terminal window or as a Python library for developers, and it can work with either locally stored or online DwCAs. Once converted to a Tabular Data Package, the DwC data can be ingested and further processed by software such as Goodtables, OpenRefine or any other Frictionless Data software.

André has aspirations to take the Frictionless DarwinCore tool further by encapsulating it in a web service that will directly deliver Goodtables reports from a DwCA, making it even more user friendly. Another idea for further improvement is an import pathway for DarwinCore data into OpenRefine, a popular tool in the GBIF community. André’s long-term hope is that the Data Package will become an optional format for data download on GBIF.org.

Further reading:
Repository: https://github.com/frictionlessdata/FrictionlessDarwinCore
Project blog: https://andrejjh.github.io/fdwc.github.io/
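The DwC-term table at the heart of the tool can be pictured as a lookup from each DarwinCore term to a Table Schema type, format, and constraints. Here is a hypothetical Python sketch; the entries and the fallback behaviour are illustrative, not the tool's actual table:

```python
# Simplified sketch of a DwC-term lookup table: each DarwinCore term maps
# to a Table Schema type/format/constraints. Entries are illustrative only.
DWC_TERMS = {
    "occurrenceID": {"type": "string",
                     "constraints": {"required": True, "unique": True}},
    "eventDate": {"type": "date", "format": "any"},
    "individualCount": {"type": "integer", "constraints": {"minimum": 0}},
    "decimalLatitude": {"type": "number",
                        "constraints": {"minimum": -90, "maximum": 90}},
}

def to_field(term):
    """Build a Table Schema field descriptor for a DwC term.

    Unknown terms fall back to a plain string field in this sketch.
    """
    spec = DWC_TERMS.get(term, {"type": "string"})
    return {"name": term, **spec}

fields = [to_field(t) for t in ["occurrenceID", "eventDate", "unknownTerm"]]
print(fields)
```

Constraints attached this way are what let the wider Frictionless ecosystem (e.g. Goodtables) validate converted DwCA data automatically.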

Meet Lily Zhao, one of our Frictionless Data for Reproducible Research Fellows

- November 18, 2019 in Frictionless Data

The Frictionless Data for Reproducible Research Fellows Programme is training early career researchers to become champions of the Frictionless Data tools and approaches in their field. Fellows will learn about Frictionless Data, including how to use Frictionless Data tools in their domains to improve reproducible research workflows, and how to advocate for open science. Working closely with the Frictionless Data team, Fellows will lead training workshops at conferences, host events at universities and in labs, and write blogs and other communications content.

I am thrilled to be joining the Open Knowledge Foundation community as a Frictionless Data fellow. I am an interdisciplinary marine scientist getting my PhD in the Ocean Recoveries Lab at the University of California Santa Barbara. I study how coral reefs, small-scale fisheries, and coastal communities are affected by environmental change and shifting market availability. In particular, I’m interested in how responsible, solutions-oriented science can help build resilience in these systems and improve coastal livelihoods.

My current fieldwork is based in Mo’orea, French Polynesia. With an intricate tapestry of social dynamics and strong linkages between its terrestrial and marine environments, the island of Mo’orea is representative of the complexity of coral reef social-ecological systems globally. The reefs around Mo’orea are also some of the most intensively studied in the world. In partnership with the University of French Polynesia and the Atiti’a Center, I recently interviewed local associations, community residents and the scientific community to determine how science conducted in Mo’orea can better serve its residents. One of our main findings is the need for increased access to the scientific process and open communication of scientific findings – both of which are tenets of an open science philosophy.
I was introduced to open data science just a year ago as part of the Openscapes program – a Mozilla and National Center for Ecological Analysis and Synthesis initiative. Openscapes connected me to the world of open software and made me acutely aware of the pitfalls of doing things the way I had always done them. This experience made me excited to learn new skills and join the global effort towards reproducible research. With these goals in mind, I was eager to apply for the Frictionless Data Fellowship, where I could learn and share new tools for data reproducibility. So far as a Frictionless Data Fellow, I have particularly enjoyed our conversations about “open” for whom? That is: who is open data science open for? And how can we push to increase inclusivity and access within this space?

A little bit about open data in the context of coral reef science

Coral reefs provide food, income, and coastal protection to over 500 million people worldwide. Yet globally, coral reefs are experiencing major disturbances, with many already past their ecological tipping points. Total coral cover (the abundance of coral seen on a reef) is the simplest and most widely used metric of coral resistance to, and recovery from, climate change and local environmental stressors. However, to the detriment of coral reef research, there is no open global database of coral cover data for researchers to build on. The effort and money it takes to conduct underwater surveys make coral cover data highly coveted, and thus these data are often not publicly available. In the future, I hope to collaborate with researchers around the world to build an open, global database of coral cover data.

Open datasets and tools, when used by other researchers, show promise in their ability to efficiently propel research forward. In other fields, open science has accelerated the rate of problem-solving and new discoveries. In the face of climate change, the ability to not reinvent the wheel with each new analysis can allow us to conduct reef resilience research at the speed that coral reef degradation necessitates. Ultimately, I deeply believe that maintaining coral-dominated ecosystems will require: 1) amplification of the perspectives of coastal communities; and 2) open collaboration and data accessibility among scientists worldwide.

Frictionless Data for Reproducible Research Fellows Programme

More on Frictionless Data

The Fellows programme is part of the Frictionless Data for Reproducible Research project at the Open Knowledge Foundation. This project, funded by the Sloan Foundation, applies our work in Frictionless Data to data-driven research disciplines, in order to facilitate data workflows in research contexts. Frictionless Data is a set of specifications for data and metadata interoperability, accompanied by a collection of software libraries that implement these specifications, and a range of best practices for data management. Frictionless Data’s other current projects include the Tool Fund, in which four grantees are developing open source tooling for reproducible research. The Fellows programme will run until June 2020, and we will post updates as the programme progresses.
• Originally published at http://fellows.frictionlessdata.io/blog/hello-lily/

Meet Daniel Ouso, one of our Frictionless Data for Reproducible Research Fellows

- November 4, 2019 in Frictionless Data

The Frictionless Data for Reproducible Research Fellows Programme is training early career researchers to become champions of the Frictionless Data tools and approaches in their field. Fellows will learn about Frictionless Data, including how to use Frictionless Data tools in their domains to improve reproducible research workflows, and how to advocate for open science. Working closely with the Frictionless Data team, Fellows will lead training workshops at conferences, host events at universities and in labs, and write blogs and other communications content.

You can call me Daniel Ouso. My roots trace to the lake basin county of Homabay in Kenya, the equatorial country in the east of Africa. Currently, I live in its capital, Nairobi – once known as “The Green City in the Sun”, although thanks to poor stewardship of Mother Nature this is now debatable. The name is Maasai for a place of cool waters. But enough of beautiful Kenya. I work at the International Centre of Insect Physiology and Ecology as a bioinformatics expert within the Bioinformatics Unit, involved in bioinformatics training and genomic data management. I hold a Master of Science in Molecular Biology and Bioinformatics (2019) from Jomo Kenyatta University of Agriculture and Technology, Kenya. My previous work is in infectious disease management and a bit of conservation; my long-term interest is in disease genomics research. I am passionate about research openness and reproducibility, which I gladly noticed as a common interest in the Frictionless Data Fellowship (FDF). I have had previous experience working on a Mozilla Open Science project that really piqued my interest in wanting to learn skills and expand my knowledge and perspective in the area. To that destination, this fellowship advertised itself as the best vehicle, and it was a frictionless decision to board.
My goal is to become a better champion for open, reproducible research by learning data and metadata specifications for interoperability, the associated programmes/libraries/packages, and data management best practices. Moreover, I hope to discover additional resources, to network and exchange with peers, and ultimately share the knowledge and skills acquired. Knowledge is cumulative and progressive, an infinite cycle, akin to a corn plant, which grows into a seed from a seed, helped in between by the effort of the farmer and other factors. Whether or not the subsequent seed will be replanted depends, among other factors, on its quality. You may wonder where I am going with this, so here is the point: for knowledge to bear it must be shared promiscuously, to be verified and to be built upon. The rate of research output is very fast, and so is the need for advancement of the research findings; however, the conclusions may at times be wrong. To improve knowledge, the goal of research is to deepen understanding and confirm findings and claims through reproduction. This depends on the contribution of many people from diverse places, so there is an obvious need to remove or minimise obstacles in the quest for research excellence. As a researcher, I believe that to keep up with the rate of research production, findings and the data behind them must be made available in a form that doesn’t antagonise their re-use and/or validation for further research. That means reducing friction on the research wheel by making research easier, cheaper and quicker to conduct, which will increase collaboration and prevent the reinvention of the wheel. To realise this, it is incumbent on me (and others) to make my contribution, both as a producer of data and as an affected party, especially seeing that exponentially huge amounts of biological data continue to be produced. Simply, improving research reproducibility is the right science of this age.
I am a member of The Carpentries community as an instructor, and I am currently also on the task force planning CarpentryCon2020, where I hope to meet some of the OKF community members. I am excited to join this community as a Frictionless Data Fellow! You can find important links and follow my fellowship here.

Frictionless Data for Reproducible Research Fellows Programme

More on Frictionless Data

The Fellows programme is part of the Frictionless Data for Reproducible Research project at Open Knowledge Foundation. This project, funded by the Sloan Foundation, applies our work in Frictionless Data to data-driven research disciplines, in order to facilitate data workflows in research contexts. Frictionless Data is a set of specifications for data and metadata interoperability, accompanied by a collection of software libraries that implement these specifications, and a range of best practices for data management. Frictionless Data’s other current projects include the Tool Fund, in which four grantees are developing open source tooling for reproducible research. The Fellows programme will be running until June 2020, and we will post updates on the programme as it progresses.

• Originally published at http://fellows.frictionlessdata.io/blog/hello-ouso/
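To make the specifications concrete, here is a minimal sketch of the `datapackage.json` descriptor at the heart of Frictionless Data, built with plain Python. The dataset, file and field names are hypothetical; the `resources` and Table Schema layout follows the published specifications.

```python
import json

# A minimal Data Package descriptor: one resource (a CSV file)
# plus a Table Schema describing its columns (hypothetical names).
descriptor = {
    "name": "example-dataset",
    "resources": [
        {
            "name": "observations",
            "path": "observations.csv",
            "schema": {
                # The Table Schema is the "blueprint" that tells others
                # how the data is structured and what types to expect.
                "fields": [
                    {"name": "sample_id", "type": "string"},
                    {"name": "fluorescence", "type": "number"},
                ]
            },
        }
    ],
}

# Saving the descriptor next to the data files packages the data
# and its metadata together in one container.
with open("datapackage.json", "w") as f:
    json.dump(descriptor, f, indent=2)
```

In practice you would rarely write this by hand: the Frictionless Data software libraries can infer such descriptors from existing data files and validate data against the schema.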

Meet Sele Yang, one of our Frictionless Data for Reproducible Research Fellows

- October 29, 2019 in Frictionless Data

I’m Selene Yang, an unapologetic feminist, a map lover and a social communications researcher. I was born in Costa Rica, but my family is from Nicaragua and I’m half Taiwanese. I also live and work in Paraguay and did my PhD in Argentina. A little confusing, right? I’m also a human rights advocate working at a digital rights defender organisation called TEDIC, and I’m part of the Research Center for Communications and Public Policies of the National University of La Plata. I also have a cat.

At this moment I’m working on my dissertation project, which grapples with the process of gathering, editing and curating open geospatial data through Volunteered Geographic Information (VGI) in OpenStreetMap (OSM); my research also looks at the interrelationship between mapped objects and the gender of the mappers, to understand its consequences for the perception, use and appropriation of space. My main research question is: what relevance does the representation of space have for women in relation to the practices of data creation, collection, curatorship and visibility? This project has its practical base in Geochicas, a group of more than 200 women mappers from 22 different countries on 5 continents who are determined to close the gender gap in the geo-communities through safe learning spaces, community-based analysis and data visualisation projects.
How did I get here, and why is being a Frictionless Data Fellow so important to me? Throughout my career, I have been concerned with how to generate fruitful collaborations between the social and data sciences. I believe such collaborations can produce more equal and broader access to knowledge within the global south. Even though data are often treated in the social sciences as quantitative, objective instruments, I consider it necessary to find new and creative ways to share, analyse and reproduce findings so that knowledge can circulate fluidly, and in doing so bridge the inequalities in knowledge production. Through this fellowship, I want to understand whether there is an unseen, un-analysed relationship between mappers’ gender and the objects they map, and how we can better manage potentially gender-biased data structures. I consider it of great importance to generate practices, methodologies and concepts that can help create and strengthen an academic community where a culture of openness, diversity and inclusion is the founding basis of knowledge production.

En español

Soy Selene Yang, feminista, amante de los mapas e investigadora en comunicación social. Nací en Costa Rica, pero mi familia es de Nicaragua y soy mitad taiwanesa. Hoy en día vivo y trabajo en Paraguay, y realicé mi doctorado en Comunicación en Argentina. ¿Medio confuso, no? También soy defensora de los derechos humanos y trabajo en TEDIC, una organización enfocada en defender los derechos digitales. También soy parte del Centro de Investigación en Comunicación y Políticas Públicas de la Universidad Nacional de La Plata. También tengo un gato.

Actualmente me encuentro trabajando en terminar mi proyecto de disertación doctoral, el cual se enfoca en el análisis sobre la recolección, edición y curaduría de datos geoespaciales abiertos desde la Información Geográfica Voluntaria (VGI por sus siglas en inglés) en OpenStreetMap. También, desde mi investigación, busco conectar las relaciones que existen entre los objetos mapeados y el género de las personas que los mapean, para consecuentemente entender la percepción, el uso y la apropiación de los espacios públicos para las mujeres. La pregunta principal de mi investigación es: ¿Qué relevancia tiene la representación del espacio para las mujeres en relación a las prácticas de creación, recolección, curaduría y visualización de datos? Esta pregunta se inscribe en las prácticas de la colectiva Geochicas, un grupo de más de 200 mujeres mapeadoras, de 22 países distintos, en 5 continentes, quienes están comprometidas a cerrar la brecha de género en las geo-comunidades a través de la creación de espacios de aprendizaje seguros, análisis comunitarios y proyectos de visualización de datos.

¿Cómo llegué acá, y por qué este programa es tan importante para mí? Durante mi carrera, he buscado formas en las cuales se puedan generar vínculos y colaboraciones entre las ciencias sociales y las ciencias de datos.
Considero que este tipo de alianzas interdisciplinarias son fundamentales para generar un conocimiento más accesible y equitativo dentro del sur global. A pesar de que las ciencias sociales consideren los datos como instrumentos objetivos y meramente cuantificables, creo que es necesario encontrar creatividad en las nuevas formas en las que los hallazgos de las investigaciones puedan ser compartidos, analizados y reproducidos para que puedan operar fluidamente, y de esta forma poder cerrar las brechas que existen en la producción de conocimiento. Como resultado de este programa, quisiera entender si es que existe una relación no vista y tampoco analizada entre el género de las personas que mapean y los objetos geográficos en relación a las experiencias espaciales, para encontrar mejores formas de manejar las potenciales estructuras de datos sesgadas. Considero de gran importancia generar prácticas, metodologías y conceptos que puedan aportar a la creación y fortalecimiento de una comunidad académica donde la cultura de la apertura, diversidad e inclusividad sean las bases de la producción de saberes.

• Originally published at http://fellows.frictionlessdata.io/blog/hello-sele/

csv,conf returns for version 5 in May

- October 15, 2019 in #CSVconf, Events, Frictionless Data, News, Open Data, Open Government Data, Open Research, Open Science, Open Software

Save the data for csv,conf,v5! The fifth version of csv,conf will be held at the University of California, Washington Center in Washington DC, USA, on May 13 and 14, 2020. If you are passionate about data and its application to society, this is the conference for you. Submissions for session proposals for 25-minute talk slots are open until February 7, 2020, and we encourage talks about how you are using data in an interesting way (like to uncover a crossword puzzle scandal). We will be opening ticket sales soon, and you can stay updated by following our Twitter account @CSVconference. csv,conf is a community conference that is about more than just comma-separated values – it brings together a diverse group to discuss data topics including data sharing, data ethics, and data analysis from the worlds of science, journalism, government, and open source. Over two days, attendees will have the opportunity to hear about ongoing work, share skills, exchange ideas (and stickers!) and kickstart collaborations.
csv,conf,v4

Attendees of csv,conf,v4

First launched in July 2014, csv,conf has expanded to bring together over 700 participants from 30 countries with backgrounds in varied disciplines. If you’ve missed the earlier years’ conferences, you can watch previous talks on topics like data ethics, open source technology, data journalism, open internet, and open science on our YouTube channel. We hope you will join us in Washington DC in May to share your own data stories and join the csv,conf community!

csv,conf,v5 is supported by the Sloan Foundation through OKF’s Frictionless Data for Reproducible Research grant, as well as by the Gordon and Betty Moore Foundation, and the Frictionless Data team is part of the conference committee. We are happy to answer any questions you may have or offer clarifications if needed. Feel free to reach out to us at csv-conf-coord@googlegroups.com, on Twitter @CSVconference, or on our dedicated community Slack channel. We are committed to diversity and inclusion, and strive to be a supportive and welcoming environment for all attendees. To this end, we encourage you to read the Conference Code of Conduct.
Rojo the Comma Llama

While we won’t be flying Rojo the Comma Llama to DC for csv,conf,v5, we will have other mascot surprises in store.

Join #Hacktoberfest 2019 with Frictionless Data

- October 3, 2019 in Frictionless Data, hackathon

The Frictionless Data team is excited to participate in #Hacktoberfest 2019! Hacktoberfest is a month-long event where people from around the world contribute to open source software (and you can win a t-shirt!).

How does it work? All October, the Frictionless Data repositories will have issues ready for contributions from the open source community. These issues will be labeled with ‘Hacktoberfest’ so they can be easily found, and they will range from beginner level to more advanced, so anyone who is interested can participate. Even if you’ve never contributed to Frictionless Data before, now is the time!

To begin, sign up on the official website (https://hacktoberfest.digitalocean.com) and then read the OKF project participation guidelines, code of conduct, and coding standards. Then find an issue that interests you by searching through the issues on the main Frictionless libraries (found here) and also on our participating Tool Fund repositories here. Next, write some code to help fix the issue, and open a pull request for the Frictionless team to review. Finally, celebrate your contribution to an open source project!

We value and rely on our community, and are really excited to participate in this year’s #Hacktoberfest. If you get stuck or have questions, reach out to the team via our Gitter channel, or comment on an issue. Let’s get hacking!