
Frictionless DarwinCore Tool by André Heughebaert

- December 9, 2019 in Frictionless Data, Open Knowledge, Open Research, Open Science, Open Software, Technical

This blog is part of a series showcasing projects developed during the 2019 Frictionless Data Tool Fund. The 2019 Frictionless Data Tool Fund provided four mini-grants of $5,000 to support individuals or organisations in developing an open tool for reproducible research built using the Frictionless Data specifications and software. This fund is part of the Frictionless Data for Reproducible Research project, which is funded by the Sloan Foundation. This project applies our work in Frictionless Data to data-driven research disciplines, in order to facilitate reproducible data workflows in research contexts.

Frictionless DarwinCore, developed by André Heughebaert

André Heughebaert is an open biodiversity data advocate in his work and his free time. He is an IT Software Engineer at the Belgian Biodiversity Platform and is also the Belgian GBIF (Global Biodiversity Information Facility) Node manager. In these roles he works with the Darwin Core standards and open biodiversity data on a daily basis. This work inspired him to apply for the Tool Fund, where he has developed a tool to convert DarwinCore Archives into Frictionless Data Packages.

The DarwinCore Archive (DwCA) is a standardised container for biodiversity data and metadata widely used amongst the GBIF community, which consists of more than 1,500 institutions around the world. The DwCA is used to publish biodiversity data about observations, collection specimens, species checklists and sampling events. However, this domain-specific standard has some limitations, mainly the star schema (core table + extensions), rules that are sometimes too permissive, and a lack of controlled vocabularies for certain terms. These limitations encouraged André to investigate emerging open data standards. In 2016, he discovered Frictionless Data and published his first data package, on historical data from the 1815 Napoleonic Campaign in Belgium. He was then encouraged to create a tool that would, in part, build a bridge between these two open data ecosystems.

As a result, the Frictionless DarwinCore tool converts DwCA into Frictionless Data Packages, and also gives access to the vast Frictionless Data software ecosystem, enabling constraint validation and support for a fully relational data schema. Technically speaking, the tool is implemented as a Python library and is exposed as a Command Line Interface. The tool automatically converts:

* the DwCA data schema into datapackage.json
* EML metadata into a human-readable markdown README file
* data files, when necessary (that is, when default values are described)

The resulting zip file complies with both the DarwinCore and Frictionless specifications.

André hopes that bridging the two standards will give the GBIF community an excellent opportunity to provide open biodiversity data to a wider audience. He says this is also a good opportunity to discover the Frictionless Data specifications and assess their applicability to the biodiversity domain. In fact, on 9 October 2019, André presented the tool at a GBIF Global Nodes meeting, where the node managers' community received it as exploratory and pioneering work. While the command line interface offers a simple user interface for non-programmers, others might prefer the more flexible and sophisticated Python API. André encourages anyone working with DarwinCore data, including all data publishers and data users of the GBIF network, to try out the new tool.
“I’m quite optimistic that the project will feed the necessary reflection on the evolution of our biodiversity standards and data flows.”

To get started, installation of the tool is done through a single pip install command (full directions can be found in the project README). Central to the tool is a table of DarwinCore terms that maps a Data Package type, format and constraints to every DwC term. The tool can be used as a CLI directly from your terminal window or as a Python library for developers, and it works with either locally stored or online DwCAs. Once converted to a Tabular Data Package, the DwC data can be ingested and further processed by software such as Goodtables, OpenRefine or any other Frictionless Data software (a short sketch of this step follows the links below). André has aspirations to take the Frictionless DarwinCore tool further by encapsulating it in a web service that delivers Goodtables reports directly from a DwCA, which will make it even more user friendly. Another idea for further improvement is an import pathway for DarwinCore data into OpenRefine, which is a popular tool in the GBIF community. André's long-term hope is that the Data Package will become an optional format for data download on GBIF.org.

Further reading:
* Repository: https://github.com/frictionlessdata/FrictionlessDarwinCore
* Project blog: https://andrejjh.github.io/fdwc.github.io/
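For readers curious about that downstream step, here is a minimal Python sketch (not part of the Frictionless DarwinCore tool itself, and with hypothetical file and resource names) of loading a converted, unzipped package with the datapackage library and checking it with Goodtables:

```python
# Minimal sketch (not part of the Frictionless DarwinCore tool itself):
# after unzipping a converted archive, load its descriptor and run
# Goodtables checks. File layout and resource names are hypothetical.
from datapackage import Package
from goodtables import validate

# Load the package descriptor and list its tabular resources
package = Package('datapackage.json')
for resource in package.resources:
    print(resource.name, resource.descriptor.get('path'))

# Validate structure and schema constraints for every resource in the package
report = validate('datapackage.json', preset='datapackage')
print('valid:', report['valid'], '| errors:', report['error-count'])
```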

Meet Lily Zhao, one of our Frictionless Data for Reproducible Research Fellows

- November 18, 2019 in Frictionless Data

The Frictionless Data for Reproducible Research Fellows Programme is training early career researchers to become champions of the Frictionless Data tools and approaches in their field. Fellows will learn about Frictionless Data, including how to use Frictionless Data tools in their domains to improve reproducible research workflows, and how to advocate for open science. Working closely with the Frictionless Data team, Fellows will lead training workshops at conferences, host events at universities and in labs, and write blogs and other communications content.

I am thrilled to be joining the Open Knowledge Foundation community as a Frictionless Data fellow. I am an interdisciplinary marine scientist getting my PhD in the Ocean Recoveries Lab at the University of California Santa Barbara. I study how coral reefs, small-scale fisheries, and coastal communities are affected by environmental change and shifting market availability. In particular, I'm interested in how responsible, solutions-oriented science can help build resilience in these systems and improve coastal livelihoods. My current fieldwork is based in Mo'orea, French Polynesia. With an intricate tapestry of social dynamics and strong linkages between its terrestrial and marine environments, the island of Mo'orea is representative of the complexity of coral reef social-ecological systems globally. The reefs around Mo'orea are also some of the most highly studied in the world. In partnership with the University of French Polynesia and the Atiti'a Center, I recently interviewed local associations, community residents and the scientific community to determine how science conducted in Mo'orea can better serve its residents. One of our main findings is the need for increased access to the scientific process and open communication of scientific findings, both of which are tenets of an open science philosophy.

I was introduced to open data science just a year ago as part of the Openscapes program, a Mozilla and National Center for Ecological Analysis and Synthesis initiative. Openscapes connected me to the world of open software and made me acutely aware of the pitfalls of doing things the way I had always done them. This experience made me excited to learn new skills and join the global effort towards reproducible research. With these goals in mind, I was eager to apply for the Frictionless Data Fellowship, where I could learn and share new tools for data reproducibility. So far as a Frictionless Data Fellow, I have particularly enjoyed our conversations about "open" for whom? That is: who is open data science open for? And how can we push to increase inclusivity and access within this space?

A little bit about open data in the context of coral reef science

Coral reefs provide food, income, and coastal protection to over 500 million people worldwide. Yet globally, coral reefs are experiencing major disturbances, with many already past their ecological tipping points. Total coral cover (the abundance of coral seen on a reef) is the simplest and most widely used metric of coral resistance and recovery in the face of climate change and local environmental stressors. However, to the detriment of coral reef research, there is no open global database of coral cover data for researchers to build on. The effort and money it takes to conduct underwater surveys make coral cover data highly coveted, and thus these data are often not publicly available. In the future, I hope to collaborate with researchers around the world to build an open, global database of coral cover data. Open datasets and tools, when used by other researchers, show promise in their ability to efficiently propel research forward. In other fields, open science has accelerated the rate of problem-solving and new discoveries. In the face of climate change, the ability to avoid reinventing the wheel with each new analysis can allow us to conduct reef resilience research at the speed that coral reef degradation necessitates. Ultimately, I deeply believe that maintaining coral-dominated ecosystems will require: 1) amplification of the perspectives of coastal communities; and 2) open collaboration and data accessibility among scientists worldwide.

Frictionless Data for Reproducible Research Fellows Programme

More on Frictionless Data

The Fellows programme is part of the Frictionless Data for Reproducible Research project at Open Knowledge Foundation. This project, funded by the Sloan Foundation, applies our work in Frictionless Data to data-driven research disciplines, in order to facilitate data workflows in research contexts. Frictionless Data is a set of specifications for data and metadata interoperability, accompanied by a collection of software libraries that implement these specifications, and a range of best practices for data management. Frictionless Data's other current projects include the Tool Fund, in which four grantees are developing open source tooling for reproducible research. The Fellows programme will be running until June 2020, and we will post updates on the programme as it progresses.

• Originally published at http://fellows.frictionlessdata.io/blog/hello-lily/

Meet Daniel Ouso, one of our Frictionless Data for Reproducible Research Fellows

- November 4, 2019 in Frictionless Data

The Frictionless Data for Reproducible Research Fellows Programme is training early career researchers to become champions of the Frictionless Data tools and approaches in their field. Fellows will learn about Frictionless Data, including how to use Frictionless Data tools in their domains to improve reproducible research workflows, and how to advocate for open science. Working closely with the Frictionless Data team, Fellows will lead training workshops at conferences, host events at universities and in labs, and write blogs and other communications content.

You can call me Daniel Ouso. My roots trace to the lake basin county of Homabay in Kenya, the equatorial country in the east of Africa. Currently, I live in its capital Nairobi, once known as "The Green City in the Sun", although thanks to poor stewardship of Mother Nature this is now debatable. The name Nairobi is Maasai for a place of cool waters. But enough of beautiful Kenya. I work at the International Centre of Insect Physiology and Ecology as a bioinformatics expert within the Bioinformatics Unit, involved in bioinformatics training and genomic data management. I hold a Master of Science in Molecular Biology and Bioinformatics (2019) from Jomo Kenyatta University of Agriculture and Technology, Kenya. My previous work is in infectious disease management and a bit of conservation. My long-term interest is in disease genomics research.

I am passionate about research openness and reproducibility, which I gladly noticed as a common interest in the Frictionless Data Fellowship (FDF). I have had previous experience working on a Mozilla Open Science project that really piqued my interest in wanting to learn skills and to expand my knowledge and perspective in the area. To that destination, this fellowship advertised itself as the best vehicle, and it was a frictionless decision to board. My goal is to become a better champion for open, reproducible research by learning data and metadata specifications for interoperability, the associated programmes/libraries/packages, and data management best practices. Moreover, I hope to discover additional resources, to network and exchange with peers, and ultimately to share the knowledge and skills acquired.

Knowledge is cumulative and progressive, an infinite cycle, akin to a corn plant, which grows from a seed into a plant that bears seed, helped along the way by the effort of the farmer and other factors. Whether or not the subsequent seed will be replanted depends, among other things, on its quality. You may wonder where I am going with this, so here is the point: for knowledge to bear fruit it must be shared promiscuously, to be verified and to be built upon. The rate of research output is very fast, and so is the need to advance research findings. However, the conclusions may at times be wrong. To improve knowledge, the goal of research is to deepen understanding and to confirm findings and claims through reproduction. This, however, depends on the contribution of many people from diverse places, so there is an obvious need to remove or minimise obstacles in the quest for research excellence. As a researcher, I believe that to keep up with the rate of research production, findings and the data behind them must be made available in a form that doesn't hinder their re-use and/or validation for further research. It means reducing friction on the research wheel by making research easier, cheaper and quicker to conduct, which will increase collaboration and prevent the reinvention of the wheel.
To realise this, it is incumbent on me (and others) to make my contribution both as a producer and as an affected party, especially seeing that exponentially growing amounts of biological data continue to be produced. Simply put, improving research reproducibility is the right science for this age. I am a member of The Carpentries community as an instructor, and I am currently also on the task force planning CarpentryCon 2020, where I hope to meet some OKF community members. I am excited to join this community as a Frictionless Data Fellow! You can find important links and follow my fellowship here.

Frictionless Data for Reproducible Research Fellows Programme

More on Frictionless Data

The Fellows programme is part of the Frictionless Data for Reproducible Research project at Open Knowledge Foundation. This project, funded by the Sloan Foundation, applies our work in Frictionless Data to data-driven research disciplines, in order to facilitate data workflows in research contexts. Frictionless Data is a set of specifications for data and metadata interoperability, accompanied by a collection of software libraries that implement these specifications, and a range of best practices for data management. Frictionless Data's other current projects include the Tool Fund, in which four grantees are developing open source tooling for reproducible research. The Fellows programme will be running until June 2020, and we will post updates on the programme as it progresses.

• Originally published at http://fellows.frictionlessdata.io/blog/hello-ouso/

Meet Sele Yang, one of our Frictionless Data for Reproducible Research Fellows

- October 29, 2019 in Frictionless Data

The Frictionless Data for Reproducible Research Fellows Programme is training early career researchers to become champions of the Frictionless Data tools and approaches in their field. Fellows will learn about Frictionless Data, including how to use Frictionless Data tools in their domains to improve reproducible research workflows, and how to advocate for open science. Working closely with the Frictionless Data team, Fellows will lead training workshops at conferences, host events at universities and in labs, and write blogs and other communications content.

I'm Selene Yang, an unapologetic feminist, a map lover and a social communications researcher. I was born in Costa Rica, but my family is from Nicaragua and I'm half Taiwanese. I also live and work in Paraguay and did my PhD in Argentina. A little confusing, right? I'm also a human rights advocate working at TEDIC, a digital rights organisation, and I'm part of the Research Center for Communications and Public Policies of the National University of La Plata. I also have a cat.

At this moment I'm working on my dissertation project, which grapples with the process of gathering, editing and curating open geospatial data through Volunteered Geographic Information (VGI) in OpenStreetMap (OSM); my research also looks at the interrelationship between mapped objects and the gender of the mappers, to understand its consequences for the perception, use and appropriation of space. My main research question is: what relevance does the representation of space have for women in relation to the practices of data creation, collection, curatorship and visibility? This project is grounded in the practices of Geochicas, a group of more than 200 women mappers from 22 different countries across 5 continents who are determined to close the gender gap in the geo-communities through safe learning spaces, community-based analysis and data visualisation projects.

How did I get here, and why is being a Frictionless Data Fellow so important to me? Throughout my career, I have been concerned with how to generate fruitful collaborations between the social and data sciences. I believe such collaborations can produce more equal and broader access to knowledge within the global south. Even though data are often treated in the social sciences as quantitative, objective instruments, I consider it necessary to find new and creative ways in which findings can be shared, analysed and reproduced, so that knowledge can operate fluidly and we can bridge the gaps in the inequality of knowledge production. As a result of this fellowship, I want to understand whether there is an unseen, un-analysed relationship between mappers' gender and the objects they map, and how we can better manage potentially gender-biased data structures. I consider it of great importance to generate practices, methodologies and concepts that can be useful to create and strengthen an academic community where the culture of openness, diversity and inclusion forms the founding basis of knowledge production.

En español

Soy Selene Yang, feminista, amante de los mapas e investigadora en comunicación social. Nací en Costa Rica, pero mi familia es de Nicaragua y soy mitad taiwanesa. Hoy en día vivo y trabajo en Paraguay, y realicé mi doctorado en Comunicación en Argentina. ¿Medio confuso, no? También soy defensora de los derechos humanos y trabajo en TEDIC, una organización enfocada en defender los derechos digitales. También soy parte del Centro de Investigación en Comunicación y Políticas Públicas de la Universidad Nacional de La Plata. También tengo un gato.

Actualmente me encuentro trabajando en terminar mi proyecto de disertación doctoral, el cual se enfoca en el análisis sobre la recolección, edición y curaduría de datos geoespaciales abiertos desde la Información Geográfica Voluntaria (VGI por sus siglas en inglés) en OpenStreetMap. También, desde mi investigación, busco conectar las relaciones que existen entre los objetos mapeados y el género de las personas que los mapean, para consecuentemente entender la percepción, el uso y la apropiación de los espacios públicos para las mujeres. La pregunta principal de mi investigación es: ¿Qué relevancia tiene la representación del espacio para las mujeres en relación a las prácticas de creación, recolección, curaduría y visualización de datos? Esta pregunta se inscribe en las prácticas de la colectiva Geochicas, un grupo de más de 200 mujeres mapeadoras, de 22 países distintos, en 5 continentes, quienes están comprometidas a cerrar la brecha de género en las geo-comunidades a través de la creación de espacios de aprendizaje seguros, análisis comunitarios y proyectos de visualización de datos.

¿Cómo llegué acá, y por qué este programa es tan importante para mi? Durante mi carrera, he buscado formas en las cuales se puedan generar vínculos y colaboraciones entre las ciencias sociales y las ciencias de datos. Considero que este tipo de alianzas interdisciplinarias son fundamentales para generar un conocimiento más accesible y equitativo dentro del sur global. A pesar de que las ciencias sociales consideren los datos como instrumentos objetivos y meramente cuantificables, creo que es necesario encontrar creatividad en las nuevas formas en las que los hallazgos de las investigaciones puedan ser compartidas, analizadas y reproducidas para que puedan operar fluidamente, y de esta forma poder cerrar las brechas que existen en la producción de conocimiento. Como resultado de este programa, quisiera entender si es que existe una relación no vista y tampoco analizada entre el género de las personas que mapean y los objetos geográficos en relación a las experiencias espaciales, para encontrar mejores formas de manejar las potenciales estructuras de datos sesgadas. Considero de gran importancia generar prácticas, metodologías y conceptos que puedan aportar a la creación y fortalecimiento de una comunidad académica donde la cultura de la apertura, diversidad e inclusividad sean las bases de la producción de saberes.

Frictionless Data for Reproducible Research Fellows Programme

More on Frictionless Data

The Fellows programme is part of the Frictionless Data for Reproducible Research project at Open Knowledge Foundation. This project, funded by the Sloan Foundation, applies our work in Frictionless Data to data-driven research disciplines, in order to facilitate data workflows in research contexts. Frictionless Data is a set of specifications for data and metadata interoperability, accompanied by a collection of software libraries that implement these specifications, and a range of best practices for data management. Frictionless Data's other current projects include the Tool Fund, in which four grantees are developing open source tooling for reproducible research. The Fellows programme will be running until June 2020, and we will post updates on the programme as it progresses.

• Originally published at http://fellows.frictionlessdata.io/blog/hello-sele/

csv,conf returns for version 5 in May

- October 15, 2019 in #CSVconf, Events, Frictionless Data, News, Open Data, Open Government Data, Open Research, Open Science, Open Software

Save the data for csv,conf,v5! The fifth version of csv,conf will be held at the University of California, Washington Center in Washington DC, USA, on May 13 and 14, 2020. If you are passionate about data and its application to society, this is the conference for you. Submissions of session proposals for 25-minute talk slots are open until February 7, 2020, and we encourage talks about how you are using data in an interesting way (like to uncover a crossword puzzle scandal). We will be opening ticket sales soon, and you can stay updated by following our Twitter account @CSVconference.

csv,conf is a community conference that is about more than just comma-separated values: it brings together a diverse group to discuss data topics including data sharing, data ethics, and data analysis from the worlds of science, journalism, government, and open source. Over two days, attendees will have the opportunity to hear about ongoing work, share skills, exchange ideas (and stickers!) and kickstart collaborations.

Attendees of csv,conf,v4

First launched in July 2014, csv,conf has expanded to bring together over 700 participants from 30 countries with backgrounds in varied disciplines. If you missed the earlier years' conferences, you can watch previous talks on topics like data ethics, open source technology, data journalism, the open internet, and open science on our YouTube channel. We hope you will join us in Washington DC in May to share your own data stories and join the csv,conf community!

csv,conf,v5 is supported by the Sloan Foundation through OKF's Frictionless Data for Reproducible Research grant as well as by the Gordon and Betty Moore Foundation, and the Frictionless Data team is part of the conference committee. We are happy to answer any questions you may have or offer clarifications if needed. Feel free to reach out to us at csv-conf-coord@googlegroups.com, on Twitter @CSVconference, or on our dedicated community Slack channel.

We are committed to diversity and inclusion, and strive to be a supportive and welcoming environment for all attendees. To this end, we encourage you to read the Conference Code of Conduct.
Rojo the Comma Llama

While we won’t be flying Rojo the Comma Llama to DC for csv,conf,v5, we will have other mascot surprises in store.

Join #Hacktoberfest 2019 with Frictionless Data

- October 3, 2019 in Frictionless Data, hackathon

The Frictionless Data team is excited to participate in #Hacktoberfest 2019! Hacktoberfest is a month-long event where people from around the world contribute to open source software (and you can win a t-shirt!). How does it work? All October, the Frictionless Data repositories will have issues ready for contributions from the open source community. These issues will be labeled with 'Hacktoberfest' so they can be easily found. Issues will range from beginner level to more advanced, so anyone who is interested can participate. Even if you've never contributed to Frictionless Data before, now is the time! To begin, sign up on the official website (https://hacktoberfest.digitalocean.com) and then read the OKF project participation guidelines, code of conduct, and coding standards. Then find an issue that interests you by searching through the issues on the main Frictionless libraries (found here) and on our participating Tool Fund repositories here. Next, write some code to help fix the issue, and open a pull request for the Frictionless team to review. Finally, celebrate your contribution to an open source project! We value and rely on our community, and are really excited to participate in this year's #Hacktoberfest. If you get stuck or have questions, reach out to the team via our Gitter channel, or comment on an issue. Let's get hacking!

A recap of the 2019 eLife Innovation Sprint

- September 26, 2019 in Events, Frictionless Data, Open Science

Over 36 hours, Jo Barratt and Lilly Winfree from Open Knowledge Foundation's Frictionless Data team joined 60 people from around the world to develop innovative solutions to open science obstacles at the 2019 eLife Innovation Sprint. This quick, collaborative event in Cambridge, UK, on September 4th and 5th brought together designers, scientists, coders, project managers, and communications experts to develop their budding ideas into functional prototypes. Projects focused on all aspects of open science, including but not limited to improving scientific publishing, data management, and increasing diversity, equity, and inclusion. Both Jo and Lilly pitched projects and thoroughly enjoyed working with their teams. Lilly pitched creating an open science game that could be used to teach scientists about open best practices in a fun and informative way. Jo proposed making a podcast documenting the Sprint experience, projects, and people, aiming to fully produce, edit and publish the piece during the Sprint. Read on to learn more about these projects and their experiences at the Sprint.

Lilly's inspiration to create an open science game came from her experience at Force11 in 2018, where she played a game about FAIR data (Findable, Accessible, Interoperable, and Reusable). She realized that playing a game can be a great way to learn about a subject that might otherwise seem dry, and creating a game prototype seemed like a fun, accessible, and achievable goal for the Sprint. The open science game team formed with eight people from diverse backgrounds, including a game designer, board game enthusiasts, publishers, and scientists. This mix of backgrounds was a big asset to the team, and played a large role in the development of a functional game prototype. To start designing the game, the team first decided that the goal of the game should be to teach scientists about open science best practices, while the collaborative goal for the players would be to make an important scientific discovery, like curing a disease. The team then crafted the storyline of the game, and finally worked on the gameplay mechanics. In the end, the game was made for 2-5 players and ideally would take about 30-45 minutes to play. To play, each player gets a role card: Lab Principal Investigator, Graduate Student, Data Management Librarian, Teaching Assistant, or Data Scientist. Each of these roles has personas and attributes that impact the game. For instance, the Principal Investigator has negative attributes that make sharing research openly harder, while the Teaching Assistant has positive attributes that make it easier to teach new tools to other players. On each turn, the players can draw research object cards or tool cards that help advance the game, but might also draw an event card, which can have positive or negative effects on the gameplay. The ultimate goal is for the players to share their research findings, which requires a player to draw and "research" an insight card and its related methods card, data collection card, and analysis card. The game ends once enough research findings are shared (either openly or with restricted access). A fun and interesting part of the game is that the players can role-play their characters and see how attitudes towards open science differ and how those attitudes affect the progression of science. Hint: to win the game, the players have to cooperate with each other and openly share at least some of their research findings.
The team is currently digitising the game so others can play it – keep track of their progress on their GitHub Repository.
“My team was fantastic to work with. I came to the Sprint with a basic idea and a hope that we could create a fun, educational game on open science, but my team really ran with the idea and created a game that is so much more than I had hoped for!” – Lilly Winfree, OKF

OKF delivery manager Jo Barratt brought his storytelling talents to the forefront for the eLife Sprint by proposing the creation of a podcast to document the people and ideas at the Sprint. Jo has produced many podcasts over the years, and thought the podcast format would offer a unique perspective into the inner workings of the Sprint. He was delighted to have two other Sprint members join his podcast team: Hannah Drury and Elsa Loissel from eLife. Neither Hannah nor Elsa had worked on a podcast before, but both were eager and quick learners. Their project started with Jo giving Hannah and Elsa quick lessons on interviewing, using recording equipment, editing and sound design. Jo was really excited to have such collaborative team members to work with, which was very in line with the synergistic spirit of the Sprint.

To capture the essence of the Sprint, Hannah and Elsa began by interviewing most Sprint members, asking them about their backgrounds and what they hoped to get out of the Sprint. Interviewees were also asked to give their views on what 'open science' means to them. Next, the team interviewed several projects for a more in-depth discussion of how the Sprint works and what types of projects were being developed. In the final podcast, there are interviews with the teams from the open science game project, a project on equitable preprints, the project looking at computational training best practices, and the high performance computing in Africa team. Each of these segments shows the people, methods, and progress of the projects, highlighting the diverse people and ideas at the Sprint and giving listeners insight into the process of this type of event as well as many of the problems that face the open science community. Jo's highlight of the podcast was a conversation between the current Innovation Officer at eLife, Emmy Tsang, and the past officer, Naomi Penfold. They discussed their experiences hosting the Sprint, and commented on changes they have witnessed in the open science movement. Listeners to the podcast will notice the overarching themes of openness, collaboration, excitement, and hope for the future of science, while also being challenged to think about who is being left behind in the progress towards a more open world. You can hear the full podcast (and see pictures from the Sprint) here, or listen on Soundcloud here.
“I supported them but really this was made by two scientists who had zero experience in this and I think making this in 2 days is really quite impressive!” – Jo Barratt, OKF
The OKF team would like to thank Emmy and eLife for a great experience at the Sprint!

Part of the Open Knowledge Foundation team met up in Cambridge the day before the Sprint began, and saved the world from a meteor (at an escape room)!

A halfway point update from the 2019 Frictionless Data Tool Fund

- September 25, 2019 in Featured, Frictionless Data, tool fund

In June 2019, we launched the Frictionless Data Tool Fund to facilitate reproducible data workflows in research contexts. Our four Tool Fund grantees are now at the halfway point of their projects, and have made great progress. Read on to learn more about these projects, their next steps, and how you can also contribute.

Stephan Max: Data Package tools for Google Sheets

Stephan’s Tool Fund work is focused on creating an add-on for Google Sheets to allow for Data Package import and export. With this tool, researchers (and other data wranglers) that use Google Sheets will be able to quickly and easily incorporate Data Packages into their existing data processing workflows. Recently, Stephan created a prototype that you can test at the project’s GitHub Repo by following the steps outlined in the README file: https://github.com/frictionlessdata/googlesheets-datapackage-tools. Next steps for Stephan’s project include enhancing the user interface, and adding additional information such as licensing options for the export button. If you try the prototype, please leave Stephan feedback as an issue in the repository.

João Peschanski and team: Neuroscience Experiments System (NES)

To improve the way neuroscience experimental data and metadata are shared, João and the team at the Research, Innovation and Dissemination Center for Neuromathematics (RIDC NeuroMat) are working on implementing Data Packages into their Neuroscience Experiments System (NES). NES is an open-source tool for data collection that stores large amounts of data in a structured way, and it aims to assist neuroscience research laboratories in routine experimental procedures. During the Tool Fund, João and the team have created a Data Package export module within NES that follows the Frictionless specifications for data and metadata interoperability. This export includes a JSON descriptor file (a datapackage.json file) with information about how the experiment was performed, with the goal of increasing reproducibility. Next steps for the team include more testing and gathering feedback, followed by a public release. The NES GitHub repository can be seen here: https://github.com/neuromat/nes.
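As a rough illustration of what a descriptor-driven export involves (independent of NES itself, and with hypothetical file names and metadata values), the Frictionless datapackage Python library can build and save such a datapackage.json alongside a set of data files:

```python
# Illustrative sketch only (not the NES exporter): build and save a
# datapackage.json descriptor for a set of experiment files with the
# datapackage library. File names and metadata values are hypothetical.
from datapackage import Package

package = Package()
package.infer('experiment_data/*.csv')   # infer resources and table schemas from CSVs
package.descriptor['name'] = 'example-neuroscience-experiment'
package.descriptor['title'] = 'Hypothetical experiment export'
package.commit()                         # apply the descriptor edits
package.save('datapackage.json')         # write the descriptor alongside the data
```

Inferring the resources first and then editing the descriptor keeps the metadata consistent with the actual files on disk.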

André Heughebaert: DarwinCore Archive Data Package support

Inspired by his work with the Global Biodiversity Information Facility (GBIF), André is converting DarwinCore Archives into Data Packages for his Tool Fund project. The DarwinCore is a standard describing biological diversity that is intended to increase interoperability of biological data. André has recently completed a first release of the tool, which appends datapackage.json and README.md files containing the data descriptors and human readable metadata to the DarwinCore archive. This release supports all standard DarwinCore terms, and has been tested with several use cases. You can read more about Frictionless DarwinCore and see all of the use cases André tested for the beta release in the repo’s README file. If you want to test or contribute to this Tool Fund project, please open an issue in the repository.

Shelby Switzer and Greg Bloom: Open Referral Human Services data package support

Shelby's Tool Fund work is building out datapackage support for Open Referral's Human Service Data Specification (HSDS) and Human Service Data API Suite (HSDA). Open Referral develops data standards and open source tools for health, human, and social services. For the Tool Fund, Shelby has been developing their HSDS-Transformer, which takes raw data, transforms it to the HSDS format, and then packages it as a datapackage within a zip file, so users can work with tidily packaged data. For example, Shelby and the Open Referral team have been working with 2-1-1 in Miami-Dade, Florida, to help transform and share their resource directory database with their partners in a more sustainable fashion. Next steps for Shelby include creating a UI for the HSDS-Transformer so that anyone can access HSDS-compliant datapackages. Shelby will also be contributing to the improvement of the datapackage Ruby gem during this project.
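For users on the receiving end of such a package, here is a small sketch (with hypothetical file and resource names, and using the Python datapackage library rather than the Ruby tooling mentioned above) of reading data straight out of a zipped data package:

```python
# Illustrative sketch (hypothetical file and resource names): read tabular
# data out of a zipped data package, such as one produced by an HSDS
# transformation, using the Python datapackage library.
from datapackage import Package

package = Package('hsds_datapackage.zip')        # zip containing datapackage.json + CSVs
print([resource.name for resource in package.resources])

services = package.get_resource('services')      # 'services' is a hypothetical resource name
if services is not None:
    for row in services.read(keyed=True):        # each row as a dict keyed by field name
        print(row)
```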

Frictionless Data at the EPFL Open Science in Practice Summer School

- September 16, 2019 in Featured, Frictionless Data, Open Science

In early September, our Frictionless Data for Reproducible Research product manager, Lilly Winfree, presented a workshop at the Open Science in Practice Summer School at EPFL in Lausanne, Switzerland. Lilly's workshop focused on teaching early career researchers how to use the Frictionless software and specs to make their research data more interoperable, shareable, and open. The audience learned about metadata, data schemas, creating data packages, and validating their data with Goodtables. The slides for the workshop are available here, and are licensed as CC-BY-4.0. The Summer School was organized by Luc Henry, Scientific Advisor at EPFL, and was a week-long series of talks and workshops on open science best practices for research students and early career researchers.

A highlight of the workshop for Lilly was having the opportunity to work with Oleg Lavrovsky in person. Oleg is on the board of the Swiss chapter of OKF, Opendata.ch, and created the Frictionless Data Julia libraries as a Tool Fund grantee two years ago. Oleg wrote a recap of the workshop, which we are republishing below; the original can be read here. Thanks for your help, Oleg, and to Luc for organizing!

“Open” is the new black. Everybody talks about open science. But what does it mean exactly?

Lilly Winfree of the Frictionless Data for Reproducible Research project at OKF ran a workshop at Open Science in Practice, a week-long training organized by EPFL with the Eurotech Universities. It was a top-grade workshop delivered to a diverse room of doctoral students, early career researchers, "and beyond" in Lausanne. I had the opportunity to assist her, learn from her professional delivery, and get up to speed with key points about the Open Knowledge Foundation and the latest news from the small, diligent team working to make open data more accessible and useful. With a fascinating science background, she connected well with the audience and made a strong case for well-published open research data. The workshop reignited my desire to continue publishing Data Packages, contribute to the project, develop better support in various software environments, and be present in community channels. In our conversation afterwards, we talked about the remote work culture and global reach of the team, expectations management, and the challenges ahead. Thanks very much to @heluc and the rest of the #OSIP2019 team for organizing a great event, to all who participated in the workshop for patiently and enthusiastically hacking their first Data Packages together, and kudos to Lilly for crossing distances to bridge gaps and support Open Science in Switzerland.
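For a flavour of the hands-on material participants worked through (a minimal sketch with a hypothetical CSV file, not the workshop's actual exercises), the tableschema and goodtables Python libraries cover the schema-inference and validation steps described above:

```python
# Minimal sketch of the kind of steps covered in the workshop: infer a
# Table Schema from a CSV file and validate the file with Goodtables.
# 'survey_data.csv' is a hypothetical example file.
from tableschema import Table
from goodtables import validate

table = Table('survey_data.csv')
table.infer()                           # guess field names and types from the data
print(table.schema.descriptor)          # the inferred Table Schema as a dict

report = validate('survey_data.csv')    # structural and schema checks on the file
print('valid:', report['valid'])
```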

Next events

There are two upcoming events that Oleg is involved with that might be of interest to the Frictionless Data and OKF communities: the DINAcon Digital Sustainability Conference, on October 18 in Bern, and the Opendata.ch Tourism Hackathon on November 29 in Lucerne.

A warm welcome to our Frictionless Data for Reproducible Research Fellows

- August 29, 2019 in Featured, Frictionless Data, Open Science

As part of our commitment to opening up scientific knowledge, we recently launched the Frictionless Data for Reproducible Research Fellows Programme, which will run from mid-September until June 2020.  We received over 200 impressive applications for the Programme, and are very excited to introduce the four selected Fellows:
  • Monica Granados, a Mitacs Canadian Science Policy Fellow; 
  • Selene Yang, a graduate student researcher at the National University of La Plata, Argentina; 
  • Daniel Ouso, a postgraduate researcher at the International Centre of Insect Physiology and Ecology; 
  • Lily Zhao, a graduate student researcher at the University of California, Santa Barbara. 
Next month, the Fellows will be writing blogs to further introduce themselves to the Frictionless Data community, so stay tuned to learn more about these impressive researchers. The Programme will train early career researchers to become champions of the Frictionless Data tools and approaches in their field. Fellows will learn about Frictionless Data, including how to use Frictionless tools in their domains to improve reproducible research workflows, and how to advocate for open science. Working closely with the Frictionless Data team, Fellows will lead training workshops at conferences, host events at universities and in labs, and write blogs and other communications content. As the programme progresses, we will be sharing the Fellows’ work on making research more reproducible with the Frictionless Data software suite by posting a series of blogs here and on the Fellows website. In June 2020, the Programme will culminate in a community call where all Fellows will present what they have learned over the nine months: we encourage attendance by our community. If you are interested in learning more about the Programme, the syllabus, lessons, and resources are open.

More About Frictionless Data

The Fellows Programme is part of the Frictionless Data for Reproducible Research project at Open Knowledge Foundation. This project, funded by the Sloan Foundation, applies our work in Frictionless Data to data-driven research disciplines, in order to facilitate data workflows in research contexts. Frictionless Data is a set of specifications for data and metadata interoperability, accompanied by a collection of software libraries that implement these specifications, and a range of best practices for data management. Frictionless Data's other current projects include the Tool Fund, in which four grantees are developing open source tooling for reproducible research. The Fellows Programme will be running until June 2020, and we will post updates on the Programme as it progresses.