
Open data and research: Let’s get to it!

- June 1, 2018 in El Salvador, Open Data Day, open data day 2018, open research data

This blog has been translated from the original post. This blog is part of the event report series on International Open Data Day 2018. On Saturday 3 March, groups from around the world organised over 400 events to celebrate, promote and spread the use of open data. 45 events received additional support through the Open Knowledge International mini-grants scheme, funded by Hivos, SPARC, Mapbox, the Hewlett Foundation and the UK Foreign & Commonwealth Office. The event in this blog was supported through the mini-grants scheme under the Open Research Data theme.

Every year we celebrate opening, promoting, using, reusing, disseminating and creating value from open data. This simple action has a great impact on knowledge generation and on opportunities for economic, social and cultural development in countries. Since 2014, DatosElSalvador has promoted open data with the vision of helping more people and organizations benefit from it to generate commercial, social and cultural opportunities. In 2015 we worked with the Transparency Consortium to celebrate Open Data Day for transparency. In 2016, together with the comptroller agencies, we hosted Open Data Day to tackle corruption, again with the support of the Consortium. In 2017 we promoted open data for entrepreneurship.

This year, DatosElSalvador, The Next Services and Hub170 joined together to celebrate Open Data Day in El Salvador. With the support of Open Knowledge International, we built a unique space to share and experiment with open data and its benefits for research and analysis. Together with university researchers and civil society organisations, we had a morning full of knowledge and experiences around open research data, the definition of roadmaps, and the technical knowledge needed to open data and make it available to everyone. We learned about visualization tools, information processing, Creative Commons licensing and, especially, opening data!
During the day, we learned about examples of open research data and open academic data, went through examples of data reuse, and learned in practice how to open and visualize data. Ever since we started DatosElSalvador, we have tried to use Open Data Day to empower a sector of society, with three key goals:
  1. More people and organisations opening data! We eagerly await the day when the country’s strategic social, economic and cultural sectors open, reuse and share data to generate rich, reliable, evidence-based and continuous knowledge.
  2. More open data! While at DatosElSalvador we work to open data through our portal, we love receiving and spreading data from more people and organisations. That’s why, when we made data available during Open Data Day, our goal was to feed the portal and explain to participants how they can use the data available.
  3. More tools to open data! We love it when people talk about open data, but we love it even more when they put it to use and learn about tools for using it. That’s why during the activity we went through different tools and learned to use them.
“Data-based research generates more concrete evidence for decision-making” was our motto for Open Data Day in El Salvador. Together with three universities we opened data about the economy, education, elections and research, and we organized a panel about the challenges of academic and scientific research and open data. DatosElSalvador is the only open data portal in El Salvador, and we are committed to continue opening data for a community that generates value through its research work.

This event allowed us to identify the new challenges we face as promoters of open data in the country. On the one hand, we need to foster more meetings to learn about tools and techniques for opening data, as well as good practices for reuse. On the other, we need to encourage data-based research through incentives and/or recommendations for public policies or the generation of more business models.

When we defined the topic for this Open Data Day event, we did so with the aim of building a baseline in the community on how to create and grow capacities, and of using collective intelligence to continue this valuable process of generating knowledge with universities. We had some strategic allies. Each year we find a topic, community and allies that come together to generate valuable events that allow participants to learn things, not just to know about them. The Next Services shared and advised on technologies to open and visualise data, and Hub170 taught us about creative thinking and the soft skills necessary to build teams that can do research and create value.

We can’t wait for the next Open Data Day. In the meantime, we renew our commitment to making openness something valuable!

Evidence Appraisal Data-Thon: A recap of our Open Data Day event

- May 23, 2018 in health, Open Data Day, open data day 2018, Open Research, open research data, Open Science

This blog has been reposted from Medium. This blog is part of the event report series on International Open Data Day 2018. On Saturday 3 March, groups from around the world organised over 400 events to celebrate, promote and spread the use of open data. 45 events received additional support through the Open Knowledge International mini-grants scheme, funded by Hivos, SPARC, Mapbox, the Hewlett Foundation and the UK Foreign & Commonwealth Office. The events in this blog were supported through the mini-grants scheme under the Open Research Data theme.

Research can save lives, reduce suffering, and advance scientific understanding. But research can also be unethical, unimportant, invalid, or poorly reported. These issues can harm health, waste scientific and health resources, and reduce trust in science. Differentiating good science from bad therefore has big implications. This is happening amid broader discussions about differentiating good information from misinformation, and the current controversy over political ‘fake news’ has received significant attention. Scientific misinformation is also published, both for the public and in academia, much of it derived from low-quality science.

EvidenceBase is a global, informal, voluntary organization aimed at starting and boosting tools and infrastructure that enhance scientific quality and usability. Critical appraisal is one of many mechanisms for evaluating and clarifying published science, and evidence appraisal is a key area of EvidenceBase’s work. On March 3rd we held an Open Data Day event to introduce the public to evidence appraisal and to explore and work on an open dataset of appraisals. We reached out to a network of data scientists, software developers, public health professionals, and clinicians in NYC and invited them and their interested friends (including those without health, science, or data training).


Our data came from the US National Library of Medicine’s PubMed and PubMed Central datasets. PubMed offers indexing, metadata, and abstracts for biomedical publications, and PubMed Central (PMC) offers full text in PDF and/or XML. PMC has an open-access subset. We explored the portion of this subset that 1) was indexed in PubMed as a “journal comment” and 2) was a comment on a clinical trial. Our 10-hour event opened with a session introducing the general areas of health trials, research issues, and open data; for the remainder of the day, parallel groups tackled three areas: lay exploration and Q&A; dataset processing and word embedding development; and manual exploration and annotation of comments guided by health expertise. We had 2 data scientists, 4 trial experts, 3 physicians, 4 public health practitioners, 4 participants without a background in the field but with curiosity, and 1 infant. Our space was donated, and the food was paid for by a mix of an Open Data Day grant provided by SPARC and Open Knowledge International (thank you!) and voluntary participant donations.

On the dataset front, we used the clinical trial and journal comment metadata in PubMed, the links between PubMed and PMC, and PMC’s open-subset IDs to create a data subset consisting solely of journal comments on clinical trials that were in PMC’s open subset with XML data. Initial exploration of this subset for quality issues showed that PubMed metadata tags misindex some non-trials as trials and some non-comments as comments, so further data curation will be needed. We did use the subset to create word embeddings and to do some brief similarity-based expansion.
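The subsetting step described above is essentially a chain of filters over linked ID sets. The Python sketch below assumes the PubMed and PMC metadata have already been downloaded and parsed into simple in-memory records; the `Record` class, its fields, the helper `comments_on_trials`, the tag strings, and the `open_xml_pmcids` set are hypothetical stand-ins, not the actual PubMed or PMC export formats or APIs. It keeps comments whose commented-on article is tagged as a clinical trial and whose own full text is in the open subset with XML. Because it trusts the publication-type tags, it inherits the misindexing problems noted above.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional, Set


@dataclass
class Record:
    """Hypothetical, simplified PubMed record (not the real PubMed schema)."""
    pmid: str
    publication_types: Set[str]
    comments_on: List[str] = field(default_factory=list)  # PMIDs this record comments on
    pmcid: Optional[str] = None  # PubMed Central ID, if the article is in PMC


def comments_on_trials(records: Iterable[Record], open_xml_pmcids: Set[str]) -> List[Record]:
    """Return comments on clinical trials whose full text is in the open subset as XML."""
    records = list(records)
    by_pmid = {r.pmid: r for r in records}
    selected = []
    for r in records:
        # Step 1: keep only records tagged as comments (tag names are assumed).
        if "Comment" not in r.publication_types:
            continue
        # Step 2: at least one commented-on article must be tagged as a clinical trial.
        targets = [by_pmid[p] for p in r.comments_on if p in by_pmid]
        if not any("Clinical Trial" in t.publication_types for t in targets):
            continue
        # Step 3: the comment's own full text must be open access with XML available.
        if r.pmcid is None or r.pmcid not in open_xml_pmcids:
            continue
        selected.append(r)
    return selected


# Toy records: only PMID 2 survives all three filters.
records = [
    Record("1", {"Clinical Trial"}),
    Record("2", {"Comment"}, comments_on=["1"], pmcid="PMC10"),
    Record("3", {"Comment"}, comments_on=["1"], pmcid="PMC11"),  # not in open subset
    Record("4", {"Comment"}, comments_on=["5"], pmcid="PMC12"),  # comments on a non-trial
    Record("5", {"Review"}),
]
print([r.pmid for r in comments_on_trials(records, {"PMC10", "PMC12"})])  # ['2']
```

Each filter corresponds to one source of shrinkage in the final subset, which makes it easy to count how many records are lost at each step.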


The domain experts reviewed trials in their areas of expertise. Some participants manually extracted text fragments expressing a single appraisal assertion and attempted to generalize each assertion for future structured knowledge representation work. Overall, participants had a fun, productive, and educational time! From EvidenceBase’s standpoint, the event was a success and an interesting experience. We are mainly virtual and global, so this in-person event was new for us, energizing, and helped forge new relationships for the future.

We also learned:

  • We can’t put too much on one person’s plate for logistics and facilitation. Issues will happen (e.g. a last-minute food cancellation).
  • Curiosity abounds, and people are thirsty for meaningful and productive social interactions beyond their jobs. They just need to be invited; otherwise this potential group will not be involved.
  • Many people who have data science skills have jobs in industries they don’t love, and they have a particular thirst to leverage their skills for good.
  • People without data science expertise but who have domain expertise are keen on exploring the data and offering insight. This can help make sense of it, and can help identify issues (e.g. data quality issues, synonyms, subfield-specific differences).
  • People with neither domain expertise nor data science skills still add vibrancy to these events, though the event organizers need more bandwidth to help orient and facilitate the involvement of these attendees.
  • Public research data sets are messy, and often require further subsetting or transformation to make them usable and high quality.
  • Open data might have license and accessibility barriers. For us, this sharply reduced the usable journal comments: many lacked full text, and of those with full text, a further large share were not open access or not licensed for text mining.

We’ll be continuing to develop the data set and annotations started here, and we look forward to the next Open Data Day. We may even host a data event before then!

Open Data Day: From entrepreneurship to open science

- April 26, 2018 in mexico, Open Data Day, open data day 2018, open research data, Open Science, spain

Authors: Virginia De Pablo (ODI Madrid) and Karla Ramos (Epicentro Inefable A.C.)

This blog is part of the event report series on International Open Data Day 2018. On Saturday 3 March, groups from around the world organised over 400 events to celebrate, promote and spread the use of open data. 45 events received additional support through the Open Knowledge International mini-grants scheme, funded by Hivos, SPARC, Mapbox, the Hewlett Foundation and the UK Foreign & Commonwealth Office. The events in this blog were supported through the mini-grants scheme under the Open Research Data theme.

For the latest edition of Open Data Day, two very different cities, Madrid (Spain) and Puebla (México), joined efforts to demonstrate that open data is an essential tool for social development. We could see this in the sessions that took place that day, where students, journalists, political scientists, technologists and public servants gathered to show that open data is central to the future of research and science, as well as to building bridges between citizens and decision makers.


During Open Data Day in Puebla, Epicentro Inefable AC and the State Coordinator for Transparency and Open Government (CETGA, for its Spanish initials), along with the Engineering faculty of the Benemérita Universidad Autónoma de Puebla, organized the Open Data Day Puebla Bootcamp, with the goal of disseminating the benefits of data in open formats. During the welcome, we called on teachers, students and the general public to use the data that the government of Puebla publishes openly. We also noted that open data can be a bridge between government and people, and that it helps generate better public policies and strengthen civic participation in decision making for social good.

Students of different public universities in Puebla heard presentations by Karla Ramos, director of Epicentro Inefable A.C.; Boris Cuapio and Hugo Osorio, founders and partners of Gobierno Fácil; Tony Rojas, director of Open Government at CETGA; Juan Carlos Espinosa, youth ambassador of My World Mexico; and Luis Oidor, head of the Open Government department at CETGA. In the panel “Morning Data: what is open data and what is it for?”, the presenters highlighted the qualities that open data should have, such as being free and easy to access. They also emphasized its usefulness as a digital tool that every person can use as a source of information to improve the quality of life in their community. During his participation, Hugo Osorio highlighted that open data can be used as a tool for entrepreneurship. For example, he mentioned that apps like Waze and Uber use open data and that in 2013 they generated more than 920 million USD in the US. To close the session, Luis Oidor presented the actions that the government of Puebla is implementing to train public officers to publish new data sets. He mentioned that so far, 91% of the agencies and 81% of the municipalities have received training on this subject.
As a result, they have published 416 data sets on topics like health, education, transportation, finance, employment, business, security and service delivery, which can be accessed online. As a final activity, we navigated through the datasets available in the government portal, with 100 students and teachers participating in 20 different teams. Hugo, Boris and Karla graded the answers to the 12 questions we asked during the event and named the winners. The Bootcamp took place in the University’s auditorium, where we gathered 271 students and teachers from the BUAP, the Instituto Tecnológico Superior de San Martín, the Instituto de Estudios Superiores A.C., the Instituto Tecnológico Superior de Atlixco, the Instituto de Capacitación para el Trabajo del Estado de Puebla and the Colegio de Estudios Científicos y Tecnológicos del Estado de Puebla, as well as participants from civil society organizations.


Open Data Day in Madrid focused on Open Science. For two days (March 2 and 3), we gathered a distinguished group of professionals and students from many disciplines at Medialab Prado. Participants attended sessions organized by the Ontology Engineering Group (OEG), ODI Madrid and Datalab. Among the speakers were David Abián, from Wikimedia Spain; María Poveda, from the Ontology Engineering Group (OEG) and ODI Madrid; Mariano Rico, a member of the OEG, who explained the use and utility of DBpedia; Olga Giraldo, who presented “SMART protocols for Open Science”; and Fernando Blat, from Populate. Bastien Guerry, from the Office of the Prime Minister of France, who maintains the org-mode software, closed the day.

During the morning, David Abián taught us how to extract data from Wikimedia for any research that might interest us. He explained the formats in which we can obtain and generate information in this wiki and walked us through a simple practical exercise: extracting data about a specific topic, nuclear plants. As we went along, he explained what this information could be useful for. He made clear how open data can be used for everything from scientific research and open science to journalism and information for policy decisions. María Poveda explained what ontologies are for. She did this through a light talk that helped us understand how to develop them and how we can use them in the open data context.

After the lunch break, Olga Giraldo presented the keynote, a talk about open science entitled “SMART protocols for Open Science”. She explained how, since when and why we gather and publish scientific data. “Data by itself doesn’t explain its use,” Giraldo said. The researcher insisted that data should go “along with a document (a lab protocol) where we can explain how we get to the data and how we can use them”.
The importance of protocols lies in their design and accessibility, two keys to finding scientific data and the information you might need. Her work on the SMART protocols platform, where researchers can publish their protocols along with other information, is an example of this. Afterwards, Mariano Rico told us about the Spanish DBpedia: how its data was obtained, how it is edited, how it can be downloaded, how many datasets it has, when it started, and so on. DBpedia contains an immense repository of information, a full set of structured data edited with controlled vocabularies, which places it at the center of a world of data. It is, without question, a link between many vocabularies and a useful tool for all kinds of solutions, from visualizations to apps, whether for scientific ends, industrial ends or any type of business. Finally, Bastien Guerry described his work as maintainer of org-mode and his work for the French government.

Open Research in the Philippines: The Lessons and Challenges

- April 24, 2018 in Open Data Day, open data day 2018, open research data, Open Science, philippines

Authors: Czarina Medina-Guce and Marco Angelo S. Zaplan

This blog is part of the event report series on International Open Data Day 2018. On Saturday 3 March, groups from around the world organised over 400 events to celebrate, promote and spread the use of open data. 45 events received additional support through the Open Knowledge International mini-grants scheme, funded by Hivos, SPARC, Mapbox, the Hewlett Foundation and the UK Foreign & Commonwealth Office. The events in this blog were supported through the mini-grants scheme under the Open Research Data theme.

Pioneering discussions on open research in a country where data management is still a work in progress can be rewarding yet challenging. In celebration of global Open Data Day, the Institute for Leadership, Empowerment, and Democracy (iLEAD) and Datos.PH initiated small group discussions on March 3, 2018. The organizations, while taking different tracks, addressed the same question: how can we make data and literature more open for the research community? Gathering over twenty representatives from academe, government agencies, civil society organizations, and research institutions, iLEAD embarked on a stocktaking exercise to assess the current research landscape in the Philippines. Datos.PH, on the other hand, organized a data hackathon with researchers and students, with the aim of making national datasets more disaggregated and gendered to enable analysis at the regional level.

The iLEAD Team with the participants of the Open Data Day: Roundtable on Open Research in the Philippines last March 3, 2018 in Quezon City, Philippines

Differences and Similarities

Both events aimed to widen the access of citizens, knowledge producers, advocates, and other infomediaries to data, research materials, and literature. Two approaches were used. Datos.PH’s hackathon involved running and analyzing datasets, while iLEAD’s event involved discussions with resource speakers from government and university libraries. Datos.PH’s hackathon brought together a small, focused group of technical data users, in this case Statistics majors, to crunch data, disaggregate national datasets, and bring gender data analysis into the open. The goal of each session was to disaggregate datasets by region and by sex of the respondents. Once the datasets were disaggregated, breakout groups presented initial statistical analyses. iLEAD’s Roundtable Discussion engaged data users and suppliers to delve into the opportunities and barriers for open research. While the two initiatives produced different outputs, both concluded that the current data landscape is still a long way from fully reaching various stakeholders.
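Disaggregating a survey by region and sex, as described above, is essentially a group-by over respondent-level records. The sketch below uses only the Python standard library; the `disaggregate` helper, the field names `region`, `sex` and `income`, and the sample rows are all made up for illustration and do not reflect the actual FIES, LFS or APIS file layouts. It also ignores survey sampling weights, which a real analysis of these surveys would typically need to apply.

```python
from collections import defaultdict


def disaggregate(rows, value_key, keys=("region", "sex")):
    """Group rows by the given keys and return the count and mean of value_key per group."""
    totals = defaultdict(lambda: [0, 0.0])  # group tuple -> [count, running sum]
    for row in rows:
        group = tuple(row[k] for k in keys)
        totals[group][0] += 1
        totals[group][1] += row[value_key]
    # Unweighted means: real survey estimates would apply sampling weights here.
    return {g: {"n": n, "mean": s / n} for g, (n, s) in totals.items()}


# Made-up rows for illustration only.
rows = [
    {"region": "NCR", "sex": "F", "income": 300.0},
    {"region": "NCR", "sex": "F", "income": 500.0},
    {"region": "NCR", "sex": "M", "income": 450.0},
    {"region": "CAR", "sex": "F", "income": 200.0},
]
for group, stats in sorted(disaggregate(rows, "income").items()):
    print(group, stats)
```

Passing a different `keys` tuple, such as `("region",)`, gives a coarser breakdown from the same rows.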

A student crunching data during Datos.PH’s ODD event in Quezon City, Philippines

Lessons Learned

iLEAD’s exercise surfaced issues and concerns in opening up research. While there have been significant strides in opening government data in previous years, there are still challenges in making these programs genuinely usable and relevant for different publics. On the government side, the biggest gap still lies in legal frameworks for information sharing and access, such as the long-standing contentions over the country’s Data Privacy Act and the absence of a Freedom of Information (FOI) law that would expand the scope of government information disclosure to subnational levels and other branches of government. There is also low use of the data made available to the public, which suggests a disconnect between the data being disclosed and people’s data needs and demands.

In academic contexts, similar issues surfaced, as existing practices for opening research products (books, journals, and other reports) are bound by Intellectual Property (IP) policies. Strict academic sharing practices, coupled with inhibitions among some contributors, hinder open information exchange among researchers, advocates, and other knowledge producers. There are also financial barriers. Academic institutions have to pay steep collation and subscription fees to access academic journals and databases. Digitizing and improving library information systems also incurs significant costs, which many schools find difficult to finance without the resources.

Meanwhile, Datos.PH’s workshop sessions worked on national datasets including the Family Income and Expenditure Survey (FIES), the Labor Force Survey (LFS), and the Annual Poverty Indicators Survey (APIS). The team developed a simple manual that shows users how to use the disaggregated data, as a means to sustain the practice long after the workshop. The manual includes analyses and questions that local policymakers, researchers, and advocates may ask using the data. By the end of the data dive, Datos.PH had put together a draft manual and fifty-four disaggregated datasets derived from three source datasets. Datos.PH’s learnings from the event boil down to this: there is so much data available, yet even the most technical users have little access to it. Some did not even know about the existing datasets that the Philippine Statistics Authority (PSA) produces. This is surprising given that statisticians are themselves primary users of these datasets. Moving forward, demand for these datasets needs to catch up, to provide more cases that induce the disclosure and production of data for public use.

There is so much to be done for open research. While there are financial, legal, and technical barriers to overcome, these discussions are a step towards building a community that shares the advocacy of making data in the Philippines accessible and usable for all Filipinos.

The Institute for Leadership, Empowerment, and Democracy (iLEAD) is a non-stock, non-profit think tank, consultancy and resource center that focuses on strategic policy work to strengthen democratic institutions. Datos.PH is a nonprofit organization working on building stakeholders’ capacities and advocating for data for evidence-based public policies at the local level. Both are based in the Philippines.

Open Data Day 2018 in Ethiopia and Nigeria

- April 5, 2018 in Open Data Day, open data day 2018, open research data, Open Science

Authors: Bolutife Adisa (Open Switch Africa) and Solomon Mekonnen (Open Knowledge Ethiopia)

This blog is part of the event report series on International Open Data Day 2018. On Saturday 3 March, groups from around the world organised over 400 events to celebrate, promote and spread the use of open data. 45 events received additional support through the Open Knowledge International mini-grants scheme, funded by Hivos, SPARC, Mapbox, the Hewlett Foundation and the UK Foreign & Commonwealth Office. The events in this blog were supported through the mini-grants scheme under the Open Science & Open Research Data theme.

Two notable events were held in Africa in celebration of Open Data Day 2018. In Addis Ababa, Ethiopia, Open Knowledge Ethiopia, with the support of Addis Ababa University, Open Knowledge International and SPARC, hosted OpenCon 2018 Addis Ababa (a satellite event of the global annual OpenCon meeting). The event brought together 25 participants including students, researchers, academics and librarians. In Nigeria, Open Data Day 2018 Lagos was organized by Open Switch Africa in partnership with Open Knowledge International, SPARC and the University of Lagos Science Students Association. This event aimed to further raise awareness among the community of students, researchers, advocates and academics about open data and the adoption of Open Education Resources in Nigeria.

Ethiopia: OpenCon 2018 Addis Ababa

The Open Data Day event in Ethiopia was officially opened by Mr. Mesfin Gezahegn, University Librarian of Addis Ababa University (AAU). He stressed that AAU strongly supports open data initiatives and has hosted various workshops and trainings on open data, open access and open science. He also promised that the University will continue supporting open data initiatives in the future. Following the opening, Mr. Solomon Mekonnen of AAU introduced participants to open science concepts and shared international open science initiatives that could also be applied in Ethiopia. The next talk, on open research data, was given by Dr. Melkamu Beyene, Assistant Professor at AAU, and focused on the advantages of opening research data and the issues to consider when sharing data. The final presentation of the morning session was by Mr. Mesfin Gezahegn, on the role of open data in fighting corruption in Ethiopia.

Clockwise from top left: Mesfin Gezahegn, Dr. Melkamu Beyene, panel discussion, Solomon Mekonnen

Following the presentations, a panel discussion was held, focusing mainly on open science and open research data. Major issues raised included licensing options when sharing research data, policy for open research data, raising awareness of open research data and open science, and the role of open data communities in pushing forward the open science agenda. In the afternoon session, several open science tools were demonstrated to the participants, including Zenodo, re3data, ORCID and the Open Science Framework (OSF). There was also a lightning talk session, in which two graduate students talked about their research projects and got feedback from the participants: postgraduate students Yemaneberhan Lemma and Olyad Fekede talked about their projects on linked data and sentiment analysis, respectively, in connection with open data. Mr. Michael Melese, a PhD student at AAU, also shared his experience with open science tools.

Finally, the event concluded with a discussion on future activities. It was agreed that the event successfully created awareness of open science and open research data, but participants stressed the need for longer training on these topics for PhD students and early career researchers. It was also suggested that Open Knowledge Ethiopia hold monthly community meetups to collaborate on the open science issues raised in the workshop. The only challenge faced during the workshop was a power failure in the computer laboratory partway through the practical sessions, but the participants used their personal laptops and smartphones to continue practicing with the open science tools.

Nigeria: Open Data Day 2018 Lagos

The Open Data Day event in Lagos started with a keynote on open data from Dr Ahmed Ogunlaja, Executive Director of Open Access Nigeria, who introduced participants to the importance of Open Education Resources, open research data and the current state of openness initiatives in Nigeria. Afterwards, Mr Kayode Yussuf, tech lead at Creative Commons Nigeria, led a workshop on open data tools and Creative Commons licenses. He took attendees through various open data tools and discussed success stories in the Nigerian open data space. He gave an overview of portals like Wikidata and the Open Science Framework (OSF) and explained the open licenses available under Creative Commons. The event rounded off with a panel session in which both speakers were joined by Mr Adisa Bolutife, Co-founder of Open Switch Africa, for an interactive discussion and question-and-answer segment with the attendees. Community members shared their challenges and difficulties in working in the open sphere and received valuable advice from the panel members. Feedback from participants included the need for a unified portal for accessing open data in Nigeria and the need for the Nigerian community to step up to the challenge of making data open in the form of linked data. The Lagos event had 220 attendees in total, including students, researchers, academics and other professionals who were new to the open data space. The event achieved its purpose: valuable feedback was recorded from the open community and later presented to the Director of Policy and Planning at the Federal Ministry of Communications Technology on 19 March 2018, at an Open Data roundtable organised by the Nigerian Government in Lagos, Nigeria.
The community hopes to keep up to date on the open data policy steps being taken by the government and to periodically share best practices.

Both communities achieved their aims in hosting these events. The major similarity between the two events was the emphasis on open data tools and their benefits.