You are browsing the archive for Sander van der Waal.

How to publish budget and spending data openly

- August 31, 2018 in Open Knowledge, Open Spending

At the Global Initiative for Fiscal Transparency (GIFT) and Open Knowledge International (OKI) we believe that governments’ budget and spending data should be made available to all, so that anyone can see how their tax money is spent and what priorities their governments set, and so that governments can be held accountable. Increasingly, governments already make their budget data openly available, and that is really great to see. Civil society organisations, but also individual researchers, journalists, and anyone who is interested, can use this data to generate insights and share those with the public. But much of the information is still only available in PDF and other non-open formats, and is not published as data. As a result, scrutinising the data and putting it to use is difficult and requires a lot of work. GIFT and OKI have partnered to address this issue. Along with the World Bank’s BOOST initiative and a dedicated open data community, we developed the [Open] Fiscal Data Package. Its version 1.0 is now available! We built the OpenSpending portal on top of the [Open] Fiscal Data Package, to make it really easy to publish budget and spending data. Once the data is up, a whole suite of tools is readily available to anyone to view, visualise and integrate it.

How to get your budget and spending data in OpenSpending

There are two ways to make your data available via OpenSpending. The first is to manually upload the data using the OpenSpending Packager. If you have your fiscal data available as a CSV file, you can try it today. The packager will guide you through an intuitive process of a few easy steps, after which your data can be accessed and visualised by anyone via the OpenSpending platform. If you have any questions about it, reach out on our forum or find us in our chatroom on Gitter. The second is for those who want to publish data more regularly and automatically: we can help you by setting up what we call a pipeline. This is a fairly technical process that we have trialled with the Mexican government. Because it is automated, it is easier for governments to sustain in the longer term. If you are interested in this, we would love to hear from you via openspending-support@okfn.org. An example of what it could look like to have your data published is on the Mexican transparency portal, as you can see below:
OpenSpending integrated in the Mexican Transparency Portal. Get in touch with us to learn more about this process.
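
For readers who want a feel for what "publishing as data" means in practice, here is a minimal sketch of describing a budget CSV with a Data Package style descriptor (a JSON document that lists the file and its columns). It is an illustration under stated assumptions, not the OpenSpending Packager's actual implementation: the function names, the sample data and the simple type guessing are hypothetical, and the fiscal-specific metadata defined by the Fiscal Data Package specification is left out.

```python
import csv
import io
import json


def infer_type(values):
    """Guess a coarse field type ("number" or "string") from sample strings."""
    if not values:
        return "string"
    try:
        for value in values:
            float(value)
    except ValueError:
        return "string"
    return "number"


def build_descriptor(name, csv_path, csv_text):
    """Build a minimal descriptor for one CSV resource (hypothetical helper)."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = list(reader)
    fields = []
    for index, column in enumerate(header):
        # Collect the non-empty cells of this column as samples.
        samples = [row[index] for row in rows if index < len(row) and row[index] != ""]
        fields.append({"name": column, "type": infer_type(samples)})
    return {
        "name": name,
        "resources": [
            {
                "name": name,
                "path": csv_path,
                "format": "csv",
                "schema": {"fields": fields},
            }
        ],
    }


# Made-up sample data, for illustration only.
SAMPLE_CSV = """year,ministry,amount
2018,Health,1200000.50
2018,Education,980000
"""

descriptor = build_descriptor("example-budget", "budget.csv", SAMPLE_CSV)
print(json.dumps(descriptor, indent=2))
```

A real fiscal package would additionally say which columns hold amounts, dates and classifications, which is the kind of information the Packager asks for as it guides you through the upload.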

Want to learn more? Join our webinar!

If you are a local, regional or national government interested in learning how you can benefit from OpenSpending, please join our webinar on 12 September at 10am EST (3pm BST / 4pm CEST). OKI’s Fiscal Transparency lead Sander van der Waal will present the Fiscal Data Package specification version 1 and the OpenSpending toolset. This is a great opportunity for government representatives to learn how they can work with us to get their data into OpenSpending. The webinar can be accessed here. Mark your calendar now!
Join our OpenSpending webinar on 12 September

Do you have any questions? Please reach out to us via email on openspending-support@okfn.org. We would love to hear from you!

OKFestival 2018 becomes Open Knowledge Summit May 2018 in Thessaloniki

- November 30, 2017 in Events, Featured, OKFest, OKFest 2018, OKFestival, Open Knowledge Network

It is with regret that, due to recent circumstances within Open Knowledge International, we have come to the conclusion that it is necessary to cancel the planned Open Knowledge Festival 2018. This has been a very difficult decision for the team at OKI, but with such a short time frame ahead and a lack of secured funding for the Festival, we felt that we could not guarantee a successful event for all our participants. However, we want to take the opportunity to gather the Open Knowledge Network on the same dates, facilitating an event that embraces our network and is a better fit for where we are today. In this post we outline our alternative plans for Thessaloniki in May 2018.

Dear Network and partners,

We are in a period of change at Open Knowledge International. After the resignation of our CEO Pavel Richter we took the time to reflect and think about the state of affairs. We recognise that we have not been communicating as actively as we should have been at Open Knowledge International. There have been some notable successes with initiatives such as Frictionless Data, the School of Data network, Open Data Day and the Global Open Data Index. It is very clear to us that those successes would not have been possible without the passion and commitment of our wider, diverse Open Knowledge Network. Similar, if not greater, successes have been achieved by you in the network. Just two examples are the wonderful Prototype Fund project in Germany, and the MyData conference in Finland and Estonia. Full credit for those goes to Open Knowledge Foundation Deutschland and Open Knowledge Finland, only two of the amazing Chapters and groups that make up the Open Knowledge Network. But when we look in the mirror at Open Knowledge International we don’t think that we have excelled in the way that we intended over recent years.
We did not communicate clearly to you, our partners, how our organisational strategy was evolving and where we were going as an organisation. We did not engage you, the Open Knowledge Network, on equal footing. We asked for your involvement and contributions at specific points but did not engage you sufficiently on the bigger question of the overall journey we are on together: aiming for more just and more open societies. We are very keen to change this. The world of open knowledge has grown, developed and matured in the last couple of years. Ten years ago we were mostly talking about governments publishing data openly. But now, our collective open knowledge universe includes many other areas like open access to academic publications and open research data. Many groups are actively involved in the area of personal data, where we citizens demand more control over the data we share with corporations on a day-to-day basis. We believe our vision is still very much valid: we still look ahead to a future where everyone has free and open access to key information, enabling every human, citizen, and consumer to understand and shape their lives, homes and the world. Our values are also as relevant as ever: open knowledge, as defined by the Open Definition, forms the cornerstone of what we do. We value respect and tolerance, collaboration not control. We are pragmatic, not fanatic, we make & talk, and we focus on making change in the world. The Open Knowledge Network is a lot more than just an aggregation of its parts. We know that we must keep these parts in constant relation. And we propose that the best way to do this is to keep the Network linked through specific domains. This idea builds very much on the concept of Working Groups we have had for many years in the Open Knowledge Network.
We have been inspired by the School of Data network, who already work in this way – they develop their mission and align strategies between the various organisations that are members of the network – achieving great impact in data literacy. We propose to develop thematic networks within the other domains that are important to multiple Open Knowledge groups. For example, based on a recent survey among Open Knowledge groups we think there is a sufficient level of engagement around topics such as Fiscal Transparency, Open Data infrastructure, Open Science, OpenGLAM, and Personal Data. Such a network already exists in the area of data training with the School of Data network. There are opportunities, we believe, to benefit from more dedicated collaboration, exchange of ideas and plans, and possibly even the development of shared objectives in these areas among Open Knowledge partners. Unfortunately, due to the circumstances at Open Knowledge International, we are not in a position to organise the Open Knowledge Festival we envisioned, and that many of us fondly remember from Berlin (2014) and Helsinki (2012). Not going ahead with the Festival as planned is a very difficult decision for us. However, we are keen to ensure that we hold an event that will be successful for the entire Network, taking the opportunity to gather the Open Knowledge Network on the same dates, to do something that is a better fit for where we are today. We are looking forward to bringing our partners together at an Open Knowledge Summit event in Thessaloniki in May 2018 that will help us all collaboratively build the future of the Open Knowledge Network. We will follow up with you, our partners in the Open Knowledge Network, over the next couple of weeks to work together on the idea of the domain networks that we started to outline above. We want to hear from you whether you feel this is the right approach for developing the Open Knowledge Network, and we want to incorporate your ideas.
Together with you, we want to take the network to the next level in the build-up to May 2018, so that we can all come together in Thessaloniki as an opportunity to meet each other in person and work together within those domains that matter to us. More will follow on this in the next couple of weeks and months. Finally, we want to give a big shout-out to our partners at Open Knowledge Greece. We are very grateful for the hard work that they have put into making the event in Thessaloniki the best it can be, and we look forward to continuing our collaboration with them as amazing hosts for the Open Knowledge Summit.

Public money? Public code!

- September 20, 2017 in Open Software

If taxpayers pay for something, they should have access to the results of the work they paid for. This seems a very logical basic premise that no-one would disagree with, but there are many cases where this is not common practice. For example, in various countries Freedom of Information laws do not fully apply to cases where governments outsource services. This would prevent you from finding out how your tax money has been spent. Or think about the cost of access to academic outputs resulting from public money: while much university research is paid for by the public, the academic outputs are locked away in academic journals, university libraries pay a lot of money to have access to these outputs, and the general public has no access at all unless they pay up. But there is another important area where taxpayers’ money is used to lock away results. In our increasingly digitised societies, more and more software is being built by governments, or commissioned from external parties. The result of that work is in most cases proprietary software, which continues to be owned by the supplier. As a result, governments suffer from vendor lock-in, which means they rely fully on the external supplier for anything related to the software. No-one else is able to provide any adaptations or additions to the software, or test the software properly to make sure there are no vulnerabilities, and the government cannot easily move to a different supplier if they are unhappy with the software provided. An easy solution for these issues exists: mandate that all software developed using public money is public code, by stipulating in all contracts with external suppliers that the software they develop is released under a Free and Open Source Software license. This issue forms the heart of the Public Money? Public Code! campaign the Free Software Foundation Europe launched recently.
The ultimate aim of the campaign is to make sure Free and Open Source Software will be the default option for publicly financed software everywhere. Open Knowledge International wholeheartedly supports this movement and we add our voice to the creed: If it is public money, it should be public code! Together with all signatories, we call on our representatives to take the necessary steps to require that publicly financed software developed for the public sector be made publicly available under a Free and Open Source Software licence. This topic is dear to us at Open Knowledge International. As the original developers and one of the main stewards of the CKAN software, we try to do our bit to make sure there is trustworthy, high quality open source software available for governments to deploy. CKAN is currently used by many governments worldwide (including the US, UK, Canada, Brazil, Germany, and Australia – to name a few) to publish data. As many governments have similar needs in publishing data on their websites, it would be a waste of public money if each government commissioned the development of its own platform, or even paid a commercial supplier for a proprietary product. If a good open source solution is available, governments do not have to pay license fees for the software: they can use it for free. They can still contract an external company to deploy the open source software for them, and make any adaptations that they might want. But as long as these adaptations are also released as open source, the government is not tied to this one supplier – since the software is freely accessible they can easily take it to a different supplier if they’re unhappy. In practice though, this is not the case for most software in use by governments, and they continue to rely on suppliers for whom a vendor lock-in model is attractive. But we know change is possible.
We have seen some successes in the last few years in the area of academic publishing, as the open access movement has gathered steam: increasingly funders of academic research stipulate that if you receive grants from them, you are expected to publish the result of this work under an open access license, which means that anyone can read and download their work. We hope a similar transformation is possible for publicly funded software, and we urge you all to add your signature to the campaign now!

Fostering open, inclusive, and respectful participation

- August 21, 2017 in community, network, Open Knowledge, Open Knowledge international Local Groups

At Open Knowledge International we have been involved in various projects with other civil society organisations aiming for the release of public interest data, so that anyone can use it for any purpose. More importantly, we focus on putting this data to use, to help it fulfil its potential of working towards fairer and more just societies. Over the last year, we started the first phase of the project Open Data for Tax Justice, because we and our partners believe the time is right to demand that more data be made openly available to scrutinise the activities of businesses. In an increasingly globalised world, multinational corporations have tools and techniques at their disposal to minimise their overall tax bill, and many believe that this gives them an unfair advantage over ordinary citizens. Furthermore, the extent to which these practices take place is unknown, because the taxes that multinational corporations pay in all jurisdictions in which they operate are not reported publicly. By changing that we can have a proper debate about whether the rules are fair, or whether changes will need to be made to share the tax bill in a different way. For us at Open Knowledge International, this is an entry into a new domain. We are not tax experts, but instead we rely on the expertise of our partners. We are open to engaging all experts to help shape and define together how data should be made available, and how it can be put to use to work towards tax systems that can rely on more trust from their citizens. Unsurprisingly, in such a complex and continuously developing field, debates can get very heated. People are obviously very passionate about this, and being passionate open data advocates ourselves, we sympathise. However, we think it is crucial that the passion to strive for a better world should never escalate to personal insults or ad-hominem attacks, or violate basic norms in any other way. Unfortunately, this happened recently with a collaborator on a project.
While they made clear they were not affiliated with Open Knowledge International, nevertheless their actions reflected very badly on the overall project and we deeply condemn their actions. Moving forward, we want to make more explicitly clear what behaviour is and is not acceptable within the context of the projects we are part of. To that end, we are publishing project participation guidelines that make clear how we define acceptable and unacceptable behaviour, and what you can do if you feel any of these guidelines are being violated. We invite your feedback on these guidelines, as it is important that these norms are shared among our community. So please let us know on our Open Knowledge forum what you think and where you think these guidelines could be improved. Furthermore, we would like to make clear what the communities we are part of, like the one around tax justice, can expect from Open Knowledge International beyond enforcing the basic behavioural norms that we set out in the guidelines linked above. Being in the business of open data, we love facts and aim to record many facts in the databases we build. However, facts can be used to reach different and sometimes even conflicting conclusions. Some partners engage heavily on social media channels like Twitter to debate conflicting interpretations, and other partners choose different channels for their work. Open Knowledge International is not, and will never be, in a position to be the arbiter on all interpretations that partners make about the data that we publish. Our expertise is in building open databases, helping put the data to use, and convening communities around the work that we do. On the subject matter of, for example, tax justice, we are more similar to those of us who are interested and care about the topic, but would rely on the debate being led by experts in the field. Where we spot abuse of the data published in databases we run, or obvious misrepresentation of the data, we will speak out. 
But we will not monitor or take a stance on all issues that are being debated by our partners and the wider communities around our projects. Finally, we strongly believe that the open knowledge movement is best served by open and diverse participation. We aim for the project participation guidelines to spell out our expectations and hope these will help us move towards developing more inclusive and diverse communities, where everyone who wants to participate respectfully feels welcomed to do so. Do you think these guidelines are a step in the right direction? What else do you feel we should be doing at Open Knowledge International? We look forward to hearing from you in our forum.

Open data contributions welcomed at European Data Forum 2014

- December 9, 2013 in Uncategorized

The value of open data is increasing as more data is being released under an open license allowing anyone to freely use, reuse and redistribute it. This provides numerous opportunities for businesses and other organisations alike, and open data is becoming one of the pillars of Europe’s data economy. Open data is therefore one of the key themes at the European Data Forum 2014, which will take place 19-20 March 2014 in Athens, Greece. We invite you to submit your contributions now and are specifically interested in innovative applications of open data. The Call for Contributions is open until 10 Dec 22:00 CET. Open data is increasingly becoming an integral part of governments’ agendas. In June, the G8 launched an Open Data Charter, committing to the ‘open by default’ principle. The European Commission released a document demonstrating how the Commission implements this charter. Notable initiatives are the European Commission’s own open data portal and the portal publicdata.eu, which aggregates data from governments’ open data portals throughout Europe. National open data portals are becoming more and more commonplace, with notable examples such as the re-launched Swedish national open data portal oppnadata.se and the German national portal govdata.de. The value that open data can bring is demonstrated in various ways. Publication of budget and spending data makes governments more transparent and allows citizens and journalists to hold their governments to account. Projects like OpenSpending are helping turn the data into meaningful visualisations and stories so that people can make sense of the data. Economic opportunities for the data are being explored by companies such as Open Corporates and start-up companies such as Nostalgeo, who make innovative use of open mapping data to integrate images of old post cards with modern maps. We invite you to submit your contribution today.
There is also a Call for Exhibition (CfE) in place, and we are offering interesting sponsorship bundles for organisations, projects and enterprises. We look forward to your submission!

Expanding publicdata.eu towards a more pan-European portal

- July 25, 2013 in Events

I was cordially invited to attend a meeting of the Public Sector Information sub-Group on the pan-European Open Data portal to present our work on publicdata.eu, the prototype of a pan-European open data portal that we are developing as part of the LOD2 project. This was an excellent opportunity to talk with the representatives of 19 EU countries that were at the meeting. It was great to see that many countries already have an open data portal or have just set one up, ranging from countries like Germany and France, to Austria and Slovakia. Some of these are already integrated in the pan-European publicdata.eu, but we also found out about portals that we were not aware of. My presentation focused on some of the key areas we are working on at the Open Knowledge Foundation.

Integrate more portals into publicdata.eu

As more official open data portals are being built, we are working to integrate those into publicdata.eu. It was interesting to see that countries are increasingly publishing their government data under open licenses, and that many were keen to have their data integrated in the pan-European portal that we are prototyping.

Increase use of the data

As part of the Apps for Europe project we are working with initiatives that stimulate innovative use of the data, such as via organising hackathons or app competitions. The goal of that project is to connect promising application developers to potential investors, and ensure that the data that is being published will be used successfully.

Work on standards for open data portals

The European Commission has funded the Open Data Support project to improve the visibility of, and facilitate access to, datasets published in open data portals. They initiated the working group that aims to develop a DCAT application profile for data portals in Europe. This will enable cross-data portal search for data sets and make public sector data better searchable across borders and sectors, in particular for data portals that are not based on the CKAN software that we develop and which runs publicdata.eu.

It was a very fruitful day in Luxembourg and I returned with additional homework to further expand publicdata.eu with new open data portals from across Europe. Watch this space!

EC Consultation on open research data

- July 17, 2013 in Featured, Open Access, Open Data

The European Commission held a public consultation on open access to research data on July 2 in Brussels inviting statements from researchers, industry, funders, IT and data centre professionals, publishers and libraries. The inputs of these stakeholders will play some role in revising the Commission’s policy and are particularly important for the ongoing negotiations on the next big EU research programme Horizon 2020, where about 25-30 billion Euros would be available for academic research. Five questions formed the basis of the discussion:
  • How can we define research data and what types of research data should be open?
  • When and how does openness need to be limited?
  • How should the issue of data re-use be addressed?
  • Where should research data be stored and made accessible?
  • How can we enhance “data awareness” and a “culture of sharing”?
Here is how the Open Knowledge Foundation responded to the questions:

How can we define research data and what types of research data should be open?

Research data is extremely heterogeneous, and would include (although not be limited to) numerical data, textual records, images, audio and visual data, as well as custom-written software, other code underlying the research, and pre-analysis plans. Research data would also include metadata – data about the research data itself – including uncertainties and methodology, versioned software, standards and other tools. Metadata standards are discipline-specific, but to be considered ‘open’, metadata would at a bare minimum be expected to provide sufficient information for a fellow researcher in the same discipline to interpret and reuse the data, and to be itself openly available and machine-readable. Here, we are specifically concerned with data that is being produced, and therefore can be controlled, by the researcher, as opposed to data the researcher may use that has been produced by others. When we talk about open research data, we are mostly concerned with data that is digital, or the digital representation of non-digital data. While primary research artifacts, such as fossils, have obvious and substantial value, the extent to which they can be ‘opened’ is not clear. However, 3D scanning techniques can and should be used to capture many physical features or an image, enabling broad access to the artifact. This would benefit both researchers who are unable to travel to visit a physical object, and interested citizens who would typically be unable to access such an item. By default there should be an expectation that all types of research data that can be made public, including all metadata, should be made available in machine-readable form and open as per the Open Definition. This means the data resulting from public work is free for anyone to use, reuse and redistribute, with at most a requirement to attribute the original author(s) and/or share derivative works.
It should be publicly available and licensed with this open license.
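As a rough illustration of what machine-readable, openly licensed metadata could look like, here is a minimal sketch in Python. The field names in `REQUIRED_FIELDS`, the license identifiers in `OPEN_LICENSES`, and the example record are all assumptions made for this sketch; real metadata standards are discipline-specific and will differ.

```python
import json

# Hypothetical minimal field set, chosen for illustration only.
REQUIRED_FIELDS = ("title", "creators", "description", "methodology", "license", "format")

# Illustrative subset of licenses that conform to the Open Definition.
OPEN_LICENSES = {"CC0-1.0", "CC-BY-4.0", "CC-BY-SA-4.0", "ODbL-1.0"}


def missing_fields(record):
    """Return the required fields that are absent or empty in the record."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


def is_openly_licensed(record):
    """Check the record's license against the illustrative open-license set."""
    return record.get("license") in OPEN_LICENSES


# Made-up example record; none of these values come from a real dataset.
record = {
    "title": "Example survey responses",
    "creators": ["A. Researcher"],
    "description": "Anonymised responses to a hypothetical survey.",
    "methodology": "Online questionnaire; see the accompanying pre-analysis plan.",
    "license": "CC-BY-4.0",
    "format": "text/csv",
}

# Serialising to JSON is one simple way to make the metadata machine-readable.
metadata_json = json.dumps(record, indent=2, sort_keys=True)
print(metadata_json)
print("missing fields:", missing_fields(record))
print("openly licensed:", is_openly_licensed(record))
```

A check like this only confirms that the fields are present and the license is on the list. Whether the metadata is sufficient for a fellow researcher to interpret and reuse the data remains a judgement for the discipline.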

When and how does openness need to be limited?

The default position should be that research data is made open in accordance with the Open Definition, as defined above. However, while access to research data is fundamentally democratising, there will be situations where the full data cannot be released, for instance for reasons of privacy. In these cases, researchers should share analysis under the least restrictive terms consistent with legal requirements, while abiding by the research ethics dictated by the terms of the research grant. This should include opening up non-sensitive data, summary data, metadata and code, and making the original data available to those who can ensure that appropriate measures are in place to mitigate any risks.

Access to research data should not be limited by the introduction of embargo periods, and arguments in support of embargo periods should be considered a reflection of inherent conservatism among some members of the academic community. Instead, the expectation should be that data is released before the project that funds the data production has been completed, and certainly no later than the publication of any research output resulting from it.

How should the issue of data re-use be addressed?

Data is only meaningfully open when it is available in a format, and under an open license, that allows re-use by others. But simply making data available is often not sufficient for reusing it. Metadata must provide sufficient documentation to enable other researchers to replicate empirical results. There is a role here for data publishers and repository managers, who should endeavour to make the data usable and discoverable by others. This can be done by providing further documentation, using standard code lists, and so on, all of which help make data more interoperable and reusable. Submitting the data to standard registries and using common metadata also enable greater discoverability. Interoperability and the availability of data in machine-readable form are crucial to ensure that data-mining and text-mining of the data can be performed, a form of re-use that must not be restricted.

Arguments are sometimes made that we should monitor levels of data reuse, to allow us to dynamically determine which data sets should be retained. We reject this suggestion. There is a moral responsibility to preserve data created with taxpayer funds, including data that represents negative results or that is not obviously linked to publications. It is impossible to predict possible future uses, and reuse opportunities may already exist that are not immediately obvious. It is also crucial to note that research interests change over time.

Where should research data be stored and made accessible?

Each discipline needs different options for storing data and opening it up to its community and the world; there is no one-size-fits-all solution. The research data infrastructure should be built on open source software and made interoperable through open standards. With these provisions we would encourage researchers to use the data repository that best fits their needs and expectations, for example an institutional or subject repository. It is crucial that appropriate metadata about the deposited data is stored as well, to ensure this data is discoverable and can be re-used more easily. Both the data and the metadata should be openly licensed. They should be deposited in machine-readable and open formats, similar to how the US government mandates this in its Executive Order on Government Information. This makes it possible to link repositories and data across various portals and makes it easier to find the data. For example, the open source data portal CKAN, developed by the Open Knowledge Foundation, enables the depositing of data and metadata and makes it easy to find and re-use data. Various universities, such as the Universities of Bristol and Lincoln, already use CKAN for these purposes.
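To give a sense of how data in a CKAN portal can be found programmatically, the sketch below builds a search request for CKAN's Action API (`package_search`, with its `q` and `rows` parameters). The portal address `https://data.example.org` and the search terms are placeholders assumed for this example, and the helper `package_search_url` is a hypothetical name, not part of CKAN itself.

```python
from urllib.parse import urlencode

# Placeholder portal address; replace with a real CKAN instance.
PORTAL = "https://data.example.org"


def package_search_url(portal, query, rows=10):
    """Build a URL for CKAN's package_search action (Action API v3)."""
    params = urlencode({"q": query, "rows": rows})
    return f"{portal.rstrip('/')}/api/3/action/package_search?{params}"


# Build (but do not send) a search for datasets matching the query.
url = package_search_url(PORTAL, "climate survey", rows=5)
print(url)
```

Fetching such a URL from a live CKAN portal returns a JSON document whose `result` object lists the matching datasets together with their metadata, which is what makes linking data across portals practical.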

How can we enhance data awareness and a culture of sharing?

Academics, research institutions, funders, and learned societies all have significant responsibilities in developing a culture of data sharing. Funding agencies and organisations disbursing public funds have a central role to play and must ensure that research institutions, including publicly supported universities, have access to appropriate funds for longer-term data management. Furthermore, they should establish policies and mandates that support these principles.

Publication and, more generally, sharing of research data should be ingrained in the academic culture, and should be seen as a fundamental part of scholarly communication. However, it is often seen as detrimental to a career, partly as a result of the current incentive system set up by universities and funders, and partly as a result of much misunderstanding of the issues. Educational and promotional activities should be set up to promote awareness of open access to research data amongst researchers, to help dispel the many myths, and to encourage researchers to self-identify as supporting open access. These activities should recognise that different disciplines are at different stages in developing a culture of sharing. Simultaneously, universities and funders should explore options for creating incentives to encourage researchers to publish their research data openly. Acknowledgements of research funding, traditionally limited to publications, could be extended to research data, and the contribution of data curators should be recognised.

EC Consultation on open research data

- July 17, 2013 in Featured, Open Access, Open Data

The European Commission held a public consultation on open access to research data on July 2 in Brussels inviting statements from researchers, industry, funders, IT and data centre professionals, publishers and libraries. The inputs of these stakeholders will play some …

EC Consultation on open research data

- July 16, 2013 in Access to Information, Open Access, Open Data

The European Commission held a public consultation on open access to research data on July 2 in Brussels inviting statements from researchers, industry, funders, IT and data centre professionals, publishers and libraries. The inputs of these stakeholders will play some role in revising the Commission’s policy and are particularly important for the ongoing negotiations on the next big EU research programme Horizon 2020, where about 25-30 billion Euros would be available for academic research. Five questions formed the basis of the discussion:
  • How can we define research data and what types of research data should be open?
  • When and how does openness need to be limited?
  • How should the issue of data re-use be addressed?
  • Where should research data be stored and made accessible?
  • How can we enhance “data awareness” and a “culture of sharing”?
Here is how the Open Knowledge Foundation responded to the questions:

How can we define research data and what types of research data should be open?

Research data is extremely heterogeneous and includes (but is not limited to) numerical data, textual records, images, audio and visual data, as well as custom-written software, other code underlying the research, and pre-analysis plans. Research data also includes metadata – data about the research data itself – including uncertainties and methodology, versioned software, standards and other tools. Metadata standards are discipline-specific, but to be considered ‘open’, metadata must at a bare minimum provide sufficient information for a fellow researcher in the same discipline to interpret and reuse the data, and must itself be openly available and machine-readable.

Here, we are specifically concerned with data that is produced, and can therefore be controlled, by the researcher, as opposed to data the researcher may use that has been produced by others. When we talk about open research data, we are mostly concerned with data that is digital, or the digital representation of non-digital data. While primary research artifacts, such as fossils, have obvious and substantial value, the extent to which they can be ‘opened’ is not clear. However, 3D scanning techniques can and should be used to capture many physical features, or an image, of an artifact, enabling broad access to it. This would benefit both researchers who are unable to travel to visit a physical object and interested citizens who would typically be unable to access such an item.

By default, all types of research data that can be made public, including all metadata, should be expected to be made available in machine-readable form and open as per the Open Definition. This means the data resulting from public work is free for anyone to use, reuse and redistribute, with at most a requirement to attribute the original author(s) and/or share derivative works under the same terms. It should be publicly available and released under an open license that grants these freedoms.

When and how does openness need to be limited?

The default position should be that research data is made open in accordance with the Open Definition, as defined above. However, while access to research data is fundamentally democratising, there will be situations where the full data cannot be released, for instance for reasons of privacy. In these cases, researchers should share analysis under the least restrictive terms consistent with legal requirements, while abiding by the research ethics dictated by the terms of the research grant. This should include opening up non-sensitive data, summary data, metadata and code, and making the original data available to those who can ensure that appropriate measures are in place to mitigate any risks.

Access to research data should not be limited by the introduction of embargo periods, and arguments in support of embargo periods should be considered a reflection of inherent conservatism among some members of the academic community. Instead, the expectation should be that data is released before the project that funds the data production has been completed, and certainly no later than the publication of any research output resulting from it.

How should the issue of data re-use be addressed?

Data is only meaningfully open when it is available in a format, and under an open license, that allows re-use by others. But simply making data available is often not sufficient for reusing it. Metadata must provide sufficient documentation to enable other researchers to replicate empirical results. There is a role here for data publishers and repository managers, who should endeavour to make the data usable and discoverable by others. This can be done by providing further documentation, using standard code lists, and so on, all of which help make data more interoperable and reusable. Submitting the data to standard registries and using common metadata also enable greater discoverability. Interoperability and the availability of data in machine-readable form are crucial to ensure that data-mining and text-mining of the data can be performed, a form of re-use that must not be restricted.

Arguments are sometimes made that we should monitor levels of data reuse, to allow us to dynamically determine which data sets should be retained. We reject this suggestion. There is a moral responsibility to preserve data created with taxpayer funds, including data that represents negative results or that is not obviously linked to publications. It is impossible to predict possible future uses, and reuse opportunities may already exist that are not immediately obvious. It is also crucial to note that research interests change over time.

Where should research data be stored and made accessible?

Each discipline needs different options for storing data and opening it up to its community and the world; there is no one-size-fits-all solution. The research data infrastructure should be built on open source software and made interoperable through open standards. With these provisions we would encourage researchers to use the data repository that best fits their needs and expectations, for example an institutional or subject repository. It is crucial that appropriate metadata about the deposited data is stored as well, to ensure this data is discoverable and can be re-used more easily. Both the data and the metadata should be openly licensed. They should be deposited in machine-readable and open formats, similar to how the US government mandates this in its Executive Order on Government Information. This makes it possible to link repositories and data across various portals and makes it easier to find the data. For example, the open source data portal CKAN, developed by the Open Knowledge Foundation, enables the depositing of data and metadata and makes it easy to find and re-use data. Various universities, such as the Universities of Bristol and Lincoln, already use CKAN for these purposes.

How can we enhance data awareness and a culture of sharing?

Academics, research institutions, funders, and learned societies all have significant responsibilities in developing a culture of data sharing. Funding agencies and organisations disbursing public funds have a central role to play and must ensure that research institutions, including publicly supported universities, have access to appropriate funds for longer-term data management. Furthermore, they should establish policies and mandates that support these principles.

Publication and, more generally, sharing of research data should be ingrained in the academic culture, and should be seen as a fundamental part of scholarly communication. However, it is often seen as detrimental to a career, partly as a result of the current incentive system set up by universities and funders, and partly as a result of much misunderstanding of the issues. Educational and promotional activities should be set up to promote awareness of open access to research data amongst researchers, to help dispel the many myths, and to encourage researchers to self-identify as supporting open access. These activities should recognise that different disciplines are at different stages in developing a culture of sharing. Simultaneously, universities and funders should explore options for creating incentives to encourage researchers to publish their research data openly. Acknowledgements of research funding, traditionally limited to publications, could be extended to research data, and the contribution of data curators should be recognised.

References