You are browsing the archive for Laura James.

Open Data Privacy

- December 13, 2013 in Featured

“yes, the government should open other people’s data”

Traditionally, the Open Knowledge Foundation has worked to open non-personal data – things like publicly-funded research papers, government spending data, and so on. Where individual data was a part of some shared dataset, such as a census, a great deal of thought and effort went into ensuring that individual privacy was protected and that the aggregate data released was a shared, communal asset.

But times change. Increasing amounts of data are collected by governments and corporations, vast quantities of it about individuals (whether or not they realise that it is happening). The risks to privacy through data collection and sharing are probably greater than they have ever been. Data analytics – whether of “big” or “small” data – has the potential to provide unprecedented insight; however, some of that insight may come at the cost of personal privacy, as separate datasets are connected and correlated.

Both open data and big data are hot topics right now, and at such times it is tempting for organisations to get involved in such topics without necessarily thinking through all the issues. The intersection of big data and open data is somewhat worrying, as the temptation to combine the economic benefits of open data with the current growth potential of big data may lead to privacy concerns being disregarded. Privacy International are right to draw attention to this in their recent article on data for development, but of course other domains are affected too.

Today, we’d like to suggest some terms to help the growing discussion about open data and privacy.

Our Data is data with no personal element, and a clear sense of shared ownership. Some examples would be where the buses run in my city, what the government decides to spend my tax money on, how the national census is structured and the aggregate data resulting from it. At the Open Knowledge Foundation, our default position is that our data should be open data – it is a shared asset we can and should all benefit from.

My Data is information about me personally, where I am identified in some way, regardless of who collects it. It should not be made open or public by others without my direct permission – but it should be “open” to me (I should have access to data about me in a useable form, and the right to share it myself, however I wish, if I choose to do so).

Transformed Data is information about individuals, where some effort has been made to anonymise or aggregate the data to remove individually identified elements.

We propose that there should be some clear steps which need to be followed to confirm whether transformed data can be published openly as our data. A set of privacy principles for open data, setting out considerations that need to be made, would be a good start. These might include things like consulting key stakeholders, including representatives of whatever group(s) the data is about and data privacy experts, around how the data is transformed. For some datasets, it may not prove possible to transform them sufficiently such that a reasonable level of privacy can be maintained for citizens; these datasets simply should not be opened up. For others, it may be that further work on transformation is needed to achieve an acceptable standard of privacy before the data is fit to be released openly. Ensuring the risks are considered and managed before data release is essential. If the transformations provide sufficient privacy for the individuals concerned, and the principles have been adhered to, the data can be released as open data.

We note that some of “our data” will have personal elements. For instance, members of parliament have made a positive choice to enter the public sphere, and some information about them is therefore necessarily available to citizens. Data of this type should still be considered against the principles of open data privacy we propose before publication, although the standards compared against may be different given the public interest.

This is part of a series of posts exploring the areas of open data and privacy, which we feel is a very important issue. If you are interested in these matters, or would like to help develop privacy principles for open data, join the working group mailing list. We’d welcome suggestions and thoughts on the mailing list or in the comments below, or talk to us and the Open Rights Group, who we are working with, at the Open Knowledge Conference and other events this autumn.

The Open Definition in context: putting open into practice

- October 16, 2013 in Featured, linked-open-data, Open Data, Open Definition, Open Knowledge Definition, Open Standards

We’ve seen how the Open Definition can apply to data and content of many types published by many different kinds of organisation. Here we set out how the Definition relates to specific principles of openness, and to definitions and guidelines for different kinds of open data.

Why we need more than a Definition

The Open Definition does only one thing: as clearly and concisely as possible it defines the conditions for a piece of information to be considered ‘open’. The Definition is broad and universal: it is a key unifying concept which provides a common understanding across the diverse groups and projects in the open knowledge movement.

At the same time, the Open Definition doesn’t provide in-depth guidance for those publishing information in specific areas, so detailed advice and principles for opening specific types of information – from government data, to scientific research, to the digital holdings of cultural heritage institutions – are needed alongside it. For example, the Open Definition doesn’t specify whether data should be timely; and yet this is a great idea for many data types. It doesn’t make sense to ask whether census data from a century ago is “timely” or not though!

Guidelines for how to open up information in one domain can’t always be straightforwardly reapplied in another, so principles and guidelines for openness targeted at particular kinds of data, written specifically for the types of organisation that might be publishing them, are important. These sit alongside the Open Definition and help people in all kinds of data fields to appreciate and share open information, and we explain some examples here.

Principles for Open Government Data

In 2007 a group of open government advocates met to develop a set of principles for open government data, which became the “8 Principles of Open Government Data”. In 2010, the Sunlight Foundation revised and built upon this initial set with their Ten Principles for Opening up Government Information, which have set the standard for open government information around the world. These principles may apply to other kinds of data publisher too, but they are specifically designed for open government, and implementation guidance and support is focused on this domain. The principles share many of the key aspects of the Open Definition, but include additional requirements and guidance specific to government information and the ways it is published and used. The Sunlight principles cover the following areas: completeness, primacy, timeliness, ease of physical and electronic access, machine readability, non-discrimination, use of commonly owned standards, licensing, permanence, and usage costs.

Tim Berners-Lee’s 5 Stars for Linked Data

In 2010, Web inventor Tim Berners-Lee created his 5 Stars for Linked Data, which aims to encourage more people to publish Linked Data – that is, to use a particular set of technical standards and technologies for making information interoperable and interlinked. The first three stars (legal openness, machine readability, and non-proprietary format) are covered by the Open Definition, and the two additional stars add the Linked Data components (in the form of RDF, a technical specification). The 5 stars have been influential in various parts of the open data community, especially those interested in the semantic web and the vision of a web of data, although there are many other ways to connect data together.
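To make the cumulative nature of the stars concrete, here is a minimal sketch in Python. The function name and the five boolean parameters are hypothetical, and the wording of the fourth and fifth criteria (using RDF, then linking to other data) is our own reading of the scheme rather than a quotation from it. The sketch assumes each star is only earned if all the earlier ones are.

```python
def linked_data_stars(open_licence, machine_readable, non_proprietary_format,
                      uses_rdf, links_to_other_data):
    """Count stars in order, stopping at the first criterion that is not met.

    The first three criteria overlap with the Open Definition; the last two
    are the Linked Data additions (parameter names are illustrative only).
    """
    criteria = [
        open_licence,            # star 1: legally open
        machine_readable,        # star 2: structured, machine-readable data
        non_proprietary_format,  # star 3: non-proprietary format
        uses_rdf,                # star 4: published using RDF
        links_to_other_data,     # star 5: linked to other datasets
    ]
    stars = 0
    for met in criteria:
        if not met:
            break
        stars += 1
    return stars


# An openly licensed spreadsheet exported as CSV, with no RDF version.
print(linked_data_stars(True, True, True, False, False))
```

A dataset that meets the Open Definition's requirements would reach three stars under this reading; the remaining two depend on how it is published as Linked Data.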

Principles for specific kinds of information

At the Open Knowledge Foundation many of our Working Groups have been involved with others in creating principles for various types of open data and fields of work with an open element. Such principles frame the work of their communities, set out best practice as well as legal, regulatory and technical standards for openness and data, and have been endorsed by many leading people and organisations in each field. These include:

The Open Definition: the key principle powering the Global Open Knowledge Movement

All kinds of individuals and organisations can open up information: government, public sector bodies, researchers, corporations, universities, NGOs, startups, charities, community groups, individuals and more. That information can be in many formats – it may be spreadsheets, databases, images, texts, linked data, and more; and it can be information from any field imaginable – such as transport, science, products, education, sustainability, maps, legislation, libraries, economics, culture, development, business, design, finance and more. Each of these organisations, kinds of information, and the people who are involved in preparing and publishing the information, has its own unique requirements, challenges, and questions. Principles and guidelines (plus training materials, technical standards and so on!) to support open data activities in each area are essential, so those involved can understand and respond to the specific obstacles, challenges and opportunities for opening up information. Creating and maintaining these is a major activity for many of the Open Knowledge Foundation’s Working Groups as well as other groups and communities. At the same time, those working on openness in many different areas – whether open government, open access, open science, open design, or open culture – have shared interests and goals, and the principles and guidelines for some different data types can and do share many common elements, whilst being tailored to the specific requirements of their communities. The Open Definition provides the key principle which connects all these groups in the global open knowledge movement.

More about openness coming soon

Don’t miss our other posts about Defining Open Data, and exploring the Open Definition, why having a shared and agreed definition of open data is so important, and how one can go about “doing open data”.

Exploring openness and the Open Definition

- October 7, 2013 in Featured, Open Data, Open Definition, Open Knowledge Definition

We’ve set out the basics of what open data means, so here we explore the Open Definition in more detail, including the importance of bulk access to open information, commercial use of open data, machine-readability, and what conditions can be imposed by a data provider.

Commercial Use

A key element of the definition is that commercial use of open data is allowed – there should be no restrictions on commercial or for-profit use of open data. In the full Open Definition, this is included as “No Discrimination Against Fields of Endeavor — The license must not restrict anyone from making use of the work in a specific field of endeavor. For example, it may not restrict the work from being used in a business, or from being used for genetic research.” The major intention of this clause is to prohibit license traps that prevent open material from being used commercially; we want commercial users to join our community, not feel excluded from it.

Examples of commercial open data business models

It may seem odd that companies can make money from open data. Business models in this area are still being invented and explored, but here are a couple of options to help illustrate why commercial use is a vital aspect of openness.

You can use an open data set to create a high capacity, reliable API which others can access and build apps and websites with, and charge for access to that API – as long as a free bulk download is also available. (An API is a way for different pieces of software or different computers to connect and exchange information; most applications and apps use APIs to access data via the internet, such as the latest news or maps or prices for products.)

Businesses can also offer services around data improvement and cleaning; for example, taking several sets of open data, combining them and enhancing them (by creating consistent naming for items within the data, say, or connecting two different datasets to generate new insights).

(Note that charging for data licensing is not an option here – charging for access to the data means it is not open data! This business model is often talked about in the context of personal information or datasets which have been compiled by a business. These are perfectly fine business models for data but they aren’t open data.)
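As an illustration of the data improvement model described in this section, here is a minimal Python sketch of "creating consistent naming" and then connecting two datasets. The department names and figures are made up for the example, and the `normalise` and `combine` helpers are hypothetical; real cleaning work usually needs far more careful matching rules.

```python
def normalise(name):
    """Reduce a name to a consistent form: trimmed, lower case, single spaces."""
    return " ".join(name.strip().lower().split())


def combine(spending, staff):
    """Join two {name: value} datasets on the normalised name.

    Only names present in both datasets appear in the result.
    """
    staff_by_key = {normalise(name): count for name, count in staff.items()}
    combined = {}
    for name, amount in spending.items():
        key = normalise(name)
        if key in staff_by_key:
            combined[key] = {"spending": amount, "staff": staff_by_key[key]}
    return combined


# Two invented datasets that spell the same department differently.
spending = {"Dept. of Transport ": 1000, "Health": 2500}
staff = {"dept. of  transport": 40, "Education": 15}

result = combine(spending, staff)
print(result)
```

The value a business adds here is the matching itself: neither source dataset alone says anything about spending per member of staff.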

Attribution, “Integrity” and Share-alike

Whilst the Open Definition permits very few conditions to be placed on how someone can use open data it does allow a few specific exceptions:
  • Attribution: an open data provider may require attribution (i.e. that you credit them in an appropriate way). This can be important in allowing open data providers to receive credit for their work, and for downstream users to know where data came from.
  • Integrity: an open data provider may require that a user of the data makes it clear if the data has been changed. This can be very relevant for governments, for example, who wish to ensure that people do not claim data is official if it has been modified.
  • Share-alike: an open data provider may impose a share-alike licence, requiring that any new datasets created using their data are also shared as open data.

Machine-readability and bulk access

Data can be provided in many ways, and this can have a significant impact on how easy it is to use. The Open Definition requires that data be both machine-readable and available in “bulk” to help make sure it’s not too difficult to make useful.

Data is machine-readable if it can be easily processed by a computer. This does not just mean that it’s digital, but that it is in a digital structure that is appropriate for the relevant processing. For example, consider a PDF document containing tables of data. These are digital, but computers will struggle to extract the information from the PDF (even though it is very human readable!). The equivalent tables in a format such as a spreadsheet would be machine-readable. Read more about machine-readability in the open data glossary.

Data is available in bulk if you can download or access the whole dataset easily. It is not available in bulk if you are limited to getting just parts of the dataset, for example, if you are restricted to getting just a few elements of the data at a time – imagine trying to access a dataset of all the towns in the world one country at a time.
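To show what "easily processed by a computer" can look like in practice, here is a small Python sketch that reads a table supplied as CSV text, using the standard library's `csv` module. The towns, countries and population figures are placeholders invented for the example, and the helper names are hypothetical. Doing the same with a table embedded in a PDF would first require extracting and reconstructing the rows and columns.

```python
import csv
import io

# Placeholder data standing in for a published table (values are invented).
CSV_TEXT = """town,country,population
Town A,Country X,100
Town B,Country X,250
Town C,Country Y,75
"""


def load_rows(text):
    """Parse CSV text into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def total_population(rows):
    """Sum a numeric column, which is trivial once the data is structured."""
    return sum(int(row["population"]) for row in rows)


rows = load_rows(CSV_TEXT)
print(len(rows), "rows; total population:", total_population(rows))
```

Because the whole table is available at once, this is also a tiny example of bulk access: nothing has to be requested one country at a time.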

APIs versus Bulk

Providing data through an API is great – and often more convenient for many of the things one might want to do with data than bulk access, such as presenting some useful information in a mobile app. However, the Open Definition requires bulk access rather than an API. There are two main reasons for this:
  • Bulk access allows you to build an API (if you want to!). If you need all the data, using an API to get it can be difficult or inefficient. For example, think about Twitter: using their API to download all the tweets would be very hard and slow. Thus, bulk access is the only way to guarantee full access to the data for everyone. Once bulk access is available, anyone else can build an API which will help others use the data. You can also use bulk data to create interesting new things such as search indexes and complex visualisations.
  • Bulk access is significantly cheaper than providing an API. Today you can store gigabytes of data for less than a dollar a month; but running even a basic API can cost much more, and running a proper API that supports high demand can be very expensive.
So having an API is not a requirement for data to be open – although of course it is great if one is available. Moreover, it is perfectly fine for someone to charge for access to open data through an API – as long as they also provide the data for free in bulk. (Strictly speaking, the requirement isn’t that the bulk data is available for free but that the charge is no more than the extra cost of reproduction. For online downloads, that’s very close to free!) This makes sense: open data must be free but open data services (such as an API) can be charged for. (It’s worth considering what this means for real-time data, where new information is being generated all the time, such as live traffic information. The answer here depends somewhat on the situation, but for open real-time data one would imagine a combination of bulk download access, and some way to get rapid or regular updates. For example, you might provide a stream of the latest updates which is available all the time, and a bulk download of a complete day’s data every night.)
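The point that bulk access lets anyone build their own API can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the bulk file is a small CSV held in a string, the data is invented, and `build_index` and `towns_in` are hypothetical names for the lookup that a real API would expose over the network.

```python
import csv
import io
from collections import defaultdict

# Stand-in for a bulk download of the complete dataset (invented values).
BULK_CSV = """town,country
Town A,Country X
Town B,Country X
Town C,Country Y
"""


def build_index(bulk_text):
    """Read the whole dataset once and group town names by country."""
    index = defaultdict(list)
    for row in csv.DictReader(io.StringIO(bulk_text)):
        index[row["country"]].append(row["town"])
    return dict(index)


def towns_in(index, country):
    """The kind of query an API endpoint might answer."""
    return index.get(country, [])


index = build_index(BULK_CSV)
print(towns_in(index, "Country X"))
```

Going the other way, from a per-country query interface back to the full dataset, would need one request per country and prior knowledge of every country name, which is why bulk access is the stronger guarantee.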

Licensing and the public domain

Generally, when we want to know whether a dataset is legally open, we check to see whether it is available under an open licence (or that it’s in the public domain by means of a “dedication”). However, it is important to note that it is not always clear whether there are any exclusive, intellectual-property-style rights in the data such as copyright or sui-generis database rights (for example, this may depend on your jurisdiction). You can read more about this complex issue in the Open Definition legal overview of rights in data. If there aren’t exclusive rights in the data, then it would automatically be in the public domain, and putting it online would be sufficient to make it open. However, since this is an area where things are not very clear, it is generally recommended to apply an appropriate open licence – that way if there are exclusive rights you’ve licensed them, and if there aren’t any rights you’ve not done any harm (the data was already in the public domain!).

More about openness coming soon

In coming days we’ll post more on the theme of explaining openness, including the relationship of the Open Definition to specific sets of principles for openness – such as the Sunlight Foundation’s 10 principles and Tim Berners-Lee’s 5 star system, why having a shared and agreed definition of open data is so important, and how one can go about “doing open data”.

Defining Open Data

- October 3, 2013 in Featured, Open Data, Open Definition, Open Knowledge Definition

Open data is data that can be freely used, shared and built on by anyone, anywhere, for any purpose. This is the summary of the full Open Definition which the Open Knowledge Foundation created in 2005 to provide both a succinct explanation and a detailed definition of open data.

As the open data movement grows, and even more governments and organisations sign up to open data, it becomes ever more important that there is a clear and agreed definition for what “open data” means if we are to realise the full benefits of openness, and avoid the risks of creating incompatibility between projects and splintering the community.

Open can apply to information from any source and about any topic. Anyone can release their data under an open licence for free use by and benefit to the public. Although we may think mostly about government and public sector bodies releasing public information such as budgets or maps, or researchers sharing their results data and publications, any organisation can open information (corporations, universities, NGOs, startups, charities, community groups and individuals).

Read more about different kinds of data in our one page introduction to open data

There is open information in transport, science, products, education, sustainability, maps, legislation, libraries, economics, culture, development, business, design, finance …. So the explanation of what open means applies to all of these information sources and types. Open may also apply both to data – big data and small data – or to content, like images, text and music! So here we set out clearly what open means, and why this agreed definition is vital for us to collaborate, share and scale as open data and open content grow and reach new communities.

What is Open?

The full Open Definition provides a precise definition of what open data is. There are 2 important elements to openness:
  • Legal openness: you must be allowed to get the data legally, to build on it, and to share it. Legal openness is usually provided by applying an appropriate (open) license which allows for free access to and reuse of the data, or by placing data into the public domain.
  • Technical openness: there should be no technical barriers to using that data. For example, providing data as printouts on paper (or as tables in PDF documents) makes the information extremely difficult to work with. So the Open Definition has various requirements for “technical openness,” such as requiring that data be machine readable and available in bulk.
There are a few key aspects of open which the Open Definition explains in detail. Open Data is useable by anyone, regardless of who they are, where they are, or what they want to do with the data; there must be no restriction on who can use it, and commercial use is fine too.

Open data must be available in bulk (so it’s easy to work with) and it should be available free of charge, or at least at no more than a reasonable reproduction cost. The information should be digital, preferably available by downloading through the internet, and easily processed by a computer too (otherwise users can’t fully exploit the power of data – that it can be combined together to create new insights).

Open Data must permit people to use it, re-use it, and redistribute it, including intermixing with other datasets and distributing the results. The Open Definition generally doesn’t allow conditions to be placed on how people can use Open Data, but it does permit a data provider to require that data users credit them in some appropriate way, make it clear if the data has been changed, or that any new datasets created using their data are also shared as open data.

There are 3 important principles behind this definition of open, which are why Open Data is so powerful:
  • Availability and Access: that people can get the data
  • Re-use and Redistribution: that people can reuse and share the data
  • Universal Participation: that anyone can use the data
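The legal and technical elements described in this section can be read as a simple checklist. The Python sketch below is one hypothetical way to write that checklist down; the `Dataset` fields and the `openness_problems` function are our own illustration and are not part of the Open Definition itself, which remains the authoritative text.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Illustrative description of how a dataset is published."""
    open_licence_or_public_domain: bool
    machine_readable: bool
    bulk_download: bool
    price_above_reproduction_cost: bool


def openness_problems(ds):
    """Return a list of reasons the dataset falls short; empty means none found."""
    problems = []
    if not ds.open_licence_or_public_domain:
        problems.append("legal: no open licence or public domain dedication")
    if not ds.machine_readable:
        problems.append("technical: not machine readable")
    if not ds.bulk_download:
        problems.append("technical: not available in bulk")
    if ds.price_above_reproduction_cost:
        problems.append("access: charge exceeds reproduction cost")
    return problems


def is_open(ds):
    """A dataset passes this rough check only if no problems were found."""
    return not openness_problems(ds)


# Tables published only as PDF, under an open licence, free of charge.
pdf_only = Dataset(True, False, True, False)
print(is_open(pdf_only), openness_problems(pdf_only))
```

Returning the list of problems, rather than a bare yes or no, makes it easier for a publisher to see what would need to change.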

Governance of the Open Definition

Since 2007, the Open Definition has been governed by an Advisory Council. This is the group formally responsible for maintaining and developing the Definition and associated material. Its mission is to take forward Open Definition work for the general benefit of the open knowledge community, and it has specific responsibility for deciding on what licences comply with the Open Definition. The Council is a community-run body. New members of the Council can be appointed at any time by agreement of the existing members of the Advisory Council, and are selected for demonstrated knowledge and competence in the areas of work of the Council. The Advisory Council operates in the open and anyone can join the mailing list.

About the Open Definition

The Open Definition was created in 2005 by the Open Knowledge Foundation with input from many people. The Definition was based directly on the Open Source Definition from the Open Source Initiative and we were able to reuse most of these well-established principles and practices that the free and open source community had developed for software, and apply them to data and content. Thanks to the efforts of many translators in the community, the Open Definition is available in 30+ languages.

More about openness coming soon

In coming days we’ll post more on the theme of explaining openness, including a more detailed exploration of the Open Definition, the relationship of the Open Definition to specific sets of principles for openness – such as the Sunlight Foundation’s 10 principles and Tim Berners-Lee’s 5 star system, why having a shared and agreed definition of open data is so important, and how one can go about “doing open data”.

Open Data Training at the Open Knowledge Foundation

- September 26, 2013 in Business, ckan, Featured, OKF, Open Data, Open Government Data, Our Work, School of Data, Technical, training

We’re delighted to announce today the launch of a new portfolio of open data training programs. For many years the Open Knowledge Foundation has been working — both formally and informally — with governments, civil society organisations and others to provide this kind of advice and training. Today marks the first time we’ve brought it all together in one place with a clear structure. These training programs are designed for two main groups of people interested in open data:
  1. Those within government and other organisations seeking a short introduction to open data – what it is, why to “do” open data, what the challenges are, and how to get started with an open data project or policy.

  2. The growing group of those specialising in open data, perhaps as policy experts, open data program managers, technology specialists, and so on, generally within government or other organisations. Here we offer more in-depth training including detailed material on how to run an open data program or project, and also a technical course for those deploying or maintaining open data portals.

Our training programs are designed and delivered by our team of open data experts with many years of experience creating, maintaining and supporting open data projects around the world. Please contact us for details on any of these courses, or if you’d be interested in discussing a custom program tailored to your needs.

Our Open Data Training Programs

Open Data Introduction

Who is this for?

This course is a short introduction to open data for anyone and is perfectly suited to teams from diverse functions across organisations who are thinking about or adopting open data for the first time.

Topics covered

Everything you need to understand and start working in this exciting new area: what is open data, why should institutions open data, what are the benefits and opportunities to doing so, and of course how you can get started with an open data policy or project. This is a one day course to help you and your team get started with open data.

Administrative Open Data Management

Who is this for?

Those specialising in open data, whether as policy experts, open data program managers and similar roles in government, civil service, and other organisations. This course is specifically for non-technical staff who are responsible for managing Open Data programs in their organisation. Such activities typically include implementing an Open Data strategy, designing/launching an Open Data portal, coordinating publication processes, preparing data for publication, and fostering data re-use.

Topics covered

Basics of Open Data (legal, managerial, technical); Success factors for the design and execution of an Open Data program; Overview of the technology landscape; Success factors for community re-use.

Open Data Portal Technology

Who is this for?

Those specialising in open data, whether as software or data experts, and open data delivery managers and similar roles in government, civil service, and other organisations. This course is specifically for technical staff who are responsible for maintaining or running an enterprise Open Data portal. Such activities typically include deployment, system administration and hosting, site theming, development of custom extensions and applications, ETL procedures, data conversions, and data life-cycle management.

Topics covered

Basics of Open Data, publication process, and technology landscape; architecture and core functionality of a modern Open Data Management System (CKAN used as example). Deployment, administration and customisation; deploying extensions; integration; geospatial and other special capabilities; engaging with the CKAN community.

Custom training

We can offer training programs tailored to your specific needs, for your organisation, data domain, or locale. Get in touch today to discuss your requirements!

Working with data

We also run the School of Data, which helps civil society organisations, journalists and citizens learn the skills they need to use data effectively, through both online and in-person “learning through doing” workshops. The School of Data runs data-driven investigations and explorations, and data clinics and workshops from “What is Data” up to advanced visualisation and data handling. As well as general training and materials, we offer topic-specific and custom courses and workshops. Please contact schoolofdata@okfn.org to find out more. As with all of our work, all relevant materials will be openly licensed, and we encourage others (in the global Open Knowledge Foundation network and beyond) to use and build on them.

The Global Open Knowledge Foundation Network

- September 24, 2013 in Featured, Open Knowledge Foundation Local Groups

Since 2004, the Open Knowledge Foundation has been connecting people and building communities in open data and open knowledge around the world. People in the global Open Knowledge Foundation network run meetups and workshops, campaign for open data, train, advise, and create open source tools and materials to help everyone work with data. The network has grown rapidly and is now present in 40 countries and new local groups are starting on an almost weekly basis. Anyone can join the network – both organisations and individuals, and whatever your interest in open knowledge and open data! Local groups within the Open Knowledge Foundation network have developed their own projects, communities and funding – some, such as Open Knowledge Foundation Germany, now have a very significant level of activity with their own staff and projects. Check our map to see if there’s already an Open Knowledge Foundation presence in your area where you could get involved. If there’s not a group already, why not start one? Each local group within the network is independent and has a local focus but, at the same time, is part of the global community-run network and benefits from the support, sharing and collaboration within that wider network. Some recent highlights from around the network include:
  • The Brazilian Open Knowledge Foundation group organised an open science event in São Paulo, with over 60 people participating in round tables covering the many aspects of openness in science: education, publications, tools, data, citizenry and research.
  • Japan has 19 cities with Where Does My Money Go sites only a year after the first site was set up in Yokohama, and enthusiastic engineers are forming a community of practice to share know-how and get more cities on board.
  • The Greece Chapter of the Open Knowledge Foundation has done development work on the Greek Open Data portal, and released the first version of want2know, a platform which lets citizens request data they want open access to, motivated by the Open Data Census.
  • The Spanish Chapter of the Open Knowledge Foundation organised the first Conference of Data Journalism and Open Data in Spain, titled “When data tells stories”
  • The Ambassador for Morocco was invited on national television to discuss the Moroccan e-gov project with the Minister of Trade, Industry, and New Technologies; they talked about open data, the CKAN open data management system, and the Moroccan Open Data Portal, and as a result the Ambassador was subsequently invited to work with the government to help improve the national portal.

Joining the Network

Over the last year or so we’ve been bringing in some greater structure to our international network to support its growth and make it easier to join. There’s a way to get involved for everyone:

Chapters

Open Knowledge Foundation Chapters are autonomous and independent non-profit organisations, and are leaders working on open data and open knowledge in their countries. Chapters share their expertise and learning on the ground and with other local groups to ensure they thrive, are sustainable, and can have the greatest impact with their work. If you’re already part of a non-profit organisation working on openness, or a local group looking to incorporate, then get in touch to explore what being a Chapter would involve.

Local Initiatives

Local initiatives are groups working together on open advocacy, campaigning and projects of all kinds in a local context and connected with others around the world through the network. If you’re part of an existing group working on openness, or you’ve met others in your region who would like to do more with open knowledge, you can apply to become an Open Knowledge Foundation Local Initiative.

Ambassadors

Ambassadors are community leaders working to bring together the Open Knowledge community in their area and make a real difference with open information. If you’re an individual looking to start open activity in a country or region where the Open Knowledge Foundation does not currently have an established presence, become an Open Knowledge Foundation Ambassador. We welcome multiple Ambassadors per region too.

Working Groups

As well as local groups, the network includes working groups which focus on specific areas of open data and open knowledge, enabling people with similar interests to gather to discuss, lobby, code, write, promote and explore particular areas of openness.

Creators and Makers

Because we love to make things as well as advocating for openness, there are many concrete projects and activities around the network where you can design, code, and write, including the Open Knowledge Foundation Labs and many other projects about open stuff. Get in touch with our local groups team on local@okfn.org, or join the okfn-discuss mailing list.

Open Data Privacy

- August 27, 2013 in Featured, Ideas and musings, Open Data, Open Data and My Data, Open Government Data, privacy

“yes, the government should open other people’s data”
Traditionally, the Open Knowledge Foundation has worked to open non-personal data – things like publicly-funded research papers, government spending data, and so on. Where individual data was a part of some shared dataset, such as a census, great amounts of thought and effort had gone into ensuring that individual privacy was protected and that the aggregate data released was a shared, communal asset.

But times change. Increasing amounts of data are collected by governments and corporations, vast quantities of it about individuals (whether or not they realise that it is happening). The risks to privacy through data collection and sharing are probably greater than they have ever been. Data analytics – whether of “big” or “small” data – has the potential to provide unprecedented insight; however, some of that insight may be at the cost of personal privacy, as separate datasets are connected and correlated.

Both open data and big data are hot topics right now, and at such times it is tempting for organisations to get involved in such topics without necessarily thinking through all the issues. The intersection of big data and open data is somewhat worrying, as the temptation to combine the economic benefits of open data with the current growth potential of big data may lead to privacy concerns being disregarded. Privacy International are right to draw attention to this in their recent article on data for development, but of course other domains are affected too.

Today, we’d like to suggest some terms to help the growing discussion about open data and privacy.

Our Data is data with no personal element, and a clear sense of shared ownership. Some examples would be where the buses run in my city, what the government decides to spend my tax money on, how the national census is structured and the aggregate data resulting from it.
At the Open Knowledge Foundation, our default position is that our data should be open data – it is a shared asset we can and should all benefit from.

My Data is information about me personally, where I am identified in some way, regardless of who collects it. It should not be made open or public by others without my direct permission – but it should be “open” to me (I should have access to data about me in a useable form, and the right to share it myself however I wish, if I choose to do so).

Transformed Data is information about individuals, where some effort has been made to anonymise or aggregate the data to remove individually identified elements.

We propose that there should be some clear steps which need to be followed to confirm whether transformed data can be published openly as our data. A set of privacy principles for open data, setting out considerations that need to be made, would be a good start. These might include things like consulting key stakeholders, including representatives of whatever group(s) the data is about and data privacy experts, around how the data is transformed. For some datasets, it may prove impossible to transform them sufficiently such that a reasonable level of privacy can be maintained for citizens; these datasets simply should not be opened up. For others, it may be that further work on transformation is needed to achieve an acceptable standard of privacy before the data is fit to be released openly. Ensuring the risks are considered and managed before data release is essential. If the transformations provide sufficient privacy for the individuals concerned, and the principles have been adhered to, the data can be released as open data.

We note that some of “our data” will have personal elements. For instance, members of parliament have made a positive choice to enter the public sphere, and some information about them is therefore necessarily available to citizens.
Data of this type should still be considered against the principles of open data privacy we propose before publication, although the standards compared against may be different given the public interest. This is part of a series of posts exploring the areas of open data and privacy, which we feel is a very important issue. If you are interested in these matters, or would like to help develop privacy principles for open data, join the working group mailing list. We’d welcome suggestions and thoughts on the mailing list or in the comments below, or talk to us and the Open Rights Group, who we are working with, at the Open Knowledge Conference and other events this autumn.

Shakespeare review: analysis

- May 15, 2013 in Access to Information, News, Open Data, Open Government Data

We welcome the Shakespeare review as a time to reflect, coming as it does at a time of great growth in open data in government and the public sector. The UK has led the way with government taking a pioneering stance on open data policy in recent years, and this report sets out key recommendations for how to best take forward this work. It is particularly good to see acknowledgement that there is a “difference between a commitment to transparency and a true National Data Strategy for economic growth” as it is clear that many of the benefits of open public sector information will go beyond the economic. As the Open Knowledge Foundation has long emphasized:
The best thing to be done with your data will be thought of by someone else
Shakespeare recognises this with the comment that “we cannot always predict where the greatest value lies but know there are huge opportunities across the whole spectrum of PSI.” Getting more data released quickly, without agonising over quality concerns, is an excellent recommendation and we look forward to seeing this in practice. Alongside this we welcome the demand for high quality information in the National Core Reference Data plan, including key entity data; such reference data, following clear open standards, will transform what can be done with UK data. The request that Trading Funds should remove restrictive PSI licensing and work towards releasing all raw data for use and reuse is particularly warmly welcomed. We are pleased to see consideration being given to privacy and confidentiality issues; our definition of open data has always excluded personally-identifiable information, but with greater data collection than ever before, we acknowledge the challenges this can bring for data publishers. The demand for realistic and pragmatic consideration of privacy and confidentiality is welcomed, and best practice guidelines will be very helpful in assisting data publishers here. In addition we hope to see key security and privacy sector experts engaged in this as there are tough technical challenges around anonymisation, aggregation and sandbox use, and deep technical understanding is needed to fully appreciate the risks and limits of such systems, and to create sensible guidelines. We are also delighted to see open access mentioned in the report; open access to publicly-funded research data and papers has been a long-standing tenet of the Open Knowledge Foundation’s work. 
Shakespeare notes that “even today, access to academic research that has been paid for by the public is deliberately denied to the public, and to many researchers, by commercial publishers, aided by university lethargy, and government reluctance to apply penalties; thereby obstructing scientific progress.” We can, and must, do better here. We applaud the call for more data scientists and greater statistical skills at all levels; stronger data awareness and skills are critical for all the benefits of open data to be realised. In particular, the recognition that interactive and workshop methods can be most effective at teaching data skills is well aligned with our own School of Data and long standing culture of hackathons and developer engagement. The more teaching and training around data, alongside other key STEM areas including maths and technology, the better. Finally, it is great to see that the economic value of open data will be assessed through research and audit, but at the same time it is vital to be realistic about the timescales for significant change and impact in this field. We think on a timescale of decades to see the full benefits and effects of the new open approaches to creation, sharing and reuse of knowledge, and government and others must be realistic about what will be achieved and how quickly, to avoid disappointment. Open data is valuable to us socially and culturally as well as commercially, but it is only one part of a solution, and we need to work on the other key elements, including institutional change, tools, skills and awareness, which are also necessary conditions to realise the full benefits of openness. These other elements may be harder, and more expensive, than the release of data – we should still release more open data, and we are glad to see this report affirming this and encouraging data skills alongside – but the journey is far from over. As Shakespeare puts it:
“It is now time to build on the very positive start we have made on open data with a more directed, more predictable engineering of usable information. Obstacles must be cleared, structures defined, and progress audited, so that we have a purposeful, progressive strategy that we can trust to deliver the full benefits to the nation.”
If you’re interested in open data and you’d like to join our global community of open government data advocates, you can join our open-government mailing list:

Open Knowledge: much more than open data

- May 1, 2013 in Featured, Ideas and musings, Join us, OKF, Open Data, Our Work

We’ve often used “open knowledge” simply as a broad term to cover any kind of open data or content from statistics to sonnets, and more. However, there is another deeper, and far more important, reason why we are the “Open Knowledge” Foundation and not, for example, the “Open Data” Foundation. It’s because knowledge is something much more than data. Open knowledge is what open data becomes when it’s useful, usable and used.

At the Open Knowledge Foundation we believe in open knowledge: not just that data is open and can be freely used, but that it is made useful – accessible, understandable, meaningful, and able to help someone solve a real problem. Open knowledge should be empowering – it should enable citizens and organizations to understand the world, create insight and effect positive change.

It’s because open knowledge is much more than just raw data that we work both to have raw data and information opened up (by advocating and campaigning) and to create the tools that turn that raw material into knowledge that people can act upon. For example, we build technical tools, open source software to help people work with data, and we create handbooks which help people acquire the skills they need to do so. This combination, that we are both evangelists and makers, is extremely powerful in helping us change the world.

Achieving our vision of a world transformed through open knowledge, a world where a vibrant open knowledge commons empowers citizens and enables fair and sustainable societies, is a big challenge. We firmly believe it can be done, with a global network of amazing people and organisations fighting for openness and making tools and more to support the open knowledge ecosystem, although it’s going to take a while! We at the Open Knowledge Foundation are committed to this vision of a global movement building an open knowledge ecosystem, and we are here for the long term.
We’d love you to join us in improving the world through open knowledge; there will be many different ways you can help coming up during the months ahead, so get started now by keeping in touch – by signing up to receive our Newsletter, or finding a local group or meetup near you.

Open Data & My Data

- February 22, 2013 in Featured, Ideas and musings, Open Data, Working Groups

The Open Knowledge Foundation believes in open knowledge: not just that some data is open and freely usable, but that it is useful – accessible, understandable, meaningful, and able to help someone solve a real problem. A lot of the data which could help me improve my life is data about me – “MyData” if you like. Many of the most interesting questions and problems we have involve personal data of some kind. This data might be gathered directly by me (using my own equipment or commercial services), or it could be harvested by corporations from what I do online, or assembled by public sector services I use, or voluntarily contributed to scientific and other research studies.

Image: “Tape library, CERN, Geneva 2” by Cory Doctorow, CC-BY-SA.

This data isn’t just interesting in the context of our daily lives: it bears on many global challenges in the 21st century, such as supporting an aging population, food consumption and energy use. Today, we rarely have access to these types of data, let alone the ability to reuse and share it, even when it’s my data, about just me. Who owns data about me, who controls it, who has access to it? Can I see data about me, can I get a copy of it in a form I could reuse or share, can I get value out of it? Would I even be allowed to publish openly some of the data about me, if I wanted to? But how does this relate to open data? After all, a key tenet of our work at the Open Knowledge Foundation is that personal data should not be made open (for obvious privacy reasons)! However there are, in fact, obvious points where “Open Data” and “My Data” connect:
  • MyData becomes Open Data (via transformation): Important datasets that are (or could be) open come from “my data” via aggregation, anonymisation and so on. Much statistical information ultimately comes from surveys of individuals, but the end results are heavily aggregated (for example, census data). This means “my data” is an important source but also that it is essential that the open data community have a good appreciation of the pitfalls and dangers here – e.g. when anonymisation or aggregation may fail to provide appropriate privacy.

  • MyData becomes Open Data (by individual choice): There may be people who want to share their individual, personal, data openly to benefit others. A cancer patient could be happy to share their medical information if that could assist with research into treatments and help others like them. Alternatively, perhaps I’m happy to open my household energy data and share it with my local community to enable us collectively to make sustainable energy choices. (Today, I can probably only see this data on the energy company’s website, remote, unhelpful, out of my control. I may not even be able to find out what I’m permitted to do with my data!)

  • The Right to Choose: if it’s my data, just about me, I should be able to choose to access it, reuse it, share it and open it if I wish. There is an obvious translation here of key Open Data principles to MyData. Where the Open Definition states that material should be freely available for use, reuse and redistribution by anyone, we could think that my data should be freely available for use, reuse and redistribution by me.

We think it is important to explore and develop these connections and issues. The Open Knowledge Foundation is therefore today launching an Open Data & MyData Working Group. Sign up here to participate:

This will be a place to discuss and explore how open data and personal data intersect. How can principles around openness inform approaches to personal data? What issues of privacy and anonymisation do we need to consider for datasets which may become openly published? Do we need “MyData Principles” that include the right of the individual to use, reuse and redistribute data about themselves if they so wish?

Appendix

There are plenty of challenging issues and questions around this topic. Here are a few:

Anonymisation

Are big datasets actually anonymous? Anonymisation is incredibly hard. This isn’t a new problem (Ars Technica had a great overview in 2009) although it gets more challenging as more data is available, openly or otherwise, as more data which can be cross-correlated means anonymisation is more easily breached.

Releasing Value

There’s a lot of value in personal data – Boston Consulting Group claim €1tn. But even BCG point out that this value can only be realised if the processes around personal data are more transparent. Perhaps we can aspire to more than transparency, and have some degree of personal control, too.

Governments

Governments are starting to offer some proposals here such as “MiData” in the UK. This is a good start but do they really serve the citizen? There’s also some proposed legislation to drive companies to give consumers the right to see their data. But is access enough? The consumer doesn’t own their data (even when they have “MiData”-style access to it), so can they publish it under an open licence if they wish?

Whose data is it anyway?

Computers, phones, energy monitors in my home, and so on, aren’t all personal to me. They are used by friends and family. It’s hard to know whose data is involved in many cases. I might want privacy from others in my household, not just from anonymous corporations. This gets even more complicated when we consider the public sphere – surveillance cameras and internet of things sensors are gathering data in public places, about groups of independent people. Can the people whose images or information are being captured access or control or share this data, and how can they collaborate on this? How can consent be secured in these situations? Do we have to accept that some information simply cannot be private in a networked world? (Some of these issues were raised at the Open Internet of Things Assembly in 2012, which led to a draft declaration. The declaration doesn’t indicate the breadth of complex issues around data creation and processing which were hotly debated at the assembly.)

MyData Principles

We will need clear principles. Perhaps, just as the Open Definition has helped clarify and shape the open data space, we need analogous “MyData” Principles which set out how personal data should be handled. These could include, for example:
  • That my data should be made available to me in machine-readable bulk form
  • That I should have the right to use that data as I wish (including use, reuse and redistribution if I so wish).
  • That none of my data (where it contains personal information) should be made open without my full consent.