You are browsing the archive for Open Knowledge.

Open Knowledge Justice Programme challenges the use of algorithmic proctoring apps

- February 26, 2021 in Open Knowledge

Today we’re pleased to share more details of the Justice Programme’s new strategic litigation project: challenging the (mis)use of remote proctoring software.

What is remote proctoring?

Proctoring software uses a variety of techniques to ‘watch’ students as they take exams. These exam-invigilation products claim to detect, and therefore prevent, cheating. Whether or not the software can actually do what it claims, there is concern that it breaches privacy, data and equality rights, and that the negative impacts of its use on students are significant and serious.

Case study: Bar Exams in the UK

In the UK, barristers are lawyers who specialise in courtroom advocacy. The Bar Professional Training Course (BPTC) is run by the professional regulatory body, the Bar Standards Board (BSB). In August 2020, because of COVID-19, the BPTC exams took place remotely, using a proctoring app from the US company Pearson VUE. Students taking exams had to allow their room to be scanned and an unknown, unseen exam invigilator to surveil them. They had to submit a scan of their face to verify their identity, and were prohibited from leaving their seat for the duration of the exam. That meant up to five hours (!) without a toilet break. Some students had to relieve themselves in bottles and buckets under their desks whilst maintaining ‘eye contact’ with their faceless invigilator. Muslim women were forced to remove their hijabs, and at least one individual withdrew from sitting the exam rather than, as they felt it, compromise their faith. The software also had numerous functional errors, including suddenly freezing without warning and deleting text: one third of students were unable to complete their exam due to technical errors.

Our response

The student reports, alongside our insight into the potential harms caused by public impact algorithms, prompted us to take action. We were of the opinion that what students were subjected to breached data, privacy and other legal rights as follows:

Data Protection and Privacy Rights
  • Unfair and opaque algorithms. The software used algorithmic decision-making in relation to the facial recognition and/or matching identification of students and behavioural analysis during the exams. The working of these algorithms was unknown and undisclosed.
  • The app’s privacy notices were inadequate. There was insufficient protection of the students’ personal data. For example, students were expressly required to confirm that they had ‘no right to privacy at your current location during the exam testing session’ and to ‘explicitly waive any and all claims asserting a right to individual privacy or other similar claims’. Students were asked to consent to these questions just moments before starting an extremely important exam and without being warned ahead of time.
  • The intrusion involved was disproportionate. The software required all students to carry out a ‘room scan’ (showing the remote proctor around their room). They were then surveilled by an unseen human proctor for the duration of the exam. Many students felt this was unsettling and intrusive.
  • Excessive data collection. The Pearson VUE privacy notice reserved a power of data collection of very broad classes of personal data, including biometric information, internet activity information (gleaned through cookies or otherwise), “inferences about preferences, characteristics, psychological trends, preferences, predispositions, behavior, attitudes, intelligence, abilities, and aptitudes” and protected characteristics.
  • Inadequately limited purposes. Students were required to consent to them disclosing to third parties their personal data “in order to manage day to day business needs’, and to consent to the future use of “images of your IDs for the purpose of further developing, upgrading, and improving our applications and systems”.
  • Unlawful data retention. Pearson VUE’s privacy notice states in relation to data retention that “We will retain your Personal Data for as long as needed to provide our services and for such period of time as instructed by the test sponsor.”
  • Data security risks. Given the sensitivity of the data that was required from students in order to take the exam, high standards of data security are required. Pearson VUE gave no assurances regarding the use of encryption. Instead there was a disclaimer that “Information and Personal Data transmissions to this Site and emails sent to us may not be secure. Given the inherent operation and nature of the Internet, all Internet transmissions are done at the user’s own risk.”
  • Mandatory ‘opt-ins’. The consent sought from students was illusory, as it did not enable students to exert any control over the use of their personal data. If they did not tick all the boxes, they could not participate in the exam. Students could not give a valid consent to the invasion of privacy occasioned by online proctoring when their professional qualification depended on it. They were in effect coerced into surrendering their privacy rights. According to the GDPR, consent must be “freely given and not imposed as a condition of operation”.
Equality Rights

Public bodies in the UK have a legal duty to carefully consider the equalities impacts of the decisions they make. This means that a policy, project or scheme must not unlawfully discriminate against individuals on the basis of a ‘protected characteristic’: their race, religion or belief, disability, sex, gender reassignment, sexual orientation, age, marriage or civil partnership and/or pregnancy and maternity. In our letter to the BSB, we said that the BSB had breached its equality duties by using software that featured facial recognition and/or matching processes, which are widely shown to discriminate against people with dark skin. The facial recognition process also required female students to remove their religious dress, breaching the protections afforded to people to observe their religion. Female Muslim students were unable to ask to be observed by female proctors, despite the negative cultural significance of unknown male proctors viewing them in their homes. We also raised the fact that some people with disabilities, and women who were pregnant, were unfairly and excessively impacted by the absence of toilet breaks for the duration of the assessment. The use of novel and untested software, we said, also had the potential to discriminate against older students with fewer IT skills.

The BSB’s Reply

After we wrote to express these concerns, the BSB:
  • stopped the use of remote proctoring apps that had been scheduled for the next round of bar exams
  • announced an inquiry into their use of remote proctoring apps in August 2020, to produce an independent account of the facts, circumstances and reasons as to why things went wrong. The BSB invited us to make submissions to this inquiry, which we have done. You can read them here.

Next steps

Here at the Open Knowledge Justice Programme, we’re delighted that the BSB has paused the use of remote proctoring, and we keenly await the publication of the findings of the independent inquiry. However, we have recently been concerned to discover that the BSB has delegated decision-making authority for the use of remote proctoring apps to individual educational providers – e.g. universities and law schools – and that many of these providers are scheduling exams using remote proctoring apps. We hope that the independent inquiry’s findings will conclusively determine that this must not continue. Sign up to our mailing list or follow the Open Knowledge Justice Programme on Twitter to receive updates.

[Notice] Open Data Day 2021

- February 23, 2021 in Featured, News, Open Knowledge, Events

Facts that go unrecorded disappear, and memories survive differently according to our needs. What will COVID-19 come to mean for us? This event discusses a year of records on COVID-19, data analysis of that year, and sustainability from a data perspective.

Date: 6 March 2021, 14:00 – 17:00
Venue: Open Square D, Sookmyung Women’s University (live online broadcast)
Online channel:
Event page:

Programme overview

Presentation 1: A guide to using the COVID-19 public data API

Since Korea’s first confirmed COVID-19 case in January 2020, a vast amount of related public data has been generated and put to use domestically; the daily confirmed-case counts published each day come from this data. The question remains, however, whether public data can be used to produce analysis more meaningful than the information already provided. And because people’s memories fade, it is also important to leave a record, from a neutral standpoint, of what conditions in Korea were like during this period. This presentation analyses how COVID-19 infection and spread in Korea changed over time and how it differed between groups, using the COVID-19 public data API, and thereby also introduces ways of making use of COVID-19 public data. It will show what data can be obtained on the web and how, with examples of possible analyses and visualisations.

Presentation 2: Turning hidden information into data! Analysing cluster infections

This analysis was carried out to compare the number of coronavirus cluster infections across business sectors. Because cluster-infection data is not made available in a usable form, it was extracted from the regular briefing materials published by the Korea Disease Control and Prevention Agency. Cluster-infection events from January 2020 to February 2021, together with where they occurred and the briefing dates, were collected and turned into a Tableau dataset, and the number of cluster infections by sector was visualised. The highest counts were found, in descending order, in religious facilities and groups, nursing hospitals and care facilities, gatherings of acquaintances and small groups, medical institutions, and workplaces. The analysis was hampered by the government’s inconsistent provision of data, by material being published in forms that cannot be reused, and by the difficulty of setting clear criteria for classifying sectors. Efficient methods will also be needed to keep the data continuously updated in future.

Presentation 3: Link! National COVID-19 response strategies seen through vaccines

COVID-19 is a global problem, and efforts to overcome the current situation using data cannot be completed by any one country alone. Over the past year, however, amid repeated major crises at home, it was hard enough just to make sense of the situation in Korea, let alone take a broad view of the pandemic. Overseas responses to COVID-19 were therefore hard to connect or compare, and could only be encountered through fragmentary information relayed by the media. This analysis looks beyond Korea, drawing on worldwide COVID-19 case figures and demographic data. Because that scope proved too vast, the topic was narrowed to vaccines, the likely game changer of the current situation. The presentation compares each country’s vaccine procurement strategy with Korea’s situation and, more broadly, considers the importance of securing data on a global scale.

Presentation 4: Collecting and analysing data to trace confirmed cases’ movements, and the issues this leaves us

In the era of the data economy, the importance of using public data is ever more prominent. Public data opened up during the COVID-19 era in particular has contributed greatly to decision-making by ordinary citizens, healthcare professionals and the state. Protecting personal information for the sake of privacy, however, is also something that must be checked whenever data is opened up. Some COVID-19 data is therefore limited or temporary in its scope, audience and timing of release; as a result, processing through to analysis and prediction can be delayed, and missing data can undermine the completeness of the information. The whole process, from the creation of raw data through collection, cleaning, processing and analysis, must therefore be capable of being automated with machine tools, and the reusability of the data must be guaranteed. This seminar examines the issues that arise when the data needed to analyse confirmed cases’ movements is processed through the data-analysis lifecycle, and goes on to show examples of the services we could produce if such public data were complete and timely.
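The kind of derivation Presentation 1 describes can be illustrated with a short sketch. The cumulative totals below are made-up placeholder figures, not real KDCA data, and the start date is arbitrary; the point is only the transformation from cumulative counts (as a public data API typically reports them) into daily new cases.

```python
from datetime import date, timedelta

# Hypothetical cumulative confirmed-case totals, one value per day
# (placeholder numbers, not real data)
cumulative = [7513, 7755, 7869, 8086, 8162]
start = date(2020, 3, 10)  # arbitrary example start date

# Daily new cases are the first differences of the cumulative series
daily_new = [today - yesterday for yesterday, today in zip(cumulative, cumulative[1:])]

for offset, new_cases in enumerate(daily_new, start=1):
    print(f"{start + timedelta(days=offset)}: +{new_cases}")
```

The same first-difference step is what turns the raw API series into the familiar daily-case charts before any further group-level analysis or visualisation.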

What is a public impact algorithm?

- February 4, 2021 in Open Knowledge

Meg Foulkes discusses public impact algorithms and why they matter.

When I look at the picture of the guy, I just see a big Black guy. I don’t see a resemblance. I don’t think he looks like me at all.

This is what Robert Williams said to police when he was presented with the evidence on which he had been arrested for stealing watches in June 2020. Williams had been identified by an algorithm when Detroit Police ran grainy security footage from the theft through a facial recognition system. Police arrested him before questioning him or checking for any alibi. It was not until the matter came to trial that Detroit Police admitted that he had been charged falsely, and solely, on the output of an algorithm.

It’s correct to say that in many cases, when AI and algorithms go wrong, the impact is pretty innocuous – like when a music streaming service recommends music you don’t like. But often, AI and algorithms go wrong in ways that cause serious harm, as in the case of Robert Williams. Although he had done absolutely nothing wrong, he was deprived of a fundamental right – his liberty – on the basis of a computer output.

It’s not just on an individual scale that these harms are felt. Algorithms are written by humans, so they can reflect human biases; what algorithms can do is amplify this prejudice at massive scale by automatically entrenching it. The bias isn’t exclusively racialised: last year, an algorithm used to determine exam grades disproportionately downgraded disadvantaged students. Throughout the pandemic, universities have been turning to remote proctoring software that falsely identifies students with disabilities as cheats. For example, those who practice self-stimulatory behavior or ‘stimming’ may get algorithmically flagged again and again for suspicious behavior, or have to disclose sensitive medical information to avoid this.

We identify these types of algorithms as ‘public impact algorithms’ to clearly name the intended target of our concern. There is a big difference between the harm caused by inaccurate music suggestions and algorithms that have the potential to deprive us of our fundamental rights.
To call out these harms, we have to precisely define the problem. Only then can we hold the deployers of public impact algorithms to account, and ultimately achieve our mission of ensuring public impact algorithms do no harm. Sign up to our mailing list or follow the Open Knowledge Justice Programme on Twitter to receive updates.

csv,conf,v6 is going ahead on May 4-5 2021

- January 14, 2021 in #CSVconf, Open Knowledge


Attendees of csv,conf,v4

Save the date for csv,conf,v6! The 6th version of csv,conf will be held online on May 4-5 2021. If you are passionate about data and its application to society, this is the conference for you. Submissions for session proposals for 25-minute talk slots, and a new ‘Birds of a Feather’ track, are open until February 28 2021, and we encourage talks about how you are using data in an interesting way. The conference will take place on Crowdcast, Slack, Spatial Chat and other platforms used for breakout and social spaces. We will be opening ticket sales soon.

Pictured are attendees to the csv,conf,v5 opening session.

csv,conf,v5 was planned to take place in person in May 2020 in Washington DC. Due to the Covid-19 situation this was not possible, and the organising team made the decision to hold the event online. The event was a huge success, and most talks had well over 300 attendees. We have written about our experience of organising an online conference in the hope that it will help others, and are excited to be building on this experience for this year. csv,conf is a much-loved community conference bringing together diverse groups to discuss data topics, featuring stories about data sharing and data analysis from science, journalism, government, and open source. Over two days, attendees will have the opportunity to hear about ongoing work, share skills, exchange ideas and kickstart collaborations. As in previous years, the Open Knowledge Foundation is part of the organising team.

Expect the return of the comma llama!

First launched in July 2014, csv,conf has grown to bring together over 2,000 participants from 30 countries with backgrounds from varied disciplines. If you’ve missed the earlier years’ conferences, you can watch previous talks on topics like data ethics, open source technology, data journalism, open internet, and open science on our YouTube channel. We hope you will join us in May to share your own data stories and join the csv,conf community! We are happy to answer any questions you may have or offer clarifications if needed. Feel free to reach out to us on Twitter @CSVConference or on our dedicated community Slack channel. We are committed to diversity and inclusion, and strive to be a supportive and welcoming environment to all attendees. To this end, we encourage you to read the csv,conf Code of Conduct. csv,conf,v6 is a community conference that is in part supported by the Sloan Foundation through our Frictionless Data for Reproducible Research work. The Frictionless Data team is part of the organising group.

Join us in Supporting the Open Data Community in 37 Countries around the World

- December 16, 2020 in Open Knowledge

Every year, the team at Open Knowledge Foundation works with the global open data community to deliver Open Data Day. Open Data Day is an annual celebration: groups around the world create local events to learn about open data, and to encourage the adoption of open data policies in government, business and civil society. The next Open Data Day is Saturday 6 March 2021. Last year, Open Knowledge Foundation and our partners provided 67 mini-grants of US$300 to support events in 37 different countries, and 90% went to countries in the Global South. You can read more about who we supported in our blog post ‘Open Data Day 2020. It’s a wrap!’ or check out our Executive Summary Report on Open Data Day 2020.
>> if your organisation wants to become a Partner for Open Data Day 2021, check out the Open Data Day 2021 Partnership prospectus. >> if you want to make a financial donation to support Open Data Day 2021, donate here. Put the words ‘Open Data Day’ in the ‘Write a Note (optional)’ field. $300/£225/€250 donation supports one mini-grant. $50/£40/€45 donation supports one photography prize.

We are so grateful to our Open Data Day 2020 partners and want to thank them on behalf of everyone at Open Data Day. Without their generous support, many of the events that make Open Data Day so impactful would not be possible.


Thank you for your feedback about Open Data Day. Here’s what we learned.

- November 17, 2020 in Open Knowledge

Open Data Day is an annual celebration of open data all over the world, facilitated by the Open Knowledge Foundation. Each year, groups from around the world create local events on the day where they use open data in their communities. It is an opportunity to show the benefits of open data and encourage the adoption of open data policies in government, business and civil society. With Open Data Day 2021 less than four months away, we asked the open data community to tell us how you think we can better support Open Data Day. It’s not too late to have your say: just visit the survey here. The responses we had were very encouraging. We received lots of feedback – both positive and negative – and many of you offered to help with Open Data Day 2021. Thank you so much! We’ve gone through all the feedback, and here is a summary of what we learned. You told us that Open Data Day 2021 will be better if we…
  • present all the events together in a searchable directory, to show the amazing scale and variety of Open Data Day events
  • focus less on the geographic location of the events because some events are online and can be attended by anyone with an internet connection
  • give more mini-grants to more Open Data Day events 
  • confirm who has won the mini-grants at an earlier date – to help with event planning
  • focus on one data track – not four. Recommendations included climate change data, disaster risk management data, gender data, election data and Covid 19 data
  • give support, advice and opportunities for Covid Safe events and activities 
  • get better press coverage of Open Data Day events, and better connections with data journalists 
  • publish reports on Open Data Day events on the Open Data Day website, with more photos and videos
  • improve the mini-grant methodology to increase the measurable impact of Open Data Day mini-grants 
  • reduce bank charges by using innovative money transfer systems
  • help funding partners create better connections with event organisers and attendees
Here at Open Knowledge Foundation we will spend the next few weeks digesting all these great ideas and working out how best to respond to make sure Open Data Day 2021 is better than ever. Thanks again to everyone who already responded to our survey! 

Open Knowledge and MyData – same roots, shared values

- November 10, 2020 in Events, OK Finland, Open Data, Open Knowledge, personal-data, Talks

The origins of MyData can be traced back to the Open Knowledge Festival held in Finland in 2012. There, a small group of people gathered in a breakout session to discuss what ought to be done with the kind of data that cannot be made publicly available and entirely open: personal data. Over the years, more and more people with similar ideas about personal data converged and found each other around the globe. Finally, in 2016, a conference entitled MyData brought together thinkers and doers who shared a vision of a human-centric paradigm for personal data, and the community became aware of itself.

The MyData movement, which has since gathered momentum and grown into an international community of hundreds of people and organisations, shares many of its most fundamental values with the Open movement from which it spun off. Openness and transparency in the collection, processing and use of personal data; ethical and socially beneficial use of data; cross-sectoral collaboration; and democratic values are all legacies of the open roots of MyData, hard-wired into the movement itself.

The MyData movement was originally sustained through annual conferences held in Helsinki and attended by data professionals in their hundreds. These were made possible by the support of the Finnish chapter of Open Knowledge, which acted as their main organiser. As the years passed and the movement matured, in the autumn of 2018 it formalised into its own organisation, MyData Global. Headquartered in Finland, the organisation’s international staff of six, led by General Manager Teemu Ropponen, now facilitates the growing community, with local hubs in over 20 locations on six continents, a fourth Helsinki-based conference in September 2019, and the continued efforts of the movement to bring about positive change in the way personal data is used globally.

Join the MyData 2019 conference with a special discount code!
If you want to learn more about MyData, join the MyData 2019 conference on 25-27 September 2019. As we love making friends, we would like to offer you a 10% discount code for Business and Discounted tickets. Use MyDataFriend and claim your ticket now.

Tell us how you think we can better support Open Data Day

- November 4, 2020 in community, Events, Open Data Day, Open Data Day 2021, Open Knowledge

This year has been such an eventful year for all of us. As 2020 nears its end, here at Open Knowledge Foundation we are starting to think about Open Data Day 2021. I checked my calendar this morning – it’s only four months away! Open Data Day is such a great opportunity for the entire open data community to come together to show the benefits of open data. Last year, over 300 Open Data Day events took place in over 50 countries.

Our (OKF’s) role is facilitation. As we start to make plans, we would like you to have your say on how we (OKF) can best support Open Data Day. Do you have any ideas? Or comments? Advice? Tell us how you think we can better support Open Data Day. We want to know:
  • The good stuff. What worked at Open Data Day last year? What did you enjoy most? Which events really stood out? Did you meet someone at Open Data Day 2020 that changed the way you work for the better?
  • The bad stuff. What didn’t work last year? What could we have done differently? How would you like Open Data Day to improve? How can we achieve more impact? Are there other data tracks we should focus on?
  • How can we help each other? Open Data Day brings people together from around the world to celebrate open data. Are you interested in volunteering to help the global event happen? We are thinking of running a live online event and maybe some global competitions, and perhaps doing some fundraising for the whole open data community. Do you want to help?
We’d love it if you could take three minutes to share your thoughts in our survey and tell us how you think we can improve Open Data Day. We want to make Open Data Day 2021 better than ever, and we can only do that with your help!

Do we trust the plane or the pilot? The problem with ‘trustworthy’ AI

- October 19, 2020 in Open Knowledge

On April 8th 2019, the High-Level Expert Group on AI, a committee set up by the European Commission, presented its Ethics Guidelines for Trustworthy Artificial Intelligence. The guidelines define trustworthy AI through three principles and seven key requirements. Such AI should be lawful, ethical and robust, and take into account the following:
  • Human agency and oversight
  • Technical Robustness and safety
  • Privacy and data governance
  • Transparency
  • Diversity, non-discrimination and fairness
  • Societal and environmental well-being
  • Accountability
The concept has inspired other actors such as the Mozilla Foundation, which has built on it and written a white paper clarifying its vision. Both the ethics guidelines and Mozilla’s white paper are valuable efforts in the fight for a better approach to what we at Open Knowledge call Public Impact Algorithms:
“Public Impact Algorithms are algorithms which are used in a context where they have the potential for causing harm to individuals or communities due to technical and/or non-technical issues in their implementation. Potential harmful outcomes include the reinforcement of systemic discrimination (such as structural racism or sexism), the introduction of bias at scale in public services or the infringement of fundamental rights (such as the right to dignity).”

The problem does not lie in the definition of trustworthiness: the ethical principles and key requirements are sound and comprehensive. Instead, it arises from the aggregation behind a single label of concepts whose implementation presents extremely different challenges. Going back to the seven principles outlined above, two dimensions are mixed in: the technical performance of the AI and the effectiveness of the oversight and accountability ecosystem which surrounds it. The principles fall overwhelmingly under the Oversight and Accountability category.
Technical performance:
  • Technical robustness and safety

Oversight and Accountability:
  • Human agency and oversight
  • Privacy and data governance
  • Diversity, non-discrimination and fairness
  • Societal and environmental well-being
Talking about ‘trustworthy AI’ emphasizes the tool while de-emphasizing the accountability ecosystem, which becomes a mere bullet point – all but ensuring that it will not be given the attention it deserves.

Building a trustworthy plane

The reason no one uses the expression ‘trustworthy’ plane(1) or car(2) is not that trust is unimportant to the aviation or automotive industries. It’s that trust is not a useful concept for legislative or technical discussions. Instead, more operational terms such as safety, compliance or suitability are used. Trust exists in the discourse around these industries, but it is placed in the ecosystem of practices, regulations and actors that drives each industry: for civil aviation this includes the quality of pilot training, the oversight of airplane design, and the standards of safety written into legislation(3). The concept of ‘trustworthy AI’ displaces that trust from the ecosystem onto the tool. This has several potential consequences:
  • Trust could become embedded in the discourse and legislation on the issue, pushing to the side other concepts that are more operational (safety, privacy, explicability) or essential (power, agency(4)).
  • Trustworthy AI could become an all-encompassing label – akin to an organic fruit label – which would legitimize AI-enabled tools, cutting off discussions about the suitability of a tool for specific contexts, or about whether these tools should be deployed at all. Why do the hard work of building accountable processes when a label can be used as a shortcut?
  • Minorities and disenfranchised groups would again be left out of the conversation: the trust that a public official puts into an AI tool will be extended by default to their constituents.
This scenario can already be seen in the European Commission’s white paper on AI(5). Its vision completely ignores the idea that some AI applications may not be desirable; it outlines an ecosystem made of labels, risk levels(6) and testing centers, which would presumably give a definitive assessment of AI tools before their deployment; and it uses the concept of ‘trust’ as a tool for accelerating the development of AI rather than as a way to examine the technology on its merits. Trust becomes the oil in the AI industry’s engine.

We should not trust AI

Behind Open Knowledge’s Open AI and Algorithms programme is the core belief that we can’t and shouldn’t trust Public Impact Algorithms by default. Instead, we need to build an ecosystem of regulation, practices and actors in which we can place our trust. The principles behind this ecosystem will resonate with the definition of ‘trustworthy’ AI given above: human agency and oversight, privacy, transparency, accountability… But while a team of computer science researchers may discover a breakthrough in explainable deep learning, the work needed to set up and maintain this ecosystem will not come through a breakthrough: it will be a years-long, multi-stakeholder, cross-sector effort that will face its share of opponents and headwinds. This work cannot, and should not, simply be a bullet point under a meaningless label. Concretely, this ecosystem would emphasize:
  • Meaningful transparency: at the design level (explainable statistical model vs black box algorithms)(7), before deployment (clarifying goals, indicators, risks and remediations)(8) and during the tool’s lifecycle (open performance data, audit reports)
  • Mandatory auditing: algorithms deployed in public services should be open source, but intellectual property laws mean that some of them will not be. The second-best option should consequently be to mandate auditing both by regulators (who would have access to source code) and by external auditors using APIs designed to monitor key indicators (some of them mandated by law, others defined with stakeholders)(9).
  • Clear redress and accountability processes: multiple actors intervene between the design and the deployment of an AI-enabled tool. Who is accountable for what will have to be clarified.
  • Stakeholder engagement: algorithms used in public services should be proactively discussed with the people they will affect, and the possibility of not deploying the tool should be on the table
  • Privacy by design: the implementation of algorithms in the public sector often leads to more data centralisation and sharing, with little oversight or even impact assessment.
These and other aspects of this ecosystem will be refined and extended as the public debate continues. But we need to make sure that the ethical debates and the ecosystem issue are not sidelined by an all-encompassing label which will hide the complexity and nuance of the issue. An algorithm may well be trustworthy in a certain context (clean structured data, stable variables, competent administrators, suitable assumptions) while being harmful in others, however similar they might be.
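To make the auditing and transparency points above concrete, here is a minimal, hypothetical sketch: a wrapper that records every input and outcome of a decision function, producing the kind of audit trail an external auditor could inspect during a tool’s lifecycle. The decision rule and field names are invented for the example, not drawn from any real system.

```python
import json
from datetime import datetime, timezone

# Append-only record of every decision, for later review by an auditor
audit_log = []

def audited(decision_fn):
    """Wrap a decision function so each call is logged with its input and outcome."""
    def wrapper(case):
        outcome = decision_fn(case)
        audit_log.append({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "input": case,
            "outcome": outcome,
        })
        return outcome
    return wrapper

@audited
def eligibility(case):
    # Hypothetical, deliberately simple rule: an explainable model whose
    # logic can be read directly, unlike a black-box classifier
    return case["income"] < 30000 and case["dependents"] > 0

eligibility({"income": 25000, "dependents": 2})
eligibility({"income": 50000, "dependents": 1})
print(json.dumps(audit_log, indent=2))
```

In a real deployment the log would be written to tamper-evident storage and exposed through the kind of monitoring API described above, rather than kept in memory; the sketch only illustrates the principle that oversight data is generated as a by-product of every decision.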

(1) The aviation industry talks about ‘airworthiness’ which is technical jargon for safety and legal compliance
(2) The automotive industry mainly talks about safety
(3) This is why national aviation authorities generally do not re-certify a plane already validated by the USA’s FAA: they trust its oversight. The Boeing scandal led to a breach of that trust, and certification agencies around the world asked to re-certify the plane themselves.
(4) I purposefully did not mention fairness here. See this paper discussing the problems with using fairness in the AI debate:
(5) It was published in February 2020, which means its authors already had access to the draft version of the Ethics Guidelines for Trustworthy AI.
(6) See also the report from the German Government’s Data Ethics Commission, which defines five risk levels.
(7) Too little scrutiny is put on the relative performance of black box algorithms vs explainable statistical models. This paper discusses this issue:
(8) As of October 2020, Amsterdam (the Netherlands), Helsinki (Finland) and Nantes (France) are the only governments to have deployed algorithm registers. In all cases, however, the algorithms were deployed before being publicised.
(9) Oversight through investigation will still be needed. Algorithm Watch has several projects in that direction, including a report on Instagram. This kind of work relies on volunteers sharing data about their social media feeds; Mozilla is also helping them structure such ‘data donation’ projects.

The UK’s National Data Strategy: first impressions and observations

- September 10, 2020 in Open Data, Open Government Data, Open Knowledge, Policy, training

National Data Strategy
After years of promises, the UK Government has finally announced the launch of a ‘framework’ National Data Strategy. The aim of the strategy is to “drive the UK in building a world-leading data economy while ensuring public trust in data use” and the government has set out five missions for this work:
  • Unlocking the value of data across the economy
  • Securing a pro-growth and trusted data regime
  • Transforming government’s use of data to drive efficiency and improve public services
  • Ensuring the security and resilience of the infrastructure on which data relies
  • Championing the international flow of data
The Department for Digital, Culture, Media and Sport (DCMS) has now kicked off a 12-week consultation period. Last year, the Open Knowledge Foundation submitted our thoughts to help shape the National Data Strategy, and also signed a joint letter with other UK think tanks, civil and learned societies calling for urgent action from government to overhaul its use of data.

In our evidence, we called for a focus on teaching data skills to the British public, so we are glad to see a focus on data skills in the strategy, where the government notes that “everyone needs some level of data literacy in order to operate successfully in increasingly data–rich environments”. We said that the UK has a golden opportunity to lead by example and boost its economy, but must invest in skills to make this a reality. Without training and knowledge, large numbers of UK workers will be ill-equipped to take on many jobs of the future. So while there are funding commitments and assigned actions to recruit expert innovation fellows and 500 data science analysts into government, we hope to see future funding set aside to improve data literacy and data skills for all, not just public sector experts. As we noted in 2019, “learning data skills can prove hugely beneficial to individuals seeking employment in a wide range of fields including the public sector, government, media and voluntary sector”, so getting this right will be crucial if the government hopes to make better use of data part of its plan for building a stronger economy in the wake of the COVID-19 pandemic.

We also welcome the strategy’s focus on “fixing the plumbing” and ensuring that data is fit for purpose and standardised. As noted in our 2019 submission, there is often a “huge amount of work required to clean up data in order to make it usable before insights or stories can be gleaned from it”, so further efforts to improve data quality and standardisation are sorely needed.
On the availability of data and open data, it is encouraging to see a recognition of issues relating to the government’s consistency in open data publication, with a promise to review and better measure the impact of existing processes and published data. Our mission is a fair, free and open future, so we also welcome the strategy’s acknowledgement of the overarching importance of harnessing data to create a fairer society for all.

The consultation on the National Data Strategy is now open and runs for the next 12 weeks, to 2nd December 2020. We will be examining the ‘framework’ strategy documents further and look forward to engaging more with the process of refining and improving the strategy. The UK must not miss this opportunity to be at the forefront of a global future that is fair, free and open.