You are browsing the archive for Chris Taggart.

Open tax data, or just VAT ‘open wash’

- July 30, 2013 in Featured, Open Data, Open Economics, Open Government Data, Public Money, WG Open Government Data

This post is by Chris Taggart, the co-founder and CEO of OpenCorporates, the largest open database of companies in the world, and a member of the Open Government working group.
[Disclosure: I am on the UK Tax Transparency Board, which has not yet discussed these proposals, but will be doing so at the next meeting in early September]
A little over a week ago, Her Majesty’s Revenue & Customs (HMRC) published a consultation on publishing its data more widely, and in it stated its intention to join the open-data movement.
The UK helped secure the G8’s Open Data Charter, which presumes that the data held by Governments will be publicly available unless there is good reason to withhold it. It is important that HMRC plays a full part. HMRC’s relationship with businesses and individuals is unique, and this is reflected in the scope and depth of the information HMRC collects, creates and protects on behalf of taxpayers.

Great. Well, no. The problem is that, despite what the above says, this consultation and the proposals within have little to do with open data or widening access, but instead are primarily about passing data, much of it personal data relating to ordinary individuals, to the anointed few. It also exposes some worrying data-related problems within HMRC that should be ringing alarm bells within government. So what exactly is being suggested? There are two parts:
  1. Proposals to do with sharing HMRC’s data, particularly aggregated and anonymised data. At the moment HMRC can, in general, only share such data if it relates to HMRC’s functions, even if it’s in the wider public benefit.
  2. Proposals to do with the VAT Register. The VAT Register is currently private, even though to a large extent the information is ‘out there’, on till receipts, on invoices, on websites, and in various private datasets, and in fact in many countries it’s already public.
Both have their issues, but for the moment we’ll concentrate on the second. Now there has been no great clamour for the VAT Register from open-data activists (unlike say the postcode address file, company register, or Ordnance Survey data), so why is it being opened up? Well, why not? As the consultation says:
An underlying principle in developing the proposals in this chapter is brought out in the Shakespeare Review. Data belong to citizens and the presumption of government should be towards openness, unless this causes harm. It is not for government to dictate the nature of the opportunity. The corollary is that the Government will not always be aware of the range or scale of potential benefits, as the quotation below shows – this consultation will help to establish these.
So the proposal is to publish the VAT Register as open data, so that the wider community can do cool stuff with it? No. The consultation neatly slides from this lofty aim to something rather more grubby.
There has been public interest for some time, for example from credit reference agencies (CRAs), in the publication of VAT registration data as a resource to generate benefits.
Don’t the three big credit reference agencies (Experian, Equifax and Callcredit) already know a lot about companies? Surely they know the VAT numbers of many of them, and in any case know a lot more about most companies, especially active, trading companies (the sort that are registered for VAT)? What they don’t have, however, is much information about sole traders, small partnerships and individuals trading on their own account, without the shield of limited liability and the responsibilities for publishing information that come with it. That’s why the VAT register is so important to them, and that’s what this consultation is proposing to give them.
Of course they could just ask people for that information. But people might refuse, particularly if they don’t need to borrow money, and that would be a problem as far as building a monetisable dataset of them is concerned. If they could only get the government to give them access to that data – have the government act as their own data-collection arm, with the force of law to compel the provision of the information – that would be great. For them.
For individuals, and for the wider world, it’s not good at all. First, because what we’re talking about here are individuals, who have privacy and data protection rights, not companies, and there need to be compelling reasons for making that information public in the first place – the fact that the CRAs think they can make money from it isn’t good enough. Second, because if open data is about one thing, it is about democratising access to data, about reversing the traditional position where, to use the words of the Chancellor, George Osborne, “Access to the world’s information – and the ability to communicate it – was controlled by an elite few”. And if there’s one thing that’s certain it’s that the CRAs have a lot of power.
But wait, doesn’t the consultation also propose that some of the VAT register is published as open data, specifically “a very selective extract covering just three data fields – VAT registration number (VRN), trading name, and Standard Industry Code (SIC) classification number”? At first sight this might be seen as good, or better than nothing. In fact it shows that HMRC either doesn’t get data, or it’s just ‘openwash’ – an open-data figleaf to obscure the passing of personal and private data wholesale to the CRAs, and one that could potentially lead to greater fraud. Here’s why:
  • The three fields (VAT number, trading name, SIC code) together make up an orphan dataset, i.e. one that’s unconnected with any other data, and therefore is fundamentally useless… unless you want to fraudulently write an invoice calling yourself ‘AAA Plumbing’, charging VAT on it, and pocketing the 20%, knowing that either you will never be caught, or the real AAA Plumbing will be the first place HMRC will come looking.
    Fraud is fundamentally about asymmetries of information flows (the fraudster knows more about you than you know about them). If, for example, you know that the real AAA Plumbing is a company with a registered address in Kirkcaldy, Scotland, or that BBB Services is dissolved or has a website showing it works in the aircraft business, then you have a much greater chance of avoiding fraud.
  • Trading names are very problematic, and in general are not registered anywhere, so are little help. They also need have no relationship to the legal name, either of the person or the company. So if you want to find the company behind ZZZ Financial Experts, if indeed there is one, you’re out of luck. It’s puzzling that HMRC would even consider publishing the VAT Register without the legal form, and in the case of companies the company number.
  • One of the stated reasons for publishing the register is that “VAT registration data could also provide a foundation for private sector business registers”. Really? In this world of open data and the importance of core reference data, HMRC wants a private, proprietary identifier set to be created, with all the problems that it would entail? In fact, HMRC was supposed to be working with the Department for Business, Innovation & Skills to build such a public dataset. Has it decided that it doesn’t understand data well enough to do this? Or would it rather shackle not just the government but the business sector as a whole to some such dataset?
  • Finally, it’s also rather surprising to discover that the VAT register appears to contain fields such as the company’s incorporation date and SIC codes. In the geek world we call this a denormalised dataset, meaning it’s duplicating data that rightfully belongs in another table or dataset. There are sometimes good reasons for doing this, but there are risks, such as the data becoming out of sync (which is the correct SIC code – the one on the VAT Register or the one on the Companies House record?).
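The out-of-sync risk in that last point can be made concrete with a small sketch. Everything here is an assumption for illustration: the record layouts, the field names, the VAT and company numbers and the SIC values are made up, and nothing is taken from HMRC or Companies House. It also assumes the VAT record carries the company number, which is exactly the link the proposed three-field extract would leave out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CompanyRecord:
    """Row in a hypothetical company register (treated as the authoritative source)."""
    company_number: str
    legal_name: str
    sic_code: str


@dataclass(frozen=True)
class VatRecord:
    """Row in a hypothetical VAT register that keeps its own copy of the SIC code."""
    vat_number: str
    trading_name: str
    sic_code: str
    company_number: Optional[str]  # None for sole traders and partnerships


def find_sic_mismatches(vat_rows, companies):
    """Return (vat_number, vat_sic, register_sic) where the two copies disagree."""
    by_number = {c.company_number: c for c in companies}
    mismatches = []
    for row in vat_rows:
        company = by_number.get(row.company_number)
        # Rows with no matching company (e.g. sole traders) cannot be checked.
        if company is not None and company.sic_code != row.sic_code:
            mismatches.append((row.vat_number, row.sic_code, company.sic_code))
    return mismatches


# Made-up sample data: the second VAT row holds a stale copy of the SIC code.
companies = [
    CompanyRecord("SC000001", "AAA Plumbing Ltd", "43220"),
    CompanyRecord("00000002", "BBB Services Ltd", "30300"),
]
vat_rows = [
    VatRecord("GB000000001", "AAA Plumbing", "43220", "SC000001"),
    VatRecord("GB000000002", "BBB Services", "62020", "00000002"),
    VatRecord("GB000000003", "ZZZ Financial Experts", "66190", None),
]

mismatches = find_sic_mismatches(vat_rows, companies)
for vat_number, vat_sic, register_sic in mismatches:
    print(f"{vat_number}: VAT register says {vat_sic}, company register says {register_sic}")
```

Without the company number there is nothing to join on, so the check above could not even be attempted; that is what makes the three-field extract an orphan dataset.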
So what should HMRC be doing?
First, it should abandon any plans to act as the Credit Reference Agencies’ data collectors, and publish the VAT register or part of the VAT register as a single open dataset, available to all under the same terms. This would be a genuine spur for innovation, and may even result in increased competition and transparency.
Second, it should realise that there’s a fundamental difference between an individual – a living, breathing person with human rights – and a company. As well as human rights, individuals have data protection rights, privacy rights and don’t exist on a public register; companies on the other hand are artificial entities given a distinct legal personality by the state for the good of society, and in return exist in public (on the public Register of Companies). In the case of the VAT register, the pragmatic approach would be to publish the register as open data, but only that part that relates to companies.
Third, it needs to realise that it is fundamentally in the data business, like it or not, and it needs to quickly get to grips with the modern data world, including the power of data, for good, and for bad. The UK has probably the leading organisations in the world in this area, including OpenCorporates, the Open Knowledge Foundation and the Open Data Institute.

How open is corporate data in Open Government Partnership countries?

- April 16, 2012 in External, Featured, News, Open Government Data, WG Open Government Data

Today, the day before the Open Government Partnership meeting starts in Brasilia, OpenCorporates is publishing a major new report into access to company data in OGP countries, and the picture is not good. Out of a possible 100 points, the average score was just 21, with several major countries (including Spain, Greece and Brazil) scoring zero. A score of 100 means that the company register is an open data register, making detailed information free for reuse under an open licence, and also makes the information available as open data. A score of zero means the central register cannot even be searched without payment or registration. The highest scorer is the Czech Republic, with a score of 50, though the UK will achieve a score of 70 when it starts publishing a limited set of data under an open licence in July.

Virtually all OGP countries score very badly for openness of company data, with several – including countries such as Spain, Greece and Brazil – effectively closed for the public, civil society and the wider world, undermining corporate governance, and providing a fertile ground for corruption, money laundering, organised crime, and tax evasion.

The full report is available here, and the data is available under an open licence, but a summary of the data should be available below.
This item was originally posted on the Open Corporates New blog.

Bounties for scrapers: a new approach to opening global data

- March 30, 2011 in Guest post, Open Data, Open Government Data, opencorporates, Process, scraping

This is a guest post by Chris Taggart, co-founder of OpenCorporates and member of the Open Knowledge Foundation’s Working Group on Open Government Data.
On Friday we at OpenCorporates announced an innovative (and frankly untested!) way for the open data community to work together in helping to open up one of the most important datasets there is: company numbers and names. Full details are on the OpenCorporates blog, but basically we’re working with the superb ScraperWiki to open up the company names and numbers in a consistent way. And because we’d like to use that info, we’re offering small bounties for each jurisdiction that’s added, with a total pot of £2,500 (it’s worth stressing that neither the scrapers nor the data will belong to OpenCorporates).
We’ve already had a few scrapers written in response to the challenge, but there are plenty more territories to do, and we’re particularly keen to see the opening up of the data for those countries where the system is a little newer, such as those in Eastern Europe, north Africa, or Asia. And then, of course, there’s the huge task of the US states (we’ve done Michigan and DC).
It’s worth saying that many of these registers have distinctly un-open licences. This is in part why we’re just asking for the most basic and non-contentious information: the company name, number and possibly status or company type. However, moving forward we need to open up the whole register, and we’ve already had positive discussions with some countries about doing this. Till then, happy scraping.