The cost of closed data & the economics of open data

October 17, 2011

This guest post by Chris Taggart was originally published on his blog. Chris is co-founder of OpenCorporates, founder of OpenlyLocal, and member of the OKF open government working group.
Yesterday I received an email from a Cabinet Office civil servant in preparation for a workshop tomorrow about the Open Data in Growth Review, and in it I was asked to provide:
an estimation of the impact of Open Data generally, or a specific data set, on UK economic growth…  an estimation of the economic impact of open data on your business (perhaps in terms of increase in turnover or number of new jobs created) of Open Data or a specific data set, and where possible the UK economy as a whole
My response:
How many Treasury economists can I borrow to help me answer these questions? Seriously.
Because that’s the point. Like the faux Public Data Corporation consultation that refuses to allow the issue of governance to be addressed, this feels very much like a stitch-up. Who, apart from economists, or those large companies and organisations who employ economists, has the skill, tools, or ability to answer questions like that? And if I say, as an SME, that we may be employing 10 people in a year’s time, what will that count for against Equifax, for example (who are also attending), who may say that their legacy business model (and staff) depends on restricting access to company data? If this view is allowed to prevail, we can kiss goodbye to the ‘more open, more fair and more prosperous’ society the government says it wants. So the question itself is clearly loaded, perhaps unintentionally (or perhaps not). Still, the question was asked, so here goes:
I’m going to address this in a somewhat reverse way (a sort of proof-by-contradiction). That is, rather than work out the difference between an open data world and a closed data one by estimating the increase from the current closed data world, I’m going to work out the costs to the UK incurred by having closed data. Note that extensive use is made of Fermi estimates and backs of envelopes.
  • Increased costs to the UK of delays and frustrations. Twice this week I have waited around for more than 10 minutes for buses, time when I could have stayed in the coffee shop I was working in and carried on working on my laptop had I known when the next bus was coming. Assuming I’m fairly unremarkable here and the situation happens to say 10 per cent of the UK’s working population through one form of transport or another, that means that there’s a loss of potential productivity of approx 0.04% (10 minutes lost out of a 2400-minute working week x 10%). Similar factors apply to a whole number of other areas, closely tied to public sector data, from roadworks (not open data) to health information to education information (years after a test dump was published we still don’t have access to Edubase) – just examine a typical week and think of the number of times you were frustrated by something which linked to public information (strength of mobile signal?). So, assuming that the transport is a fairly significant 10% of the whole, and applying it to the UK $2.25 trillion GDP we get £9000 million. Not included: loss of activity due to stress, anger, knock-on effects (when I am late for a meeting I make attendees who are on time unproductive too), etc.
  • Knock-on cost of data to public sector and associated administration. Taking the Ordnance Survey as an example of a Shareholder Executive body, of its £114m in revenue (and roughly equivalent costs), £74m comes from the public sector and utilities. Although there would seem to be a zero cost in paying money from one organisation to another, this ignores the public sector staff and administration costs involved in buying, managing and keeping separate this info, which could easily be 30% of these costs, say £22 million. In addition, it has had to run a sales and marketing operation costing probably 14% of its turnover (based on staff numbers), and presumably it costs money collecting and formatting data which is only wanted by the private sector, say 10% of its costs. This leads to extra costs of £22m + £16m + £14m = £52 million or 45%. Extrapolating that over the Shareholder Executive turnover of £20 billion, and discounting by 50% (on the basis that it may not be representative) leads to additional costs of £4500 million. Not included: additional costs of margin paid on public sector data bought back from the private sector (i.e. part of the costs when public sector buys public-sector-based data from the private sector is the margin/costs associated with buying the public sector data).
  • Significant decreases in exchange of information, and duplication of work within the public sector (not directly connected with purchase of public sector data). Let’s say that duplication, lack of communication, lack of data exchange increases the amount of work for the civil service by 0.5%. I have no idea of the total cost of the local & central govt civil service, but there are apparently 450,000 of them, costing say £60,000 each to employ, on the basis that a typical staff member costs twice their salary. That gives us an increased cost of £1350 million. Not included: cost of legal advice, solving licence chain problems, inability to perform its basic functions properly, etc.
  • Increased fraud, corruption, poor regulation. This is a very difficult one to guess, as by definition much goes undetected. However, I’d say that many of the financial scandals of the past 10 years, from mis-selling to the FSA’s poor supervision of the finance industry, had a fertile breeding ground in the closed data world in which we live (and just check out the FSA’s terms & conditions if you don’t believe me). Not to mention phoenix companies, one hand of government closing down companies that another is paying money to, and so on. You could probably justify any figure here, from £500 million to £50 billion. Why don’t we say a round billion? Not included: damage to society, trust, the civic realm.
  • Increased friction in the private sector world. Every time we need a list of addresses from a postcode, information about other companies, or any other public sector data that is routinely sold, we not only pay for it in the original cost, but for the markups on that original cost from all the actors in the chain. More than that, if the dataset is of a significant capital cost, it reduces the possible players in the market, and increases costs. This may or may not appear to increase GDP, but it does so in the same way that pollution does, and ultimately makes doing business in the UK more problematic and expensive. Difficult to put a cost on this, so I won’t.
  • I’m also going to throw in a few billion to account for all the companies, applications and work that never get started because people are put off by the lack of information, high barriers to entry, or plain inaccessibility of the data (I’m here taking the lead from the planning reforms, which are partly justified on the basis that many planning applications are not made because of the hassle in doing them or because they would be refused, or otherwise blocked by the current system.)
What I haven’t included is reduced utilisation of resources (e.g. empty buses, public sector buildings – the location of which can’t be released due to Ordnance Survey restrictions, etc), the poor incentives to invest in data skills in the public sector and in schools, the difficulty of SMEs understanding and breaking into new markets, and the inability of the Big Society to argue against entrenched interests on anything like an equal footing. And this last point is crucial if localism is going to mean more rather than less power for the people. So where does that leave us? A total of something like:

£17,850 million.
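
For anyone who wants to check the envelope, here is a minimal Python sketch of the sums above. It is only an illustration under stated assumptions: the variable names are made up for this example, the rounded intermediate percentages are the ones quoted in the text, and "a few billion" for work that never gets started is read as £2,000 million, which is the value that reproduces the stated total. The civil service inputs as written (450,000 staff, £60,000 each, 0.5%) multiply out to £135 million rather than £1350 million, so the sketch prints that recomputed value alongside but keeps the stated figure in the total.

```python
# Back-of-envelope reconstruction; all amounts in £ million.
# Every input is one of the post's own guesses, not measured data.

# Delays: 10 lost minutes in a 2400-minute week, for 10% of workers.
transport_loss = (10 / 2400) * 0.10          # ~0.0417%, quoted as 0.04% in the text
# Transport is assumed to be 10% of all such delays; GDP figure applied as-is.
delays = 2_250_000 * 0.0004 / 0.10

# Ordnance Survey extras (22 + 16 + 14) as a share of its 114 revenue.
os_extra_share = (22 + 16 + 14) / 114        # ~45.6%, quoted as 45% in the text
# Extrapolated over £20bn Shareholder Executive turnover, discounted by 50%.
shareholder_exec = 20_000 * 0.45 * 0.50

# Civil service: 450,000 staff at £60,000 each, 0.5% extra work.
civil_service_recomputed = 450_000 * 60_000 * 0.005 / 1_000_000

stated = {
    "delays and frustrations": 9000,
    "public sector data administration": 4500,
    "duplicated civil service work": 1350,   # figure as stated in the text
    "fraud and poor regulation": 1000,
    "work never started": 2000,              # ASSUMPTION: "a few billion" read as £2bn
}
total = sum(stated.values())

print(f"delays, recomputed:           {delays:,.0f}")
print(f"Shareholder Executive:        {shareholder_exec:,.0f}")
print(f"civil service, recomputed:    {civil_service_recomputed:,.0f}")
print(f"total of stated line items:   {total:,}")
```

Changing any single guess and rerunning shows how sensitive the headline number is to it, which is rather the point of a Fermi estimate.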

That, back of the envelope-wise, is what closed data is costing us, the loss through creating artificial scarcity by restricting public sector data to only those who pay. Like narrowing an infinitely wide crossing to a small gate just so you can charge – hey, that’s an idea, why not put a toll booth on every bridge in London, that would raise some money – you can do it, but would that really be a good idea?
And for those who say the figures are bunk, that I’ve picked them out of the air, not understood the economics, or simply made mistakes in the maths – well, you’re probably right. If you want me to do better give me those Treasury economists, and the resources to use them, or accept that you’re only getting the voice of those that do, and not innovative SMEs, still less the Big Society.
Footnote: On a similar topic, but taking a slightly different tack, is the ever excellent David Eaves on the economics of Toronto’s transport data. Well worth reading.

Introducing a new list just for open data on companies

August 1, 2011

The following is a guest post from Chris Taggart, co-founder of OpenCorporates and member of the Open Knowledge Foundation’s Working Group on Open Government Data.
One of the key types of data that affects all our lives in a multitude of ways is that on companies and corporate entities. As companies have changed from being single entities to multifaceted, multi-jurisdiction webs understood by few people even within the company, open data is the only way we can map these entities and their relationships with the rest of the world. And yet, despite it being mostly collected by governments for a statutory purpose, until recently there’s been very little open company data about.
That’s rapidly changing, thanks in part to OpenSpending, OpenCorporates, and other open data projects. OpenCorporates [disclosure: I’m the co-founder and CEO] is quickly creating one of those key building blocks necessary to understand the place of companies in our lives – a URL/URI for every company in the world, with now over 20 million companies and nearly 30 jurisdictions. OKF’s OpenSpending too has made great progress in mapping payments by governments to companies, building a platform that extends the original Where Does My Money Go, and now can include government spending of pretty much any type.
But still there’s a long way to go, and a lot of battles to be won, and a lot of knowledge that’s incomplete or missing, from finding the company registry for a given country to understanding company structures. Hence the launch of a mailing list just about open company data: open-companies. So if you’ve got questions about open data of companies, want to access it, or have expertise in understanding the structure, rules and accounts of companies, we’d love you to join this mailing list. So sign up, and let’s get chatting about how to solve the main problems around open company data.
Chris

OpenCorporates hits 20 million companies, an open data milestone

August 1, 2011

The following is a guest post from Chris Taggart, co-founder of OpenCorporates and member of the Open Knowledge Foundation’s Working Group on Open Government Data.
Less than eight months ago, OpenCorporates: The Open Database Of The Corporate World launched with the rather ambitious goal of creating a URL for every company in the world. Five months later, it had already reached 10 million companies. And now, barely 3 months after that, it has doubled that to over 20 million companies. In that time, OpenCorporates has:
OpenSpending is already using OpenCorporates’s Google Refine reconciliation service, and using the resultant OpenCorporates open URIs to identify the recipients of UK government spending, and we expect this collaboration to get even closer as OpenCorporates and OpenSpending add more data, and more countries.
Not bad for a self-funded micro-startup (and, of course, the open data community, without which it wouldn’t have been possible). So join us in celebrating this milestone, and help us make sure the biggest and best database of company information in the world is an openly licensed one by tweeting about OpenCorporates, +1’ing us, liking us on Facebook, and above all linking to us.

OpenCorporates: the Open Database of the Corporate World

December 20, 2010

This is a guest post by Chris Taggart, a member of OKFN’s open government working group and creator of OpenlyLocal, who today launched a new website OpenCorporates in collaboration with Rob McKinnon (a project they first demoed at the Open Government Data Camp in November).
Why OpenCorporates? Like most open data/open source projects, it was started (just a couple of months ago) because the founders, Chris Taggart & Rob McKinnon, needed such a resource to exist. Specifically we needed:
  1. an open database of companies not just in the UK, or in another individual country, but in any country
  2. a way of matching lists of company names to real-world companies (with their company numbers)
  3. a place where the increasingly large amount of open government data relating to companies could be brought together, with all the power that would bring to the community
So, OpenCorporates was created, and while it’s very, very early days, we think we’ve got something that is massively more usable than anything else out there (and did we mention it’s open data too?). So, without any more delay, let’s take a quick run through the main features.
The first place is, reasonably, the home page, where you can search for a company name from the over 3,800,000 companies in the OpenCorporates database. You can also start browsing the database by filtering by jurisdiction (this is similar but not the same as country – more on this in a later post), and from there to filtering by company type or status.
The next bit is where it starts to get really interesting, and that’s where we can start to filter based on public data we’ve imported. Let’s say we want to see all the companies with Financial Transactions – there’s possibly a better way of expressing this, but these are all the UK central government spending items recently released as part of its drive to open up government. Click on the Financial Transactions filter and you get 4955 companies that received a payment from central government. Let’s now see those who received notices from the UK Health & Safety Executive by clicking on the filter to the right. Then let’s choose an industry classification, say, Fishing, Fish Farming etc. OK, that’s just one company: DUCHY OF CORNWALL OYSTER FARM LIMITED, and clicking on that brings up the company’s record. Interesting. Click through onto the transaction, and I’ll leave it to the reader to dig out more about it, but I think you’ll agree it’s a pretty useful starting point.
The second core feature is the ability to match company names to real-world companies, complete with company numbers. To do this, we’ve implemented the back end stuff that the awesome Google Refine needs, and here a short screencast (on Vimeo) will do the job of a thousand words.
It’s worth mentioning one last feature, which in some ways is the most powerful but not at all sexy, and that’s the ability to have a URL for every company in the world (we’ll be adding the ability for the community to add companies soon). Why is this important? Because when we’re talking about companies, it’s difficult to be sure which company we’re talking about. We need universal identifiers for them, and the best are URLs. This means that different people can refer to the same OpenCorporates URL (here’s the one for Google Bermuda Limited) and be sure that they’re talking about the same company.
Finally, we’ve got lots of features we’re working on, including a full-blown API, so it’s easy to get the data out and reuse it elsewhere. Watch this space, follow @OpenCorporates on Twitter and start exploring.