Week 1: Advancing development objectives through data

Economics and Politics of Data


With: Vivien Foster

Hello. My name is Vivien Foster. I'm the chief economist for the infrastructure vice presidency at the
World Bank and I have been the co-director for the World Development Report 2021, entitled Data for
Better Lives. Today I'm delighted to be with you as part of module one, to discuss the topic of economics
and politics of data which will give you some of the conceptual foundations for the World Development
Report. In the media these days, data is often compared to oil. And while it is true that data powers
the new economy in the way that oil powers the traditional economy, the analogy breaks down
right after that. In other respects, data could not be more different from oil. The reason is that oil is an
exhaustible resource: as soon as someone consumes it, there's less available for everyone else. Data, on
the other hand, is an inexhaustible resource. It can be used and reused again and again without
diminishing the amount available for others to use.

To describe this characteristic of data, we use the economic term non-rivalrous, meaning that one
person's use of data does not reduce the amount of data available for others to use. In this sense, data
resembles a public good. This is why data can be repeatedly shared and repurposed across different user
groups creating more and more economic value every time. We capture this through the idea of the
data cycle, which you can see here in the slide: data is created, then processed, then stored, and then it
may be shared, analyzed and preserved. And this cycle can go on indefinitely unless deliberate steps are
taken over here to destroy data for some particular reason. This non-rivalrous characteristic of data has
two important consequences. First, it makes it very difficult to put an economic value on data and
second, it means that there's likely to be systematic under-investment in data.

So what is the economic value of data? There have been several studies that attempt to put such an
economic value on data. They do so based on three possible methods. The first method is cost-based,
and you can see it here in the first row of this table. Cost-based means that we add up the expenditure
that different entities incur in data collection using for example the national accounts. Adopting this
approach, Statistics Canada for instance was able to estimate the total value of government data in
Canada to Canadian society and put it at around 14 billion dollars in 2019. A second approach to valuing
data, which you can see here in the middle part of the table, is the benefits-based approach. This means
quantifying the ways in which data benefits users through improved efficiency, reduced transaction costs
or perhaps expanded markets. For example, the value of public transport data to travelers in London
was estimated at 120 million US dollars per year. This includes, for instance, the time savings that
travelers achieve by having real-time information on the arrival of buses and trains and thereby
avoiding long and unnecessary waits.

A third and final approach for valuing data is the market-based approach, which you can see in the final row
of the table. This involves taking information on stock market or other commercial transactions
involving data. For example, Apple makes almost 43 billion dollars a year selling data to advertisers to
improve targeting of marketing material. What these different studies illustrate is that the economic
value of data even in quite narrowly defined uses can be very substantial. However, while it may be
possible to estimate the value of data in its original primary use intended by the data collector, it's
impossible to anticipate what other secondary uses may arise once data has been shared and
repurposed. In fact, it is quite conceivable that the secondary uses of data may generate even more
economic value than the original primary use of it. In that sense, data can never be fully valued, and any
valuation produced is only ever a lower bound.

These difficulties with the valuation of data mean that in practice, data will be undersupplied. Why is
that? Well, the reason is that data collectors only take into account the value that they themselves
derive from data, which is the orange part of the cycle, from creation through processing and storage. But they fail
to take into account the value that others will derive as this data is shared and gradually circulates
around the economy. So, when appraising investments in data collection, they'll only consider their
private returns and not take into account the full social benefits. As a result, they will invest too little in
data collection. Surely an obvious solution would be to allow primary data collectors to sell data to
secondary users. This would allow them to capture the secondary benefit encouraging them to invest
more in data collection in the first place. So, this brings us to another important question that's often
asked about data: why can't it simply be traded in markets like other goods with economic
value? While there are clearly significant data markets already in existence, for example, markets for
advertising data or for technical data sets of various kinds, there are certain economic characteristics of
data that make it difficult for all data to be treated in this way.
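
The under-investment argument made earlier in this passage can be illustrated with a minimal sketch. The function names and all the numbers below are hypothetical and chosen only for illustration; they do not come from the report.

```python
# Illustrative sketch only: every number here is hypothetical.

def collector_invests(cost: float, private_value: float) -> bool:
    """A data collector invests only if its own (private) value covers the cost."""
    return private_value >= cost


def socially_worthwhile(cost: float, private_value: float, secondary_value: float) -> bool:
    """Society gains if private value plus the value to secondary users covers the cost."""
    return private_value + secondary_value >= cost


# Hypothetical dataset: costs 100 to collect, is worth 60 to the collector,
# and would be worth another 90 to secondary users once shared.
cost, private_value, secondary_value = 100.0, 60.0, 90.0

invests = collector_invests(cost, private_value)
worthwhile = socially_worthwhile(cost, private_value, secondary_value)
print(f"collector invests: {invests}, socially worthwhile: {worthwhile}")
```

Under these assumed numbers the dataset is worth collecting from society's point of view, yet the collector, looking only at private returns, would not collect it.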

First, when it comes to personal data, property rights are not easily defined. The reason is that there are
multiple parties with an overlapping interest in such data. There is the data subject themselves and the
data collector and the data user and it can be difficult to define which of these has the most natural
ownership claim. For example, call detail records regarding my geographic location could be argued to
belong to me and/or to the company that created the device I carry and provides the cellular service. A
second reason why it's difficult to trade all data on markets is that data transactions are complicated by
the fact that it's difficult for a potential user to verify the value of the data before they actually gain access
to it and make use of it. As a result of these considerations, there are many types of data for which
markets do not exist and might be difficult to create.

But even if all data could be marketed, there's another reason why people may not be willing to share their
data with, or sell it to, anyone else. This is due to another important economic characteristic of data,
namely the fact that data is excludable, meaning that the person who originates the data can prevent others
from gaining access to it. In this sense, data resembles a private good. Moreover, data collectors may
find that excluding others from accessing data helps them to build up their own power. Two types of
power are relevant here. First of all, market power if we’re thinking of the private sector. Concentration
of data in the hands of private companies can give them significant competitive advantages. This is
particularly true as we'll see later in the World Development Report, in the case of digital platform
businesses. Such businesses use data to create network effects that lead to exponential growth and a
resulting concentration of market power. The greater the number of people whose data is on a social
platform, the higher the value of that social platform for others to join, creating a reinforcing cycle and
making social platform businesses unwilling to share their data with rivals. Another kind of power
associated with the hoarding of data is political power. Concentration of data in the hands of
government entities can bolster such power. Some governments refuse to publish data or may
manipulate the data they publish so as to dilute their accountability to the public. Other governments
may misuse personal data for politically motivated surveillance of oppressed social groups that helps to
consolidate their grip on society.

The motivation to hoard and accumulate data stems from the significant economies of scale that data
exhibits. This means that the more data somebody has, the more valuable that data becomes. We can
think of this as an S curve as shown in the chart. If you only have a small amount of data, it doesn't have
much value at all. Your sample may not even be large enough to infer anything reliable about the
population from which it was drawn. However, as you accumulate more data, you reach a critical mass
and have a dataset that is large enough to provide meaningful information and reliable inferences and
hence its value escalates rapidly. Beyond a certain statistical threshold, however, additional data points
may not provide much further information to a dataset that is already large and thereby not really add a
great deal to its value as we can see here in the shallow part of the S.

Modern artificial intelligence and machine learning techniques are particularly data hungry, sometimes
requiring millions of data points to function effectively. In fact, the more sophisticated the algorithm,
the larger the dataset required. So, one might say that the steeper part of the S curve, in which the value
of data escalates rapidly, is shifting to the right, meaning that people need more and more data points to
get onto that high value portion of data ownership, if their intention is to use the data for the
application of algorithms.
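
One way to picture the S curve and its shift to the right is a logistic function, sketched below. The logistic form, the `midpoint` and `steepness` parameters, and the two example midpoints are all assumptions made for illustration; the lecture only describes the shape of the curve, not a formula.

```python
import math


def data_value(n_points: float, midpoint: float, steepness: float = 0.005) -> float:
    """Logistic (S-shaped) value of a dataset, scaled to lie between 0 and 1.

    midpoint is the assumed critical mass: the dataset size at which value rises fastest.
    """
    return 1.0 / (1.0 + math.exp(-steepness * (n_points - midpoint)))


SIMPLE_MIDPOINT = 1_000.0  # hypothetical critical mass for simple statistical use
ML_MIDPOINT = 3_000.0      # hypothetical, larger critical mass for a data-hungry algorithm

for n in (500, 1_000, 3_000, 6_000):
    simple = data_value(n, SIMPLE_MIDPOINT)
    algorithmic = data_value(n, ML_MIDPOINT)
    print(f"{n:>5} points: simple use {simple:.3f}, algorithmic use {algorithmic:.3f}")
```

Moving the midpoint to the right leaves the shape of the curve unchanged, but a dataset of a given size then sits lower on it, which is the sense in which more data points are needed to reach the high-value portion.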

A related motivation for hoarding data is the existence of economies of scope. These arise when the
value of holding a particular type of data is further enhanced if another complementary type of data is
held at the same time. For example, the value of a potential customer email database to a tourism
agency is further enhanced by a dataset capturing which of these customers are exhibiting travel planning
behavior, for example by visiting websites related to travel. This would be even further enhanced by a
database of customer zip codes, from which it may be possible to infer something about income levels.
Marketing material could then be more efficiently targeted at the wealthiest individuals exhibiting most
curiosity about travel information through their web searches.

We can represent the existence of economies of scope as an upward shift in the whole value curve,
meaning that at any particular scale of dataset, the value of that data is enhanced by the possession of a
complementary dataset.
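
The upward shift can be sketched in the same spirit. The sketch assumes a logistic baseline curve and a multiplicative uplift for each complementary dataset held; both the functional form and the 25 percent uplift are hypothetical choices for illustration, not figures from the report.

```python
import math


def s_curve(n_points: float, midpoint: float = 1_000.0, steepness: float = 0.005) -> float:
    """Baseline S-shaped value of one dataset, scaled to lie between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-steepness * (n_points - midpoint)))


def value_with_scope(n_points: float, complementary_datasets: int,
                     uplift_per_dataset: float = 0.25) -> float:
    """Raise the whole curve for each complementary dataset held (assumed multiplicative)."""
    return s_curve(n_points) * (1.0 + uplift_per_dataset * complementary_datasets)


# 0 = email list alone, 1 = plus browsing behavior, 2 = plus zip codes as well.
sizes = (500, 1_000, 2_000)
for held in (0, 1, 2):
    values = [round(value_with_scope(n, held), 3) for n in sizes]
    print(f"{held} complementary datasets: {values}")
```

At every dataset size in the loop, holding one more complementary dataset gives a higher value, which is what an upward shift of the whole curve means.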

In conclusion, any of you who have a background in economics will recall that public goods are
defined by two key characteristics. They are non-rivalrous, meaning that one person's consumption of
the good does not reduce the amount available for another person to consume and they are non-
excludable, meaning that it is not possible to prevent anyone from consuming the good. As we have
seen, data is part public good because it's non-rivalrous and part private good because it's excludable.
This has important consequences. The non-rivalrousness of data means that data should ideally be
shared and reused as much as possible to create more and more economic value. However, the
excludability of data means that data holders have the means and often the incentive to prevent other
people from accessing their data, should they so wish. In practice, this means that data will be both under-
produced and under-shared. It will be under-produced because data collectors are unable to capture the
value that secondary users create from their data, and it will be under-shared because
economies of scale mean that the increasing accumulation of data boosts both economic and political
power, thereby diluting incentives to make data available to others. And these are two of the
fundamental obstacles to the data economy. Thank you.
