
Personal data on sale

Fernando Beltrán
University of Auckland
August 2022

Abstract

Having reached an unprecedented degree of connectivity, we have become the target of countless
organisations that seek access to information about many fronts, types, and circumstances of our
lives. Like it or not, our personal communications devices are the source of data transmitted to and
shared with actors that find value in using, processing, and even selling our data.
Rendering personal data in exchange for free digital services, selling personal data at data
marketplaces, and contributing personal data to national statistics organisations (i.e., a country’s
census) are situations where our personal data are stored, processed and consumed by such actors.

Digital platforms derive a significant fraction of their revenue from advertisers who are allowed to
access, retrieve, and manipulate the data that users, knowingly or not, generate continuously. While
on the user edge of the platform, such personal information is collected without compensation to the
user, on the advertiser edge, a market for the sale of personal information soars. Some analysts [1]
consider that a market failure arises on the user edge where platforms provide free internet search or
social networking. Further, on data marketplaces where data are sold and bought, individuals can sell
their personal information while given varying degrees of privacy protection and assurance of
anonymity. Finally, national census organisations rely on truthful disclosure of information by
individuals and families, who entrust their governments with their data for the greater benefit society
will enjoy if and when data-driven policies and decisions are made.

An underlying theme, and one whose importance has become increasingly visible, is people's concern
about how their data privacy is protected.

This paper proposes an analytical framework to understand the drivers of data value in the three
institutions listed above while examining mechanisms in place or proposed for protecting the identity
of data owners. It reviews the literature on data pricing and privacy protection to understand how
individuals value their personal data, as value becomes a support for data pricing decisions.

We identify three pathways for personal data once the individual agrees to their release. Personal
data may turn into a public good collected by a national census office; they can be sold at a data
marketplace; or they can be “exchanged” for digital services at a digital platform. In all cases, we analyse
the trade-off between data value (or price where it applies) and privacy loss, identifying metrics that
quantify it. The literature provides instances of this issue: [2] views the statistical reports from census
data as the product of deciding the optimal levels of two inputs, statistical accuracy and privacy loss,
and then discusses a model that maximises the social value of accurate reporting while protecting
individuals' privacy. [3] discusses legal approaches to monetise an individual's behavioural
surplus, that is, an individual's data stored on a social network account. We contribute the preliminary
results of lab experimentation with subjects who participate in reverse auctions to purchase their
personal data with and without assurance of privacy protection.



1. Introduction

Many organisations seek access to information about many fronts, types, and circumstances of our
lives. Our personal communications accounts with Internet-mediated social
networks and online transactions with many applications are vehicles for such organisations to source
data about us that are then processed, stored and later sold or traded. Most of the time, and in most
cases, we do not derive any direct monetary benefit, although the data collectors may have already
raised revenue from our data.

Digital platforms derive a significant fraction of their revenue from advertisers who are allowed to
access, retrieve, and manipulate the data that users, knowingly or not, generate continuously. While
on the user edge of the platform, such personal information is collected without compensation to the
user, on the advertiser edge, a market for the sale of personal information soars.

We identify three pathways for personal data once the individual agrees to their release: a. personal
data may turn into a public good collected by a national census office; b. they can be sold at a data
marketplace; or c. they can be “exchanged” for digital services at a digital platform.

One aspect of the current importance of personal data still in its infancy is the value of such data to
the individual who "owns them". On the other hand, people may have concerns about their privacy
and how their personal data can or should be protected. Data value and privacy protection may have
a direct relation in that the more valuable data are to an individual, the higher the expectations of
privacy protection. In contrast, when individuals release their data, there is an assumption that privacy loss
increases, given that they have lost control of the data released.

In the three cases, we analyse the trade-off between data value (or price where it applies) and privacy
loss, identifying metrics that quantify it. For instance, [2] views the statistical reports from the census
data as the product of the statistics agency deciding on the optimal levels of (statistical) accuracy and
privacy loss. Also, [3] discusses legal approaches to monetise an individual's behavioural surplus, that
is, an individual’s data stored on a social network account. Finally, in [4], preliminary results of lab
experimentation with subjects participating in reverse auctions to purchase their personal data with
and without assurance of privacy protection are presented.

This paper proposes an analytical framework to understand several aspects of personal data in the
three institutions listed above. We examine four issues that provide a complete description of
personal data characteristics necessary for the present analysis. The first issue is related to ownership;
however, as trading enables the transfer of ownership, the question of whose data these are is settled
by each transaction. For this reason, it is more useful to discuss data control: who controls the data?
The exact meaning of
this will be made clear in later sections. The second is the value of data. Undeniably, data brokers and
marketplaces have an appetite for data, particularly individuals' personal data, which is a clear sign of
data value to these players. A third issue arising from data availability is who has access to data. Once
data hit the market, paying for them means having access, and it is pertinent to ask: does the data
keeper impose any restrictions regarding who can access the data? Last, individuals are increasingly
aware that their identities need to be protected and should not be easily recoverable by anyone
having access to the data. In other words, individuals have a growing need for privacy protection.

The paper unfolds as follows: section 2 presents definitions of technical concepts, which are essential
to the discussion in later sections. Section 3 discusses the flows of information and value in the
relationship between individuals and their membership in social networks. In section 4, we present
the data marketplace as the new player in the data market, emphasising the role of a particular type
of marketplace that allows individuals to trade in their personal data. Section 5 describes the role of a
national statistical agency as a producer of statistical reports, characterising its decision-making
problem to use the best combination of privacy loss and statistical accuracy. Section 6 summarises
and exemplifies the four data characteristics we use in our analysis.

2. Preliminaries

This section presents the definitions of technical concepts used in the analysis of data privacy
protection and statistical report accuracy in later sections.

A database is represented by a matrix A. Information about N individuals is contained in the N rows
of A, and each column of A records a variable. The rows of A come from a discrete, finite-valued
domain X.

A can be represented as a histogram as follows. If x_k is the number of times a row k ∈ X appears in A,
the vector x = (x_k) is an |X| × 1 vector, known as the histogram representation of A.

Other frequent queries over A include counting queries and descriptive statistics.
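
To make the histogram representation concrete, the following minimal Python sketch (the domain, data, and variable names are illustrative assumptions, not taken from the paper) builds x from a toy database A and answers a counting query over it.

```python
from collections import Counter

# Toy domain X: each possible row of A is a (gender, age_band) pair.
X = [("F", "18-34"), ("F", "35+"), ("M", "18-34"), ("M", "35+")]

# Toy database A: one row per individual, with values drawn from X.
A = [("F", "18-34"), ("M", "35+"), ("F", "18-34"), ("M", "18-34")]

# Histogram representation: x[k] counts how many times row k appears
# in A, giving an |X| x 1 vector indexed by the domain.
counts = Counter(A)
x = [counts[k] for k in X]
print(x)  # [2, 0, 1, 1]

# A counting query, e.g. "how many individuals are aged 18-34?",
# is a sum over the matching histogram cells.
n_young = sum(xk for k, xk in zip(X, x) if k[1] == "18-34")
print(n_young)  # 3
```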

A statistical report on A's data can be obtained using a query workload Q(⋅) = {q_1(⋅), …, q_k(⋅)}. Using
the histogram representation of A, a query workload on histogram x can be written as
Q(x) = {q_1(x), …, q_k(x)}.

Many statistics from a data set A can be expressed as linear queries. A linear query is q(x) = qᵀx, where
q ∈ [−1, 1]^|X|. When the workload queries are linear, Q(x) can be represented as a k × |X| matrix Q.
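
Because a workload of linear queries is just a k × |X| matrix Q acting on the histogram x, the whole workload can be answered with one matrix-vector product. A minimal sketch (the numbers are the toy values assumed above, not the paper's):

```python
import numpy as np

# Histogram x over a domain of size |X| = 4 (from the previous sketch).
x = np.array([2, 0, 1, 1])

# Workload Q: k = 2 linear queries with entries in [-1, 1].
# Row 1 counts the individuals aged 18-34; row 2 is the difference
# between the "F" cells and the "M" cells.
Q = np.array([
    [1, 0, 1, 0],
    [1, 1, -1, -1],
])

# Q(x) = Qx answers every query in the workload at once.
print(Q @ x)  # [3 0]
```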

To simplify, we assume that any single data value in the data sets used below can only be 0 or 1. A
data set can be represented as an unordered set of n entries, each representing the information
associated with a corresponding individual. A data set is denoted as D = (d_1, …, d_n) ∈ ℝⁿ, with 𝒟
being the collection of all possible data sets. If k is a positive integer, a query is a function f
that maps a data set to the set ℝᵏ₊, i.e., f: 𝒟 → ℝᵏ₊, which can be used to express a count, the mean
of a set of values, or even a histogram.

Given two data sets D and D′, their distance is measured by the ℓ1 distance in ℝⁿ₊:
‖D − D′‖₁ = Σᵢ₌₁ⁿ |dᵢ − dᵢ′|. Two data sets are called neighbouring data sets if their distance is 1.
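
A minimal sketch of the ℓ1 distance and the neighbouring-data-set test, using the 0/1 entries assumed above:

```python
def l1_distance(D, D_prime):
    """l1 distance between two data sets of equal length n."""
    return sum(abs(d - dp) for d, dp in zip(D, D_prime))

def are_neighbours(D, D_prime):
    """Neighbouring data sets are at l1 distance exactly 1."""
    return l1_distance(D, D_prime) == 1

D = [0, 1, 1, 0, 1]
D_prime = [0, 1, 0, 0, 1]  # one individual's entry changed
print(l1_distance(D, D_prime), are_neighbours(D, D_prime))  # 1 True
```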

Formally, we consider a random mechanism M that implements a query and returns a noisy query
answer, i.e., M: 𝒟 × Ω → O, where Ω is a probability space and O is the space of all outputs. The
probabilities are over the randomness induced by the mechanism. Such a mechanism is also known
as a data publication mechanism.

Differential privacy is then formally defined as follows [5]: a randomised mechanism M is (ε, δ)-
differentially private if for any pair of neighbouring data sets D and D′, and for any S ⊆ O:

Pr(M(D) ∈ S) ≤ e^ε Pr(M(D′) ∈ S) + δ

where ε, δ ≥ 0 are small numbers that measure the stringency of privacy, and the probabilities
follow the randomness (probability distribution) defined in M. While ε bounds the multiplicative ratio
between the two probabilities, δ allows a small additive slack between them. When ε and δ are
close to 0, the two probabilities are close to each other, and we can tell the two query answers apart
only with low probability. When δ is 0, the mechanism is ε-differentially private, also known as pure
differential privacy.
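
As an illustration of the definition, the classic Laplace mechanism achieves pure ε-differential privacy by adding Laplace noise of scale Δf/ε to a query answer, where Δf is the query's sensitivity on neighbouring data sets. A minimal sketch, with data and parameters assumed purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def laplace_mechanism(D, query, sensitivity, epsilon):
    """Return a noisy query answer satisfying (epsilon, 0)-DP.

    Adding Laplace noise with scale sensitivity/epsilon is the
    classic construction for pure differential privacy.
    """
    return query(D) + rng.laplace(scale=sensitivity / epsilon)

# A counting query on 0/1 data has sensitivity 1: changing one
# individual's entry moves the count by at most 1.
D = [0, 1, 1, 0, 1]
count = sum  # f(D) = number of 1-entries
print(laplace_mechanism(D, count, sensitivity=1, epsilon=0.5))
```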



Definition (Accuracy I): The data publication mechanism M(x, Q) has accuracy I if

E[ ‖M(x, Q) − Q(x)‖₂² ] = −I

where x is a histogram, Q is a workload, I ≤ 0, ‖⋅‖₂² is the squared Euclidean norm, and E[⋅] follows the
randomness (probability distribution) defined in M(x, Q).
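
The accuracy I of a concrete mechanism can be estimated empirically by averaging the squared error of its noisy answers over many runs. A minimal sketch, reusing the toy workload and Laplace noise from the sketches above (all parameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

x = np.array([2, 0, 1, 1])             # toy histogram
Q = np.array([[1, 0, 1, 0],
              [1, 1, -1, -1]])         # toy linear workload

def M(x, Q, epsilon):
    """Noisy workload answers: Qx plus independent Laplace noise."""
    return Q @ x + rng.laplace(scale=1 / epsilon, size=Q.shape[0])

def accuracy(x, Q, epsilon, runs=50_000):
    """Estimate I, where E[||M(x,Q) - Q(x)||_2^2] = -I, so I <= 0."""
    errors = [np.sum((M(x, Q, epsilon) - Q @ x) ** 2) for _ in range(runs)]
    return -np.mean(errors)

# Less noise (larger epsilon) pushes I towards 0, i.e. higher accuracy.
print(accuracy(x, Q, epsilon=1.0))   # around -4   (2 queries * variance 2)
print(accuracy(x, Q, epsilon=0.1))   # around -400 (2 queries * variance 200)
```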

Major players in the high-tech sector need to gather, store and process personal information for
diverse reasons. As those companies get more involved in processing personal data, the value of
personal data and its privacy protection arise as two fundamental factors capable of reshaping their
business models. These organisations deal with large amounts of highly sensitive data, such as
browsing history, health data, personal interests and preferences and political opinions. Apple,
Google, Microsoft, and Uber have deployed differential privacy approaches to the data collected
when users interact with their services. Apple deploys differential privacy on iOS 10,
Microsoft uses it on Windows 10, Google protects their users' private information under differential
privacy, and Uber incorporates differential privacy into statistical queries that Uber staff conduct.
Their business missions, the evolution of data access technologies, and new data protection laws have
forced them to regard privacy protection techniques as key to their survival in the market.

3. Personal Data and Social Networks

Social media companies are businesses that simultaneously collect users' data and profit from them
[3]. Picture boards, social video, instant messaging, and other services, provided for "free", attract
users to the platform. In addition, a "premium" version may further draw new users willing to pay for
the service. It is not, however, the premium customers that make up the core business of a social
network, but the revenue raised through advertisement.

Adapted from [3], Figure 1 reveals the value flows occurring in the interaction between social media
users, advertisers, and the platform. Social media users release their data in exchange for free services.
In turn, the social media platform sells advertisers access to users. The latter allows advertisers to
flood the users' personal social media space with personalised advertisements. [3] calls “behavioural
data” the reactions of users of a social media company to the advertisements targeting them. The
platform collects and analyses the data and negotiates data transfer terms with advertisers so that
advertisers can improve their recommendations. Data flow not only from the individual users to the
platform but also from advertisers, as users visit the advertiser's website and their behavioural data
are shared with the platform. In other words, behavioural data can also be collected outside the social
media platform.

Figure 1. Value flows occurring in the interaction between social media users, advertisers, and the platform.



A major recent criticism of social networks’ practices is presented in [1]. Social media companies use
the Internet to provide access to a myriad of services, with each company specialising in a few. Users can sign
up to the companies' websites, becoming members of a community that can enjoy access to the
services for free. The latter illustrates what [1] calls the “primary market for digital services”. On the
other hand, these social networks can collect personal data from their members and keep track of
their interaction with the website, which includes interactions with other network members. [1] refers
to this as “the market for the sale of personal information”. Members are not paid for releasing such
private information to the company. The authors argue that due to the platform's dominance,
reflected in the virtuous-vicious circle described above, a market failure arises, as members agree to
their personal data being collected and used at zero price. In general, transactions in this market
happen that would not have occurred in a competitive market.

Examples of the relationship established by a social network and its members include Meta Platforms
Inc., which makes money by selling advertising space on Facebook, Instagram and WhatsApp. As Meta
focuses more on AR/VR technologies and services, it also sells advertising space and other types
of "properties" when users connect through Oculus VR products. The cost of advertising on Facebook
and Instagram is based on an auction approach: marketers set a budget for advertising, and Meta
counters with how many impressions the company can get for the proposed budget. In 2021, Meta
reported total revenues of $118 billion and about 1.93 billion daily active users; its ARPU for 2021 was
about $41 (Meta computes ARPU over monthly active users).

LinkedIn is a social media company focusing on professional networking and career development. It
supports recruiters and job seekers by providing free and premium services. The business model is
known as freemium: core services, such as connecting with other professionals, are free, while
recruiters and job seekers are charged for premium services. The premium
services can be divided into business solutions and premium subscriptions. The former contributes
most of LinkedIn’s revenue. LinkedIn business solutions provide services such as finding new
networking opportunities and potential employees, marketing new campaigns, searching for a sales
lead or learning business concepts. At the same time, premium subscriptions allow users to unlock
certain features, such as better contact recommendations, unlimited profile search and unlimited in-
mail communication. Data collected by LinkedIn includes members' registration, profile and postings,
data from other partners, services members use such as visits, cookies, members' devices and
locations, messages, and sites members visit in response to ads. LinkedIn uses the data to provide,
support, personalise and develop its services. In addition, the data are used to offer personalised ads.

Instagram also makes money via advertising revenues. Instagram has specialised in "visual
advertisement". Visual storytelling is the link that builds relationships among Instagram's users. Some
users, known as influencers, build a following through visual storytelling. Influencers lend
themselves to be vehicles for advertisers to build their brands through visual advertising.

Snapchat’s users visit its platform to communicate with friends and others by sharing pictures, videos
and texts that vanish from the screen once seen [7]. Snapchat’s revenue is primarily from advertising.
Snapchat makes custom-made and interactive advertisements, collecting users' information when
they set up an account, behaviour data when they use its services, and news from friends and
advertisers. Snapchat uses the collected data to personalise its services.



4. Personal Data and Data Marketplaces

Privacy Rights, an American NGO, defines data brokers as “businesses that collect individual's personal
information and resell or share that information with third parties.” However, the public whose data
are collected by data brokers rarely have any contact with the companies and are usually unaware of
the brokers' commercial practices. Based on the 2014 Federal Trade Commission report and industry
trends, Privacy Rights classifies brokers into one of four primary service categories:

1. Financial, Fraud Detection, Risk Mitigation

These brokers compile consumer reports that are used to determine creditworthiness. They may
also provide identity verification, fraud detection and risk mitigation services.

2. Health

These brokers compile and sell information that relates to a person’s health.

3. Marketing

These brokers collect information through online tracking and sell products that allow businesses
to engage in targeted marketing. The types of data they collect are: the types of ads that a person
interacts with, the time a person spends on specific websites, how a user interacts with a website,
and individual consumer profiles.

4. People Search

These brokers compile personal information—often from public records and social media—to
create reports or profiles.

In 2018, the US state of Vermont created a data broker registration law. In 2019, another state,
California, passed a registration law that required data brokers to register with the Office of the
Attorney General. Later, in 2020, the California Consumer Privacy Act required data brokers to include
a clear and conspicuous “Do Not Sell My Personal Information” hyperlink on their website's homepage
that directs the consumer to a webpage where they may opt-out of the sale of their personal
information. Brokers must also include a description of consumer rights in their online privacy policy,
or a California-specific description of consumers' rights, along with a hyperlink to the “Do Not Sell
My Personal Information” webpage. In addition, they must refrain from selling the information of
consumers who have opted out, wait twelve months before requesting that the consumer authorise
the sale of their personal data, and use the personal information collected through the opt-out form
only to comply with the opt-out request.

In July 2021, Privacy Rights identified 540 data operators across the United States, finding them
distributed as 113 focused on people search (21%), 265 doing marketing (49%), and 162 in the
financial, health, and other sectors (30%); 447 included some form of California-specific opt-out
method on their website (83%), and 248 offered an opt-out method for non-California residents (46%).

Privacy Rights' classification of data brokers includes those data companies that search for publicly
available data from individuals. In the states where regulations demand it, such data companies must
follow the law regarding the opt-out conditions. In other states, they would not need to deal with
individuals whose information is being collected. When it comes to privacy and the right to stop data
companies from using one's personal information, the United States still has a long way to go.



One large category of data broker is the data marketplace, a player that positions itself as central to
those who need to sell their data and those who need to buy data. Datarade, a
new Germany-based player, defines a data marketplace as a “platform where users can buy and sell
data.” It also states that “data marketplaces allow data buyers to browse, compare and purchase data
from multiple sources collected in one, easy-to-navigate marketplace.” [6] Our interest in these players
stems from the fact that their commercial practices offer individuals points of trade for the sale of
personal data. In other words, the data collected from individuals are released by those individuals
under a commercial agreement.

In Figure 2, adapted from [3], three roles are highlighted: data owners, data brokers or data
marketplace, and data consumers. Demand for data characterises a data consumer, whilst willingness
to sell one's data describes a data owner. The data broker, or data marketplace, is an intermediary
that bridges the connection between data owners and data consumers. The marketplace procures
personal data, aggregates, integrates, processes, and presents the data to consumers on the other
side.

Figure 2. Data flow and money flow in a data marketplace

The role of a data marketplace as a two-sided market facilitates the "encounter" of buyers and sellers.
Besides its brokerage services, the data marketplace may positively empower individuals and
organisations to monetise the data they produce [6]. It must also make it easy for individuals to
navigate the complexity inherent in data trading, in addition to providing a platform experience
where browsing for data sets is as easy as browsing for other, more familiar items.

Data marketplaces specialised in trades between businesses are known as B2B data marketplaces.
Most data marketplaces are B2B, the preferred conduit for many data buyers to access big data
repositories. Commercial data providers and SaaS vendors find data marketplaces highly convenient.
Buyers are attracted to a platform because of its standard analytics-based services, its rich collection
of APIs, and its capability to help data seekers source the correct data. Some B2B data
marketplaces are Datarade, Snowflake, AWS, Axon, Eagle Alpha, and Oracle.

One particular B2B data marketplace where data are sourced from sensors, devices and equipment
that generate data constantly has become known as an IoT data marketplace. Access to data in an IoT
data marketplace is typically done on a real-time basis, allowing consumers to access data about
consumer behaviour, online market data, vehicular traffic, industrial processes, and many other types.
IoT-based data are richer, bigger and more versatile than other data types. An IoT data marketplace is an
attractive option for technology companies willing or needing to monetise the data coming from their
operations. Two examples of IoT data marketplaces are IOTA Data Market and Streamr.

Data marketplaces have emerged that buy and sell personal data. Such brokers allow an individual to
sell their data on a one-off basis or via a subscription that affords the broker a recurring flow of
data for which the individual receives payment. Data brokers in the personal data market make no
assurances about privacy protection. In other words, individuals selling their personal data cannot get
any guarantees about whether they can be re-identified among the databases that the broker may
sell to third parties.

Among data marketplaces, this paper is most interested in personal data marketplaces.
The market demand for personal data is robust and continuously growing. However, consumers,
increasingly more aware of the importance of their personal data, are also increasingly dissatisfied
with how social networks, tech firms, and heavy players in the info-com markets deal with the
personal data that individuals disclose or entrust to them. A central concern for individuals is that
although companies that collect personal data from members or consumers can monetise the value
of the collected data, individuals remain removed from sharing any benefits. A data marketplace,
instead, may directly pay an individual for their personal data under one of several commercial
arrangements. As access to one's personal communications device is easier and because the user and
the marketplace can easily arrange consented access to personal data on the person's device,
individuals are now in a better position to sell their personal data. The latter means individuals agree
to share personal information with the data marketplace while being paid for their data sharing. As
long as the payment is fair, a personal data marketplace reduces the frictions in the market because
it introduces incentives for individuals to benefit from the release of their personal data. Some
personal data marketplaces are Datum, SynapseAI, and Datawallet.

In 2017, Datum launched an innovative method for individuals to monetise their personal data. As of
2018, Datum claimed 90,000 mobile subscribers to its data storage and selling services. Datum is a
data platform that allows its users to secure and monetise their personal information. Datum's
processing and storing services and users' payments for their information are all achieved through the
DAT or Data Access Token, Datum's native digital currency. Users pay fees for submitting and safely
storing their data sent through Datum's network using its encryption and anonymisation protocols. A
vital aspect of the data transaction is that a user can decide whether or not to share their data when
a potential buyer requests a transaction. The platform is supposed to assure the reputation of buyers
for the sellers' benefit. Data owners get paid for sharing their data in DATs.

5. Personal Data and National Statistics Agencies

The census office or any national statistics agency in charge of the census collects private information
about a country's households and businesses as often as every 10 years in many countries. Individuals
disclose information about themselves across the wide range of data types a census form contains.

In theory, the census office collects information from every individual in the country during the census
period on behalf of the country's institutions. It publishes valuable summaries of the data with a
promise to protect the confidentiality of the responses. In the process of report production and
publication, the census office needs to be accurate, but this can only be achieved at the expense of
the level of privacy. In fact, the more accurate a report is, the higher the privacy loss. The census office
or statistics agency must therefore allocate the information between two competing uses [2]:

- The production of sufficiently accurate statistics


- The protection of privacy for those individuals whose data have been collected

According to [2], social welfare maximisation can guide a statistical agency to manage the trade-off
above.



5.1. Achieving privacy protection

Among privacy protection methods, differential privacy (DP) is increasingly important as major IT
players have extensively used DP. DP is related to both data security and individual privacy. A
differential privacy algorithm assures us of the practical indistinguishability of published statistics,
whether any observation from the data is included or excluded. In other words, an external agent
cannot easily determine, from the published statistics, whether a particular record was a part of the
data or not.

Privacy is an overarching concern related to the disclosure and utilisation of statistical reports. On the one
hand, statistical reports need to stand against database reconstruction, which is the use of published
statistics to attempt to build a copy of a database on a record-by-record basis. Another threat to
privacy a statistical report must guard against is data re-identification, which links the variables of
records in a database to variables on a publicly accessible database. Anonymisation, also known as
statistical disclosure limitation (SDL), was introduced to protect a database from attacks that may
compromise its privacy. Later, in 2003, Dinur and Nissim published a result known as the database
reconstruction theorem, which unveiled the drawbacks of SDL, one of which is the possibility of
reconstructing a confidential database, up to a small error, if many linearly independent statistics
are published.

An algorithm supports publishing statistics from a database. When the algorithm guarantees the
published results would not vary “much” whether a particular record is included, we are talking about
a differentially private algorithm. Of course, this is made precise by the definition of differential
privacy in Section 2. Differential privacy limits the ability to re-identify individual
records of a database once statistical results have been published. [2] establishes that differentially
private publication algorithms are:

Closed under composition: this implies that when multiple, successive queries are run on the same
database, the cumulative privacy loss can be computed from the privacy loss of each component
query (a minimal accounting sketch follows this list).

Robust to post-processing: This means manipulating the outputs cannot compromise the privacy
guarantees.

Future proof: implying that neither technological innovation nor new data can degrade the quality of
the privacy guarantees.

Public: as usual, both algorithms and parameters can be published without compromising the privacy
guarantee.
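
As an illustration of the composition property, under the basic composition rule the ε (and δ) of successive releases on the same database simply add up; tighter advanced-composition bounds exist, and the privacy parameters below are assumed for illustration:

```python
def basic_composition(releases):
    """Cumulative privacy loss of successive (epsilon, delta)-DP releases.

    Under basic composition, running k mechanisms on the same database
    is (sum of epsilons, sum of deltas)-differentially private overall.
    """
    eps_total = sum(eps for eps, _ in releases)
    delta_total = sum(delta for _, delta in releases)
    return eps_total, delta_total

# Three successive query releases on the same database:
releases = [(0.1, 0.0), (0.25, 1e-6), (0.15, 1e-6)]
print(basic_composition(releases))  # (0.5, 2e-06)
```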

5.2. Production interpretation of privacy and accuracy

Following [2], we assume a national statistics agency that uses an ε-differentially private algorithm to
publish statistical reports; this allows the agency to guarantee protection of the data against
privacy loss. The agency's statements must also be accurate. [2] defines an associated data
publication mechanism bound by a level of privacy loss, quantified by ε, and a level of statistical
accuracy, I.

The mechanism is therefore associated with a pair (ε, I). The pair represents the inputs used to
produce the output, that is, public statistical reports, potentially subject to privacy loss due to the
agency's desire and need to report more accurate results. If Y is the set of all feasible production pairs
available to the agency, then, under some mild assumptions, Y can be represented by a transformation
function G(ε, I) such that

Y = {(ε, I) | ε > 0, I < 0 such that G(ε, I) ≤ 0}


The agency is interested in those points of Y that sit on the frontier. In other words, the production
frontier PF is

PF = {(ε, I) | ε > 0, I < 0 such that G(ε, I) = 0}


The agency's challenge is to procure the best technology to implement G. This will also depend on the
particular query workload to be solved.

[2] introduces the agency’s problem: it needs to choose the (ε, I) that maximises the following social
welfare function

SWF(ε, I) = Σᵢ vᵢ(ε, I)

subject to (ε, I) ∈ PF = {(ε, I) | ε > 0, I < 0 such that G(ε, I) = 0}

where vᵢ(ε, I) measures the indirect utility of each person i.
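
A minimal numerical sketch of this choice problem follows. The transformation function G(ε, I) = −I − c/ε (so that the frontier yields I = −c/ε) and the linear indirect utilities are assumptions made only to illustrate the trade-off; they are not the functional forms used in [2]:

```python
import numpy as np

c = 10.0                         # assumed technology constant: frontier I = -c/eps
a = np.array([1.0, 2.0, 0.5])    # assumed taste for accuracy of each person i
b = np.array([0.3, 0.1, 0.6])    # assumed distaste for privacy loss of person i

def swf(eps):
    """Social welfare evaluated on the production frontier G(eps, I) = 0."""
    I = -c / eps                 # accuracy implied by the frontier
    v = a * I - b * eps          # assumed indirect utilities v_i(eps, I)
    return v.sum()

# Grid search for the welfare-maximising privacy-loss parameter.
grid = np.linspace(0.1, 20.0, 2000)
best_eps = grid[np.argmax([swf(e) for e in grid])]
print(best_eps, -c / best_eps)   # chosen (eps, I) on the frontier
```

In this toy economy welfare peaks near ε ≈ 5.9; a stronger aggregate taste for accuracy pushes the chosen point towards larger privacy loss, and a stronger distaste for privacy loss pushes it the other way.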

6. A summary of data control, value, access, and privacy protection

The three institutions discussed in the previous sections help us use the elements of our analytical
framework to answer four questions via concrete situations and players: Who controls the data?
What is their value? Who has access to the data? And how is data privacy protected?

Table 1 displays the keywords associated with questions on its rows and each institution on its
columns.

Members of a social network can upload pictures and videos but cannot control the use of the data
that is collected when they either respond to targeted ads or explore other members' pages. Data
marketplaces do not commit to allowing users to control their data. One exception is Datum, a data
company that uses a smart contract blockchain, allowing users to control their data.

While data control seems unambiguous depending on the context within which data are generated,
data valuation may be characterised by ambiguity, or at least a wide range of options that
may render its determination difficult. [8] discusses three approaches to data valuation: a. valuation
of personal data by the shareholder; b. calculation of the Customer Lifetime Value, which is based on
the net present value of the revenues the customer will generate for the company in the future; and,
c. valuation that the individual user or customer does of their data. When a price needs to be
determined, as, in the data marketplace case, the three signals provided by these three approaches
may conflict. [8] estimates that the value of personal data to a user varies from $15 to $40, based on
the per-user value implied by Facebook's acquisitions of Instagram (2012) and WhatsApp (2014). In our work [4], where
we conduct laboratory experiments on a simulated data marketplace, participants are invited to sell
their private data to the market. We consider seven types of private data: personal finance, health,
citizenship and civil life, professional and career, food consumption, technology consumption, and
leisure services. Two trading scenarios were designed: one for procuring single data bits and the other
for procuring data bundles. Prices for the data types that lab subjects rank ex ante as most valuable
are determined via second-price auctions. We learned that, in general, participants attach nontrivial
value to their private data: when answering a single question, people value a single data bit at about
$0.30, and when answering a bundle of questions, the average value of a single question in a data
bundle is about $0.60.
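
The allocation rule behind such experiments is the textbook second-price (Vickrey) rule; the sketch below shows only that rule in a reverse-auction reading, where the buyer procures a data bit from the lowest-asking seller, and it is not the experimental software of [4] (seller names and asks are illustrative):

```python
def second_price_reverse_auction(asks):
    """Procure one data bit from the lowest-asking seller.

    The winner is paid the second-lowest ask, which makes truthful
    asking a dominant strategy (the Vickrey property).
    """
    ranked = sorted(asks.items(), key=lambda kv: kv[1])
    winner = ranked[0][0]
    payment = ranked[1][1]  # second-lowest ask
    return winner, payment

# Illustrative asks (in dollars) for one "health" data bit:
asks = {"seller_1": 0.25, "seller_2": 0.40, "seller_3": 0.30}
print(second_price_reverse_auction(asks))  # ('seller_1', 0.3)
```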



| | Social Networks | Data Marketplaces | National Statistics Agency |
| --- | --- | --- | --- |
| Control | Social network controls its members' data. | Datum uses a smart contract blockchain allowing users to control their data. | Citizens are promised their data will not be mishandled. |
| Value (price) | Proxy: revenue / # members | Find prices from websites. | Social welfare function for the production pair (ε, I) |
| Access | Advertisers; members have access to friends' info | Raw data; not clear in general | Query |
| Privacy protection | (Practically) no protection guarantees | Some provide anonymisation guarantees; others already use DP | US Census Bureau uses DP. |

Table 1. Data control, value, access and privacy protection: a summary.

When it comes to accessing data, social media have full access to their members' personal data, be it
the collection of posts, pictures, videos, and other content that members upload, or the behavioural
data the platform collects when users respond to targeted ads and other directed messages. In
contrast, at least one data marketplace, Datum, implements blockchain technology to allow its
members to be certain about securing their data and to decide who has access to their data. In other
words, the decision is in the users' hands. This is significantly different from the typical way data
marketplaces handle access to data.

As a general practice, social media make no commitments about privacy protection to their users.
Some of the most prominent privacy breaches have involved social media outlets, and their privacy
protection record is not exactly strong. Their current membership terms and conditions do not spell
out explicit protections regarding the anonymisation of members' identities.



References
[1] Economides, Nicholas, and Ioannis Lianos. 2021. ‘Restrictions on Privacy and Exploitation in the Digital Economy: A
Market Failure Perspective.’ Journal of Competition Law and Economics 17 (4): 765-847.

[2] Abowd, John M., and Ian M. Schmutte. 2019. ‘An Economic Analysis of Privacy Protection and Statistical Accuracy as Social
Choices.’ American Economic Review 109 (1): 171-202. DOI: 10.1257/aer.20170627.

[3] Gunesekara, Gehan, Fernando Beltrán, and Mengxiao Zhang. 2022. ‘Monetizing Behavioural Surplus by Individuals:
Alternative Legal Approaches.’ New Zealand Business Law Quarterly.

[4] Zhang, Mengxiao, and Fernando Beltrán. 2022. ‘An Experimental Study on Discovering the Value of Private Data in Data
Marketplaces.’ PACIS 2022 Proceedings 170. https://aisel.aisnet.org/pacis2022/170

[5] Dwork, Cynthia. 2006. ‘Differential Privacy.’ In Proceedings of the International Colloquium on Automata, Languages and
Programming (ICALP), 1-12.

[6] Datarade. 2021. https://about.datarade.ai/data-marketplaces

[7] Snapchat. 2021. https://www.businessmodelzoo.com/exemplars/snapchat/

[8] Glikman, Pauline, and Nicolas Glady. 2015. ‘What’s The Value Of Your Data?’ TechCrunch. https://techcrunch.com/2015/10/13/whats-the-value-of-your-data/
