
A PEER-TO-PEER OWNERSHIP-PRESERVING DATA MARKETPLACE
Nicolas Serrano and Fredy Cuenca

Yachay Tech University


nicolas.serrano@yachaytech.edu.ec, fredycuenca@yachaytech.edu.ec

ABSTRACT
A data marketplace enables trading between those who expect to monetize their data and those
interested in gaining insights from it. Unfortunately, the paradigm that drives current marketplaces
suffers from data leakage: whoever buys data can, in principle, resell the acquired data as many
times as he wants, even in breach of non-disclosure agreements. This work proposes a peer-to-peer
ownership-preserving data marketplace, which allows selling data that can be computed on but not
unveiled. First, an owner provides encrypted data to a buyer, who can perform arbitrary operations
on this encrypted data as if it were regular data. Thanks to homomorphic encryption, the encrypted
results obtained on the buyer side can then be decrypted on the seller side, in a second and definitive
data exchange.

KEYWORDS
Data Marketplace, Homomorphic Encryption, Blockchain

1 INTRODUCTION
In today’s digital economy, collecting data from multiple sources can create enormous po-
tential for companies able to generate information from raw data. This has fostered the
expansion of data marketplaces, online transactional stores that facilitate data trading [1].
In exchange for a payment, it is possible to buy and sell advertising, demographics, public
health, business intelligence and sensor data by connecting to a data marketplace, be it
through a graphical or backend interface.
Those who buy data normally want it to feed a wide assortment of statistical models,
such as those from business intelligence, data analytics, machine learning and more. Those
who sell data may be individuals who want to take advantage of the data generated by
their mobile phones, smartwatches, biomedical sensors and other IoT gadgets; but most of
the time, they are companies that want to monetize the content of their databases and
use the proceeds to continue growing.
Though data marketplaces are driving the creation of new business models and the
reinvention of many existing ones, they face one important issue: the problem of data leakage.
Whoever buys data normally ends up with a copy of the raw data, which he can resell
to other parties as many times as he wants, without giving any contribution to the rightful
data owner, sometimes even in breach of non-disclosure agreements.
Some may argue that since the buyer has paid for the data, he automatically becomes
the owner and can resell it for his own benefit to whomever he wants. In our view, however,
reselling the data acquired in a marketplace is to the clear detriment of the data
seller. We strongly believe that, just as copyright ownership confers on the author the exclusive
right to use his work, thus preventing others from commercializing it without the author’s
consent, data owners should also have exclusive rights over the data they struggle to collect,
clean and store.
We decided to create a data marketplace that protects the ownership of sellers over
their data. Within the taxonomy of marketplaces [2], our proposal qualifies as a seller-side
data marketplace, since it has to be deployed by the seller, the party most desirous of
protecting the data.
The proposed marketplace allows selling data that can be computed on but not unveiled.
The buyer can acquire encrypted data and perform arbitrary computations with
it, as if it were regular data. The encrypted result obtained can then be decrypted
on the seller side. This way, the proposed tool is ownership-preserving: the buyer will only
see the final result, the information generated from the data, but never the actual data.
The contributions from this implementation are:

1. The data owner preserves the ownership rights of her data while monetizing it.
2. The buyer computation is private because it happens on his local computer.
3. There are no intermediaries involved in the process that would take a percentage cut.

Homomorphic encryption and distributed ledgers are the main technologies underlying
the proposed marketplace. The former relies on a mathematical property that preserves
the results of operations performed on encrypted numbers; the latter makes it possible to
store all the transactional information in a network of independent computers, thus making
payment disputes more objective and transparent.
The remainder of the paper is organized as follows. Section 2 describes the existing data
marketplace model. Section 3 discusses other similar data marketplaces in order to establish
the contribution of the present work. Section 4 provides an overview of homomorphic
encryption and decentralized ledgers. Section 5 describes the technicalities involved in
the implementation of the proposed data marketplace. Then, Section 6 presents a case
study that was used to test the feasibility of our proposal. At the end, a discussion of the
strengths and weaknesses precedes the conclusions of our work.

2 EXISTING DATA MARKETPLACES


Data is the new oil that companies, organizations and individuals are trying to monetize,
because data can help to make better decisions, perform statistical analysis, build
machine learning models and more. A formal and automated way to monetize data is
to sell it through a data marketplace [1]. These tools enable one stakeholder to sell
data and another to buy it [2]. The seller profits from the data while the buyer gains
access to specific data sets for his particular application.
A data trading process has two main objectives: to help a data owner profit from her
data and to give a data buyer access to data in order to process it. Most of the time
the data buyer is looking for specific information that a particular data set can give him;
therefore he is not interested in the data itself but in the information he can extract from
it [3].
The data trading process has the following disadvantages, which could discourage
either of the stakeholders from participating:

• After the transaction is completed, the buyer could keep a plain data set copy. A
malicious buyer could resell it without giving any credit or contribution to the rightful
owner. There are some seller-side data marketplaces that allow buyers to process data
on the seller’s end and only obtain the final computed result, but these solutions can
require a more sophisticated and expensive infrastructure [2]. This non-authorized
reselling causes a data leakage issue of the overall data set that infringes the owner-
ship rights of the data provider.

• Consortium data marketplaces, consisting of an intermediary platform that connects sellers and
buyers, can be used to transact data without worrying about complex infrastructure
and architectures. Even though these solutions guarantee a smooth transaction, they
take a percentage cut from every transaction and the non-authorized reselling trust
issue is passed from the buyer to the intermediary [2].

Just as copyright ownership gives authors exclusive rights to their work, a data marketplace
should give data owners exclusive rights to the data they struggle to collect, clean
and store [4]. To accomplish this, an ownership-preserving data marketplace is proposed
in this work.

3 RELATED WORKS
Below we contrast our proposal with other similar data marketplaces.
Sterling: A Privacy-Preserving Data Marketplace. This data marketplace is
based on a privacy-preserving blockchain that uses independent Trusted Execution
Environment (TEE) nodes to perform the computations over private data. The system
allows data providers to specify and automatically enforce terms of use on their data
before uploading it to the marketplace, using smart contracts. Unlike our proposal, the
privacy-preserving feature is accomplished by the use of TEE hardware; this ensures that
computation and data are not revealed [5]. Our proposal implements homomorphic encryption
to hide the plain data without requiring the users to have additional TEE hardware.
A Decentralized Data Marketplace for Smart Cities. It consists of a blockchain-based
marketplace. The data information (description, type of data, seller identification,
price, IP address, etc.) is registered on the blockchain and the data itself is kept at the
seller’s end. A streaming data protocol is executed when payment is performed and data
is sent using end-to-end encryption. To deter buyers from misusing data (e.g. selling it
to other parties), a buyer rating system is implemented [6], so buyers would tend to keep
their reputation clean in order to be considered again. By using homomorphic encryption
in our implementation, we do not need to trust either the buyers’ good faith or the
rating system to preserve the data ownership rights of the seller.
Privacy-Preserving Decentralized Data Sharing System. It proposes a data
marketplace solution using blockchain technology to make it decentralized, and TEEs to
make it privacy-preserving. This implementation shares some similarities with the Sterling
marketplace but differs from it by implementing the blockchain logic on the Ethereum platform
rather than the Oasis platform used in Sterling. Oasis allows for private blockchain
storage while Ethereum makes it public. The authors argue that the public feature of
Ethereum makes the content more auditable [7]. The main difference with our proposed
solution is the technology behind the ownership-preserving feature: this work uses TEEs
and our proposal uses homomorphic encryption.
Blockchain Enabled Data Marketplace. This data marketplace involves an activity
log registered in a blockchain to identify and audit transactions. There are three main
participants in this scenario: the seller, the aggregator and the buyer. The seller encrypts
the data and sends it to the aggregator, who accumulates data from multiple sellers. When
the buyer pays for the data, the seller sends the decryption key to the buyer. The buyer
then requests the data from the aggregator and checks whether the requirements match before using
it. The authors state that homomorphic encryption could be used for this checking [8].
Our work uses homomorphic encryption to hide the data from the buyer but allow him to
operate over it as if it were plain data. Our proposed solution only involves two participants
(the buyer and the seller).
Ownership preserving AI Marketplace. This work presents an approach which
involves training AI models using encrypted data and federated learning. It has three
stakeholders: data owners, cloud owners and model owners. The model owners split the
model training process among multiple cloud owners (a.k.a. federated learning). The cloud
owners store the encrypted data from the data owners and perform the training process,
incentivized by a part of the payment. By not requiring multiple cloud owners to perform
federated learning, our proposal keeps the process peer-to-peer and does not make the data
owners share their profits [9]. It is worth pointing out that the federated learning approach
allows the buyer to buy data from multiple sellers and use it to train models in the same
process. Our proposed solution is limited to a peer-to-peer, two-party exchange.
An Efficient and Secure Data Sharing Framework. This work can be used as a
foundation for building a data marketplace. It focuses on storing data in a semi-trusted
cloud and authorizing specific users to access it by providing them with unique tokens. The
data stored in the cloud is encrypted and, by using additive homomorphic encryption, the
authors allow the end user to decrypt it in two stages: a first decryption on the cloud side
and a second decryption using the given token. This work uses homomorphic encryption to
manage access authorization for the end users, but these users can see the plain data
[10]. In contrast, our work uses homomorphic encryption to allow end users to compute
over encrypted data and only then obtain the plain result.
The contribution of our work is the implementation of a peer-to-peer, ownership-
preserving marketplace by combining blockchain technology and homomorphic encryption.

4 PRELIMINARIES
The following subsections describe the two main technologies used to build the proposed
data marketplace.

4.1 HOMOMORPHIC ENCRYPTION


This encryption concept was initially proposed by Rivest et al. in 1978 [11], and the field
has been evolving ever since. Nowadays there are three main types of homomorphic
encryption schemes: partially homomorphic encryption, somewhat homomorphic encryption
(also called leveled homomorphic encryption) and fully homomorphic encryption
[12]. They differ in the operations allowed (⊕′, ⊗′, etc.) and in the
number of times that an operation can be performed before corrupting the internal value of
the encrypted element.
Homomorphic encryption allows the users to operate over encrypted data by taking
advantage of a homomorphism between two mathematical rings. It can be defined as:

Let $A$ be the ring of the plain elements.
Let $B$ be the ring of the encrypted elements.
A homomorphism between $A$ and $B$ exists if and only if there is a function $\sigma$ such that

$$\sigma : A \to B$$
$$\sigma(a_1 \oplus a_2) = \sigma(a_1) \oplus' \sigma(a_2)$$
$$\sigma(a_1 \otimes a_2) = \sigma(a_1) \otimes' \sigma(a_2)$$
$$\sigma^{-1}(\sigma(a_1)) = a_1$$

where $a_1, a_2 \in A$ and $\sigma(a_1), \sigma(a_2) \in B$. Here:

$\sigma$ is the encryption function,
$\sigma^{-1}$ is the decryption function,
$\oplus', \otimes'$ are predefined operations to compute encrypted elements.
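
To make the definition concrete, the following minimal sketch checks the additive half of the property on a toy Paillier cryptosystem. Paillier is swapped in here only because it fits in a few lines of pure Python; it is not the scheme used in this work, it supports addition only, and the tiny primes and all function names are illustrative assumptions with no security value.

```python
import math
import random

# Toy Paillier cryptosystem (additively homomorphic).
# Assumption: tiny primes chosen for readability; NOT secure, NOT HEAAN.
P, Q = 1009, 1013
N = P * Q
N_SQ = N * N
G = N + 1
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(P - 1, Q - 1)
MU = pow(LAM, -1, N)  # modular inverse; this shortcut is valid because G = N + 1


def encrypt(a: int) -> int:
    """sigma: map a plain element of Z_N to a ciphertext in Z_{N^2}."""
    while True:
        r = random.randrange(1, N)
        if math.gcd(r, N) == 1:
            break
    return (pow(G, a, N_SQ) * pow(r, N, N_SQ)) % N_SQ


def decrypt(c: int) -> int:
    """sigma^{-1}: recover the plain element from a ciphertext."""
    x = pow(c, LAM, N_SQ)
    return ((x - 1) // N * MU) % N


def add_encrypted(c1: int, c2: int) -> int:
    """The ciphertext operation that corresponds to plaintext addition."""
    return (c1 * c2) % N_SQ


c = add_encrypted(encrypt(15), encrypt(27))
print(decrypt(c))  # the sum of the two plaintexts
```

Here plaintext addition corresponds to ciphertext multiplication modulo N², which illustrates that the operation on encrypted elements need not look like the operation on plain elements.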

A particular scheme implementation had to be selected from the multiple homomorphic
encryption schemes in order to build a data marketplace that would allow the data
owner to encrypt her data before sending it to the buyer and preserve her data ownership
rights. In this work, the Homomorphic Encryption for Arithmetic of Approximate Numbers
(HEAAN) was the selected candidate.
The HEAAN method, a leveled homomorphic encryption scheme, supports
addition, multiplication and even efficient evaluation of functions (such as the multiplicative
inverse, the exponential function, the logistic function and the discrete Fourier transform) [13]. The
method consists of:

1. Taking a complex vector $a = (a_1, a_2, \ldots, a_n)$.
2. Encoding $a$ into a polynomial $b = b_0 + b_1 x + \ldots + b_{n-1} x^{n-1}$.
3. Encrypting $b$ as a polynomial $b'$.
4. Getting a result polynomial $r'$ after computations using $b'$.
5. Decrypting the polynomial $r'$ to $r$.
6. Decoding $r$ into a result complex vector $c$.

It is worth mentioning that HEAAN uses the ring learning with errors (RLWE) problem as its
hardness assumption. This implies that a small error is added to each element in order to encrypt
it properly, which makes the scheme work with arithmetic of approximate numbers [13].
Therefore, any tool using this scheme should only be considered for tasks that do not require
exact results (e.g. statistics, machine learning, etc.).
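
The effect of approximate arithmetic can be illustrated without any encryption at all. The sketch below is only an intuition aid, not HEAAN: it encodes real numbers as scaled integers, injects a small random error to stand in for the encryption noise, and rescales after a multiplication. The scaling factor, the noise range and the function names are assumptions made for the example.

```python
import random

SCALE = 2 ** 20  # hypothetical scaling factor; real schemes derive it from their parameters


def encode(x: float) -> int:
    """Represent a real number as a scaled integer plus a tiny random error."""
    noise = random.randint(-2, 2)  # stand-in for the error added at encryption time
    return round(x * SCALE) + noise


def decode(v: int) -> float:
    """Map a scaled integer back to a real number."""
    return v / SCALE


def multiply(u: int, v: int) -> int:
    """Multiply two encoded values and rescale so the result stays at SCALE."""
    return (u * v) // SCALE


a, b = encode(3.25), encode(-1.5)
print(decode(a + b))           # close to 1.75, but generally not exactly 1.75
print(decode(multiply(a, b)))  # close to -4.875
```

The decoded values differ from the exact ones by a tiny amount, which is harmless for statistics or model training but rules out tasks that need bit-exact answers.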
Overall, most data marketplace solutions using homomorphic encryption (including
HEAAN) follow these steps in one way or another [14]:
1. The data owner encrypts a data set.
2. The data owner sends the encrypted data set to the buyer.
3. The buyer performs his computation with the data set.
4. The buyer sends the computation result for decryption.
5. The data owner decrypts the result.
6. The data owner sends the decrypted result to the buyer.

In a for-profit data transaction, the payment could occur after the
1st step or after the 5th step. Depending on where the payment step is placed, either the data owner
or the buyer would be in a position of critical advantage. If the buyer pays after the 1st
step, the data owner could stop the process and send neither the encrypted data set
(2nd step) nor the decrypted result (6th step). If the payment occurs after the data
owner sends the decrypted result, the buyer could simply receive the result and not pay.
Furthermore, the process could intentionally or unintentionally fail in any of the multiple
steps, leaving the data owner and the buyer in dispute. A solution to this problem
could be an independent observer that records every step of the transaction and is able
to act as a mediator when something fails. That is why distributed ledger technology,
based on blockchain, can be used as a practical solution to the problem.

4.2 DISTRIBUTED LEDGERS


A distributed ledger can be seen as a virtual spreadsheet that tracks the interaction state
of the data trading process, e.g. whether the buyer has already paid for the data, whether the
seller has already sent the data, etc. Such a spreadsheet is not kept on the buyer or seller side, but
across a decentralized network of independent nodes, on a blockchain [15]. If one node
holds corrupted information, it can be checked against the other nodes in the network and the
wrong information rectified. When new information needs to be registered in the
ledger, it has to be validated against the previous records and linked into the spreadsheet
through a clever use of hashes. Each new block of information is linked to the previous one
by registering the hash of the previous block in the new block. Hence the name
blockchain [16]. If a previous block of information is later changed, its hash,
and therefore the content of the next block, would change as well, making the nodes
notice the corrupted information.
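
The hash-linking idea can be sketched in a few lines. This is a minimal single-machine illustration, assuming SHA-256 over a JSON serialization and hypothetical field names; it leaves out consensus, signatures and everything else a real blockchain node does.

```python
import hashlib
import json


def block_hash(block: dict) -> str:
    """Hash the full content of a block, including its link to the previous one."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_block(chain: list, data: str) -> None:
    """Add a block that stores the hash of the block before it."""
    previous = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"index": len(chain), "data": data, "previous_hash": previous})


def is_valid(chain: list) -> bool:
    """Check that every block still points at the actual hash of its predecessor."""
    return all(
        chain[i]["previous_hash"] == block_hash(chain[i - 1])
        for i in range(1, len(chain))
    )


ledger = []
for event in ("buyer paid", "seller sent encrypted data", "buyer received result"):
    append_block(ledger, event)

print(is_valid(ledger))
```

Editing the data of an earlier block changes its hash, so the `previous_hash` stored in the following block no longer matches and `is_valid` reports the tampering.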
By setting a decentralized ledger on a blockchain, it acts as an independent interme-
diary between the buyer and the seller. In case of disputes from a failed step during the
trading, the ledger can be used to deliberate and then come up with a proper response,
like returning the money to the buyer, for instance.
The Ethereum platform allows developers to deploy applications on its blockchain by
conveniently abstracting all the inner workings of the blockchain technology and letting
developers focus on their particular application functionality. It incentivizes nodes to join
and maintain the network by rewarding them with cryptocurrency (Ether) for the energy
consumption and resources invested [17].
Applications deployed on Ethereum are called smart contracts because they act as a
contract that follows a developer-predefined set of rules, totally independent of its
users. Each smart contract has its own cryptocurrency wallet address that lets it receive
and send cryptocurrency. This address also behaves as a unique identifier for the smart
contract. The developer-predefined rules are functions, written by the developer,
that tell the smart contract to act in a specific way when they are called [18].
A programmer interested in developing a smart contract needs to write the code in any
of the smart-contract programming languages (e.g. Solidity, Vyper) and compile it [18].
The compilation process outputs two files: the smart contract bytecode and an Application
Binary Interface (ABI) file. The bytecode is the file with machine code that gets
deployed to the Ethereum blockchain, and the ABI acts as a dictionary for users to interact
with that specific smart contract. It indicates which functions are callable and their input
parameters [17].
By using a decentralized ledger (implemented as a smart contract) to register each step of the
homomorphic encryption process and simultaneously act as an independent intermediary
between the two parties involved, the data marketplace can be considered peer-to-peer
(between only two parties). In case of disputes from a failed step in the process, the ledger
can execute specific functions to return the money to the buyer or send it to the data owner.
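
The role of the ledger as an escrow that records each step can be sketched as a small state machine. The sketch below is plain Python standing in for a smart contract; the class, the state names and the refund rule (the buyer may reclaim the payment only before the seller reports sending the result) are assumptions made for illustration, not the contract used in this work.

```python
from enum import Enum, auto


class Step(Enum):
    IDLE = auto()
    PAID = auto()
    DECRYPTION_REQUESTED = auto()
    RESULT_SENT = auto()


class EscrowLedger:
    """Plain-Python stand-in for the smart contract; all names are hypothetical."""

    def __init__(self, seller: str, price: int):
        self.seller, self.price = seller, price
        self.buyer, self.step = None, Step.IDLE
        self.balances = {}  # payouts made by the contract
        self.log = []       # every registered step, in order

    def _require(self, caller, allowed):
        if caller != allowed:
            raise PermissionError(f"{caller} may not call this function")

    def _advance(self, caller, expected, nxt, event):
        if self.step is not expected:
            raise RuntimeError(f"{event!r} not allowed in state {self.step.name}")
        self.log.append((caller, event))
        self.step = nxt

    def _pay_out(self, recipient):
        self.balances[recipient] = self.balances.get(recipient, 0) + self.price
        self.buyer = None

    def pay(self, buyer, amount):
        if amount != self.price:
            raise ValueError("payment must equal the data price")
        self._advance(buyer, Step.IDLE, Step.PAID, "paid")
        self.buyer = buyer

    def request_decryption(self, caller):
        self._require(caller, self.buyer)
        self._advance(caller, Step.PAID, Step.DECRYPTION_REQUESTED, "requested decryption")

    def result_sent(self, caller):
        self._require(caller, self.seller)
        self._advance(caller, Step.DECRYPTION_REQUESTED, Step.RESULT_SENT, "sent result")

    def result_received(self, caller):
        self._require(caller, self.buyer)
        self._advance(caller, Step.RESULT_SENT, Step.IDLE, "received result")
        self._pay_out(self.seller)  # escrow released to the data owner

    def refund(self, caller):
        """Dispute path (assumed rule): refund only before the result was sent."""
        self._require(caller, self.buyer)
        if self.step not in (Step.PAID, Step.DECRYPTION_REQUESTED):
            raise RuntimeError("refund not allowed in this state")
        self.log.append((caller, "refund"))
        self.step = Step.IDLE
        self._pay_out(self.buyer)


contract = EscrowLedger(seller="alice", price=25)
contract.pay("bob", 25)
contract.request_decryption("bob")
contract.result_sent("alice")
contract.result_received("bob")
print(contract.balances, contract.log)
```

Because every transition is checked against the current state, neither party can skip a step, and the log gives both of them the same record to point to in a dispute.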

5 DATA MARKETPLACE IMPLEMENTATION


5.1 SOLUTION ARCHITECTURE
The proposed peer-to-peer ownership-preserving data marketplace has a client-server
architecture. To explain it, consider two parties that want to exchange
data using our solution: Alice, the data seller, and Bob, the data buyer.
Alice has to set up a web service application to serve her data and the necessary
functionalities through APIs. The web application is in charge of fetching the plain
data from a database, encrypting it and decrypting the result. Additionally, the web application
deploys a smart contract that contains the data APIs and the data price and that
registers the transactions between Alice and all her possible buyers.
On the other side, Bob wants to process data to get a specific result. He knows that
Alice is selling data through the proposed solution, so he visits her website to find the
correct API routes. After that, he has to build a Python program to fetch the encrypted
data using Alice’s specific API, process it on his local computer and send the result back
to Alice for decryption. The proposed solution provides a Python library with all
the required functions, so Bob does not have to worry about its underlying workings.
These functions interact not only with the seller application but also with the smart
contract to register each step.
Any data transaction process between Alice and Bob would have the following steps:
1. Bob creates a program to perform his computation with encrypted data
2. Bob loads his wallet credentials to the program
3. Bob pays the data price to the smart contract
4. Bob requests the encrypted data from Alice
5. Alice checks if the smart contract has received payment
6. Alice sends the encrypted data to Bob
7. Bob performs the computation in his local machine
8. Bob masks the result by adding a random value only known to him
9. Bob sends the result to Alice for decryption
10. Bob notifies the smart contract he requested a decryption
11. Alice checks the result meets predefined security parameters
12. Alice decrypts the result and sends it to Bob
13. Alice notifies the smart contract she sent the decrypted result
14. Bob receives the result
15. Bob notifies the smart contract he received the result
16. Bob unmasks the result by subtracting the random value
17. The smart contract pays Alice the data price
18. The smart contract resets its registers for another Alice-Bob transaction
Bob ends up with his computation results and Alice has successfully monetized her data.
Bob’s computation was performed on his local computer and, even though Alice could see
the form of the decrypted result, the masking prevented her from getting the final result.
The data ownership rights have been preserved and Alice can rest assured that Bob is
not able to resell her data without her consent.
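
The masking in steps 8 and 16 is simple additive blinding. The sketch below shows only that arithmetic on plain numbers: encryption and decryption are left out, and the result values, the mask range and the function names are assumptions for illustration. In the actual protocol the addition would be applied homomorphically to the encrypted result.

```python
import random


def mask(result, rho):
    """Step 8: add Bob's secret value to every entry of the result."""
    return [x + rho for x in result]


def unmask(masked, rho):
    """Step 16: subtract the same secret value to recover the true result."""
    return [x - rho for x in masked]


rho = random.uniform(-1e3, 1e3)   # known only to Bob
true_result = [0.42, -1.3, 2.75]  # hypothetical values; encrypted in the real protocol

seen_by_alice = mask(true_result, rho)  # what Alice would obtain after decryption
recovered = unmask(seen_by_alice, rho)  # what Bob obtains after unmasking
print(recovered)
```

Alice sees values shifted by an offset she does not know, while Bob recovers his result up to floating-point rounding.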

5.2 HOMOMORPHIC ENCRYPTION IMPLEMENTATION


Microsoft released the Simple Encrypted Arithmetic Library (SEAL) in 2018. This C++
library contains an implementation of the HEAAN scheme [19]. OpenMined, an open-source
community whose objective is to promote privacy-preserving AI technologies, created
a Python wrapper (called TenSEAL¹) for the SEAL library to make homomorphic
encryption even more accessible [20].
TenSEAL is the library used in this work for all homomorphic encryption operations
including data encryption, operations with encrypted values and data decryption.

5.3 DISTRIBUTED LEDGER IMPLEMENTATION


The Vyper language was used to write the decentralized ledger code. It is a contract-oriented
language that targets the Ethereum Virtual Machine (EVM). It mitigates the
security vulnerabilities present in Solidity, the most popular programming language for
decentralized ledger development [21], and offers an easy, Pythonic way to write smart
contracts.
The web3 library² was used to interact with the Ethereum blockchain from
both the seller side and the buyer side. It is a Python API that abstracts all the
communication steps involved between a computer and a blockchain node.
Since communication within the Ethereum network is restricted to participating nodes,
we used Infura’s services. Infura³ is a company that lends its nodes in order to facilitate
blockchain communication. The company has a freemium model that gives away
100,000 requests per day and allows 3 projects per account.
To keep the setup easy for both seller and buyer, the proposed data marketplace
uses Infura as the default gateway to communicate with the blockchain.

5.4 SELLER-SIDE WEB APPLICATION


To build the seller-side web application, the Flask framework was used [22]. Routes for
the following actions were defined:
• Serve the decentralized ledger details (address and ABI).
• Serve the encrypted data set.
• Serve the decrypted result.
• Serve a template tutorial HTML page.

¹ https://github.com/OpenMined/TenSEAL
² https://github.com/ethereum/web3.py
³ https://infura.io/

5.5 BUYER-SIDE API LIBRARY
The buyer-side API library is written in Python using the TenSEAL and web3
libraries. It allows users to:
• Create a user object with payment and gateway information.
• Request the smart contract details.
• Retrieve the encrypted data set.
• Send the encrypted result for decryption.
• Start a process failure dispute.

5.6 AVOID PLAIN DATA LEAKAGE


In step 11 of the transaction process, Alice checks that the result meets some predefined
security parameters before decryption. This validation addresses the scenario where
the seller is tricked into revealing the plain data when decrypting the result. For example, Bob
could:
• send the whole encrypted data set for decryption and get the plain data set
• add a random value to the whole encrypted data set, or multiply it by one, get the decrypted data
and invert the operation to obtain the plain data set
Our proposed solution prevents this from happening by taking into consideration an
economic perspective. The following checking mechanism is implemented to check that the
result meets the security parameters:
1. Let $D$ be an $m \times n$ matrix containing the plain original data
2. Let $R$ be the encrypted result matrix with dimensions $j \times k$
3. Let $s$ be a positive integer secure parameter defined by the data owner
4. Let $r = \frac{m}{s}$
5. If $j < r$ and $k < r$, decrypt the result. Otherwise, notify the user that the result
dimensions are not valid.
This mechanism ensures that the result is always smaller than the original data de-
pending on two factors: the number of rows in the plain data and the secure parameter
previously defined by the data owner or the application developer.
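
The check can be sketched as a single function. This is a minimal sketch under the assumption that r is the exact quotient m / s, as written in the listing; the comparison j < m / s is then rewritten as j · s < m so that only integer arithmetic is needed. The function name is hypothetical.

```python
def result_dimensions_ok(m: int, s: int, j: int, k: int) -> bool:
    """Return True when a j x k result may be decrypted for an m-row data set.

    Implements j < r and k < r with r = m / s, using the equivalent
    integer comparisons j * s < m and k * s < m.
    """
    if m <= 0 or s <= 0:
        raise ValueError("m and s must be positive integers")
    if j <= 0 or k <= 0:
        raise ValueError("result dimensions must be positive integers")
    return j * s < m and k * s < m


print(result_dimensions_ok(m=100, s=4, j=8, k=1))    # a small result vector
print(result_dimensions_ok(m=100, s=4, j=100, k=8))  # the whole data set
```

With m = 100 and s = 4 the threshold is r = 25, so an 8 × 1 result is accepted while a result with 100 rows is rejected.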
It is true that the buyer could request decryption multiple times and rebuild the
plain original data. But each decryption request has to be paid for, so if the buyer sends
n requests he has to pay n times. The transaction price rises considerably, which
discourages the buyer from making multiple decryption requests to get the whole decrypted
data set.
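
One rough way to quantify this deterrent is to count how many accepted requests a buyer would need in order to tile the whole data set. The estimate below is an assumption of this sketch, not a formula from this work: it takes the check j < m / s and k < m / s at face value, assumes the buyer always submits the largest accepted block, and uses hypothetical function names and a caller-supplied price per decryption.

```python
import math


def requests_to_recover(m: int, n: int, s: int) -> int:
    """Estimate the decryption requests needed to rebuild an m x n data set."""
    tile = (m - 1) // s  # largest integer dimension d that satisfies d * s < m
    if tile < 1:
        raise ValueError("no result can pass the check for this m and s")
    return math.ceil(m / tile) * math.ceil(n / tile)


def recovery_cost(m: int, n: int, s: int, price: float) -> float:
    """Total payment if every decryption request costs `price`."""
    return requests_to_recover(m, n, s) * price


print(requests_to_recover(m=100, n=8, s=12))
print(recovery_cost(m=100, n=8, s=12, price=25))
```

Under these assumptions, raising s shrinks the largest accepted block and therefore multiplies the number of paid requests needed to rebuild the data.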
The larger the data set to encrypt, the more secure the checking mechanism is. For
example, assume a buyer wants to train a machine learning model with 7 attributes on
an encrypted data set of dimensions 100 × 8, where 100 is the number of rows in the data set
and 8 is the number of columns. The model would take a 100 × 7 matrix and compare its
output with the 100 × 1 target column. At the end of the process, the buyer would need
the weights and the bias decrypted (at least a (7 + 1) × 1 matrix).
If $s = 1$, the result can have dimensions as large as $\lfloor 100/1 \rfloor = 100$. Here, the buyer could
send the 100 × 8 data set as the result and pass the checking mechanism. As $s$ becomes larger,
the result dimensions have to be smaller and the seller can rest assured that the buyer would
need more than one decryption to retrieve the whole plain data. In this particular case, the
secure parameter has to be at most $s = \lfloor 100/8 \rfloor = 12$ for the result to pass the checking
mechanism. So any $s \in [1, 12]$ allows the buyer to decrypt his result, and the smaller $s$ is, the
more flexible the check is for him. The problem is that the seller does not know the dimension of
the result (in this case 8); she can only deal with $m$ and $s$ at her end. She has to set a secure $s$,
and to set it large enough she would need a considerably large $m$. Therefore, the larger
the data set, the more secure the checking mechanism is.
6 CASE STUDY
In order to test the proposed data marketplace, we had to implement both the seller and
the buyer side.
The seller side consists of a Postgres database and a web application. The former con-
tains the data to be sold; the latter, the API needed by the buyer to ask for data. The
buyer side consists of a Python script that uses PyTorch for training a regression model.
The decentralized ledger was deployed on the Ropsten network, an Ethereum test net-
work that simulates a production environment with fake cryptocurrency. Two fictitious
addresses were used to simulate a seller and a buyer.

6.1 LOGISTIC REGRESSION MODEL TRAINING


The proposed marketplace was used to satisfy a buyer interested in training a machine
learning model with which he expects to predict the lifetime of a specific type of business.
The model input variables include the business size (measured by the type of tax
contributor), the distance to downtown and the competitors’ influence (measured by the Voronoi polygon
area of the business location [23]). The data is represented as a matrix $M$ with $k \times n$ dimensions,
where $k$ is the number of records purchased and $n$ is the number of input variables. All
this is implemented as a Python back-end application.

$$M = \begin{bmatrix} m_{11} & m_{12} & \cdots & m_{1n} \\ m_{21} & m_{22} & \cdots & m_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ m_{k1} & m_{k2} & \cdots & m_{kn} \end{bmatrix}$$

On the other hand, the data seller owns thousands of records of businesses, split among
different tables of a relational Postgres database.
The data trading starts when the buyer uses the API to ask for data to feed his model.
After accessing the decentralized ledger to check for the payment, the seller replies by
sending the required data. This data is not human-intelligible; it is represented as a matrix of
encrypted values. Each column represents one input variable; each row, a data instance.
Although the data is encrypted, it can still be used to train the model. Encrypted
numbers can be combined in arithmetic operations with regular numbers. The outcome of
such operations is an encrypted result, which has to be decrypted on the data owner’s side,
after a second and definitive API request.
The training process performed by the program at the buyer side is as follows:

1. Split $M$ into a $k \times (n-1)$ matrix $M'$ and a $k$-sized vector $t$ that represents the target
column. In this case, $t$ represents the lifetime (in years) of every business encoded
in $M$. The buyer side is able to ask the seller side to send $M$ in this specific format,
$M = [M' \mid t]$.

$$M' = \begin{bmatrix} m_{11} & \cdots & m_{1,n-1} \\ m_{21} & \cdots & m_{2,n-1} \\ \vdots & \ddots & \vdots \\ m_{k1} & \cdots & m_{k,n-1} \end{bmatrix} \quad \text{and} \quad t = \begin{bmatrix} m_{1n} \\ m_{2n} \\ \vdots \\ m_{kn} \end{bmatrix}$$

2. Take the logistic regression model $LRM$, which initially contains:

(a) $(n-1)$ randomly selected weights $w = (w_1, w_2, \ldots, w_{n-1})$.
(b) a bias value $b$.
(c) a weight correction parameter $\delta w = 0$.
(d) a bias correction parameter $\delta b = 0$.
(e) a sigmoid function $sig$.

$$LRM = sig(w_1 m_{i1} + w_2 m_{i2} + \ldots + w_{n-1} m_{i,n-1} + b)$$

3. Input each row $(m_{i1}, \ldots, m_{i,n-1})$ from the matrix $M'$ into $LRM$ to obtain a result $y_i$:

$$y_i = sig(w_1 m_{i1} + w_2 m_{i2} + \ldots + w_{n-1} m_{i,n-1} + b)$$

4. Compute $y_i - m_{in}$, where $m_{in}$ is the $i$-th entry of $t$, multiply it by the row
$(m_{i1}, \ldots, m_{i,n-1})$ and add it to $\delta w$.
5. Compute $y_i - m_{in}$ and add it to $\delta b$.
6. Repeat steps 3, 4 and 5 for each row in $M'$.
7. Update the parameters for the next iteration by doing:

$$w = w - \delta w \left(\frac{1}{k}\right)$$
$$b = b - \delta b \left(\frac{1}{k}\right)$$

At the end, we obtain an array of encrypted weights $w = (w_1, w_2, \ldots, w_{n-1})$ and
an encrypted bias $b$, which must be concatenated, $v = (w_1, w_2, \ldots, w_{n-1}, b)$, before being
sent to the seller side for decryption.
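
The training loop can be sketched on plain numbers as follows. This is a minimal, unencrypted illustration with several assumptions: targets are taken to lie in [0, 1], the corrections accumulate $y_i - t_i$ and are therefore subtracted in the update (the descent direction), the learning rate is fixed at 1, and the function names and toy data are hypothetical. Under encryption, the exact sigmoid would typically be replaced by a polynomial approximation.

```python
import math
import random


def sig(z: float) -> float:
    """Logistic (sigmoid) function."""
    return 1.0 / (1.0 + math.exp(-z))


def train(M, epochs: int = 200, seed: int = 0):
    """Train on a list of k rows, each holding n-1 inputs followed by the target."""
    rng = random.Random(seed)
    k, n = len(M), len(M[0])
    w = [rng.uniform(-1.0, 1.0) for _ in range(n - 1)]  # random initial weights
    b = 0.0
    for _ in range(epochs):
        delta_w, delta_b = [0.0] * (n - 1), 0.0
        for row in M:
            x, t = row[:-1], row[-1]
            y = sig(sum(wi * xi for wi, xi in zip(w, x)) + b)
            # Accumulate the corrections (y - t) * x and (y - t) over all rows.
            delta_w = [dw + (y - t) * xi for dw, xi in zip(delta_w, x)]
            delta_b += y - t
        # Average the corrections over the k rows and step against them.
        w = [wi - dw / k for wi, dw in zip(w, delta_w)]
        b -= delta_b / k
    return w + [b]  # concatenated vector v = (w_1, ..., w_{n-1}, b)


toy_data = [[-2.0, 0], [-1.0, 0], [1.0, 1], [2.0, 1]]  # one input column, one target
v = train(toy_data)
print(v)
```

The returned vector has the same layout as $v$, so in the encrypted setting it is the only object the buyer needs to have decrypted.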
The decryption must necessarily be performed by the data seller, since she is the only
one who has the decryption key. Before sending the encrypted result, the data buyer masks
the result by adding a random value to it. This value is only known to him and
prevents the data seller from getting the plain result computed by the buyer. After the seller
decrypts the encrypted result, it is sent back to the data buyer, who can finally use his
model for future predictions. At that moment, the trading is finished.
The data buyer never saw the data stored in the seller side; this way the marketplace
guarantees the data owner exclusive rights for her data. The data owner never knew that
she was decrypting a logistic model; therefore the design effort of the buyer is kept private
as well.
The data marketplace implementation can be found at the author’s repository: https://github.com/NicoSe
This implementation is a prototype and should not be used in production environments yet.

7 RESULTS
Arithmetic operations involving encrypted data are slower than those involving regular
data, because each encrypted operation involves many underlying operations, which the
TenSEAL library abstracts away. In our study case, training the encrypted model took
around 133.63 seconds, while training the same model on plain data took only 0.68
seconds, a slowdown of roughly 200 times.
Despite this notable difference, the accuracy obtained with the encrypted model is
very similar to the one obtained with the plain model: the two differ by less than 1%,
showing that both models performed similarly.
We compared the features of the previously described related works with our proposed
solution in Table 1. The proposed data marketplace achieves the four features in the
table: the buyer cannot sell data without consent from the seller, the buyer computation
is private, there are no intermediaries involved and there is no need for TEEs or
additional specialized hardware.
Table 2 shows data set sizes and the total payment amount if the buyer wants to get
the original data set in the particular case where the data has eight columns and each
decryption costs $25. It shows that as the number of rows m in the data set grows, the
seller can set larger values for s and therefore the total payment increases.
Columns: Solution; Buyer cannot sell data; Buyer computation is private; No intermediaries; No TEEs required

A Peer-to-Peer Ownership-Preserving Data Marketplace           X X X X
Sterling: A Privacy-Preserving Data Marketplace                X X X
A Decentralized Data Marketplace for Smart Cities              X X X
Privacy-Preserving Decentralized Data Sharing System           X X X
Blockchain Enabled Data Marketplace                            X X X
Ownership preserving AI Marketplace                            X X
An Efficient and Secure Data Sharing Framework                 X X

Table 1: Feature comparison between data marketplaces

m (rows)   n (columns)   s    r (m/s)   # of requests   total to pay ($)
100        8             8    12        6               $150
100        8             10   10        8               $200
100        8             12   8         13              $325
200        8             18   11        14              $350
200        8             21   9         20              $500
200        8             24   8         25              $625
400        8             38   10        32              $800
400        8             44   9         40              $1,000
400        8             50   8         50              $1,250

Table 2: Data sizes and the total payment to get the plain data
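
The arithmetic behind Table 2 can be checked with a few lines of Python. The sketch below assumes two relations read off the table rather than stated as formulas in the text: r is the integer part of m/s, and the total equals the number of requests multiplied by the $25 decryption price. The number of requests is taken from the table as given; the names `ROWS` and `table_row` are hypothetical.

```python
PRICE_PER_DECRYPTION = 25  # dollars per decryption, as stated in the text

# (m rows, s, number of requests), copied from Table 2
ROWS = [
    (100, 8, 6), (100, 10, 8), (100, 12, 13),
    (200, 18, 14), (200, 21, 20), (200, 24, 25),
    (400, 38, 32), (400, 44, 40), (400, 50, 50),
]


def table_row(m, s, requests):
    """Return (r, total): r = floor(m / s), total = requests * price."""
    return m // s, requests * PRICE_PER_DECRYPTION


for m, s, requests in ROWS:
    r, total = table_row(m, s, requests)
    print(f"m={m:3d}  s={s:2d}  r={r:2d}  requests={requests:2d}  total=${total:,}")
```

Running the loop prints one line per table row, which makes it easy to compare the derived r and total columns against the published values.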
8 DISCUSSION
The successful implementation of the study case described in the previous section shows
that a peer-to-peer ownership-preserving data marketplace is possible. Below we discuss
the assumptions required for the data marketplace to work correctly, as well as the
advantages and disadvantages of our proposal.
There are two main assumptions in this work. The first one is that a single seller
has enough valuable data for the buyer to use (e.g., a company that has collected data),
and that the buyer would not want to mix data from different, and therefore independent,
data sellers. In cases where the buyer needs data from multiple sellers, a different
solution involving multi-party computation (MPC) could be used [24].
The second assumption is that the whole transaction takes place in an honest-but-curious
seller scenario, in which the seller behaves honestly and the buyer can therefore be sure
that the received result is correct. Future work will focus on guaranteeing the
correctness of the result received by the buyer.
One disadvantage is the precision of the final result. Working with encrypted data
introduces a small error in every operation that is performed [14]. Therefore, the
decryption of an encrypted result will differ slightly from the result that would have
been obtained with regular values. This is why we consider that the use of our proposal
is limited to randomized, error-tolerant models. This disadvantage is due to the HEAAN
encryption method of the OpenMined library, and it might be alleviated by using other
libraries with different encryption methods. As far as we know, Microsoft is improving
the SEAL library [19] and Intel is developing a more efficient homomorphic encryption
scheme [25].
There is also a lack of prior data visualization, which prevents the buyer from exploring
the data before creating his program. Data visualization is important for finding
distributions, outliers and other features for statistical analysis and machine learning [26].
Another disadvantage is that our proposal involves a learning curve on the side of the
buyer. He must know how to use the TenSEAL library to exploit homomorphic encryption.
HEAAN, a leveled homomorphic encryption scheme, allows for addition and multiplication
over encrypted data. In our study case, the sigmoid function is evaluated by
approximating it with a polynomial in a specific interval [27]. The polynomial is then
evaluated on particular encrypted values. This evaluation is nothing more than
multiplications by coefficients and additions between terms,
$p(x) = c_0 + c_1 x + c_2 x^2 + \dots + c_{n-1} x^{n-1}$.
There are some works trying to facilitate homomorphic encryption adoption, such as the
Armadillo compiler [28].
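To make the polynomial evaluation concrete, the following sketch evaluates a low-degree approximation of the sigmoid using only additions and multiplications (Horner's rule) and measures its error against the exact function on plain numbers. The coefficients 0.5, 0.197, 0 and −0.004 are an assumption: they are a degree-3 approximation on [−5, 5] used in TenSEAL's logistic regression tutorial, and the text does not state which polynomial or interval was used in the study case. The helper name `polyval` is also hypothetical here.

```python
import math

# Assumed coefficients c0..c3, lowest degree first:
# sig(x) ~ 0.5 + 0.197*x - 0.004*x^3 on [-5, 5]
COEFFS = [0.5, 0.197, 0.0, -0.004]


def polyval(coeffs, x):
    """Evaluate c0 + c1*x + ... + c_{n-1}*x^(n-1) with Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c  # one multiplication and one addition per term
    return result


def sig(x):
    """Exact sigmoid, used only as a reference on plain numbers."""
    return 1.0 / (1.0 + math.exp(-x))


grid = [i / 10.0 for i in range(-50, 51)]
max_error = max(abs(polyval(COEFFS, x) - sig(x)) for x in grid)
print(f"max |p(x) - sig(x)| on [-5, 5]: {max_error:.3f}")
```

Only additions and multiplications appear in `polyval`, which is why the same evaluation can be carried out on encrypted values; outside the chosen interval the approximation degrades quickly, so inputs have to stay within it.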
One of the attractive features of the proposed marketplace is that it allows separation
of data and computation, thus imposing a barrier that prevents both the seller and the
buyer from taking advantage of the other party. As seen in the logistic regression model,
the data buyer was never able to see the actual data, much less to sell it for his own
benefit. This prevents data leakage and guarantees the seller full ownership of her data.
It is worth mentioning that our work does not guarantee data privacy itself but ownership
rights over the data. Current research shows that, by looking at even a small part of
the plain data, a malicious buyer could infer key insights about the data set
and break data privacy [29][30]. Our work partially guarantees the seller's ownership
rights, in that her data cannot be resold without her consent. Furthermore, the data
seller never knew that she was decrypting a logistic regression model; otherwise, she
could steal it and use it for her own gain. Our proposal thus also prevents software
piracy, which is in the best interest of the data buyer.
Another advantage of the proposed solution is that the buyer gains access to many more
data sets for his research and for the development of innovative tools such as AI models.
Companies and other stakeholders can safely profit from their collected data while being
sure it will not be resold without their consent. College students and startups can gain
access to previously unavailable data to perform their own research.
According to the current legal and regulatory frameworks, copyright law does not apply
to data. Only in very special situations can a database be copyrighted, provided that
the data owner has proven that her data meets certain characteristics [31]. However, by
using data-ownership marketplaces, one can easily obtain by means of technology what
might be complicated to obtain by means of legal action: conferring on the data owner
exclusive rights over her data, just as copyright ownership confers on the author the
exclusive right to use his work.
Last but not least, the homomorphic encryption library uses a leveled homomorphic
encryption scheme, which limits the number of operations that can be performed on a
ciphertext. Since the bootstrapping technique was proposed by Gentry in 2009 [32],
multiple fully homomorphic encryption implementations have been released, and they could
be adopted in future versions of the data marketplace.

9 CONCLUSIONS
This work shows how to implement and use a peer-to-peer ownership-preserving data
marketplace. Through the use of homomorphic encryption and decentralized ledgers, it is
possible to sell data that can be computed, though not unveiled. The operations on
encrypted data lead to an encrypted result which, when decrypted, is (almost) the same
as the result of performing the same operations over regular data.
The proposed marketplace is implemented and tested by training a logistic regression
model to predict the lifetime of a business given a set of attributes. Technical aspects,
future challenges and legal implications of the proposed solution are discussed.

9.1 CONTRIBUTIONS
To summarize, the proposed data marketplace makes the following contributions to the field:

1. The data owner preserves the ownership rights of her data while monetizing it.
2. The buyer computation is private because it happens on his local computer.
3. There are no intermediaries involved in the process that would take a percentage cut.

References
[1] H. Richter and P. R. Slowinski, “The data sharing economy: On the emergence of new inter-
mediaries,” IIC-International Review of Intellectual Property and Competition Law, vol. 50,
no. 1, pp. 4–29, 2019.
[2] F. Stahl, F. Schomm, G. Vossen, and L. Vomfell, “A classification framework for data mar-
ketplaces,” Vietnam Journal of Computer Science, vol. 3, no. 3, pp. 137–143, 2016.
[3] J. Rowley, “The wisdom hierarchy: Representations of the DIKW hierarchy,” Journal of
Information Science, vol. 33, no. 2, pp. 163–180, 2007, https://doi.org/10.1177/0165551506070706.
[4] M. Chen, S. Mao, and Y. Liu, “Big data: A survey,” Mobile Networks and Applications, vol. 19,
no. 2, pp. 171–209, 2014.
[5] N. Hynes, D. Dao, D. Yan, R. Cheng, and D. Song, “A demonstration of sterling: A privacy-
preserving data marketplace,” Proceedings of the VLDB Endowment, vol. 11, no. 12, pp. 2086–
2089, 2018.
[6] G. S. Ramachandran, R. Radhakrishnan, and B. Krishnamachari, “Towards a decentralized
data marketplace for smart cities,” in 2018 IEEE International Smart Cities Conference
(ISC2), IEEE, 2018, pp. 1–8.
[7] L. Giaretta, I. Savvidis, T. Marchioro, S. Girdzijauskas, G. Pallis, M. Dikaiakos, and E.
Markatos, “Pds2: A user-centered decentralized marketplace for privacy preserving data pro-
cessing,” in Third International Workshop on Blockchain and Data Management, 2021.
[8] P. Banerjee and S. Ruj, “Blockchain enabled data marketplace–design and challenges,” arXiv
preprint arXiv:1811.11462, 2018.
[9] N. B. Somy, K. Kannan, V. Arya, S. Hans, A. Singh, P. Lohia, and S. Mehta, “Ownership
preserving ai market places using blockchain,” in 2019 IEEE International Conference on
Blockchain (Blockchain), IEEE, 2019, pp. 156–165.
[10] B. K. Samanthula, G. Howser, Y. Elmehdwi, and S. Madria, “An efficient and secure data
sharing framework using homomorphic encryption in the cloud,” in Proceedings of the 1st
International Workshop on Cloud Intelligence, 2012, pp. 1–8.
[11] R. L. Rivest, L. Adleman, M. L. Dertouzos, et al., “On data banks and privacy homomor-
phisms,” Foundations of secure computation, vol. 4, no. 11, pp. 169–180, 1978.
[12] A. Acar, H. Aksu, A. S. Uluagac, and M. Conti, “A survey on homomorphic encryption
schemes: Theory and implementation,” ACM Computing Surveys (CSUR), vol. 51, no. 4,
pp. 1–35, 2018.
[13] J. H. Cheon, A. Kim, M. Kim, and Y. Song, “Homomorphic encryption for arithmetic of
approximate numbers,” in International Conference on the Theory and Application of Cryp-
tology and Information Security, Springer, 2017, pp. 409–437.
[14] M. Albrecht, M. Chase, H. Chen, J. Ding, S. Goldwasser, S. Gorbunov, S. Halevi, J. Hoff-
stein, K. Laine, K. Lauter, S. Lokam, D. Micciancio, D. Moody, T. Morrison, A. Sahai,
and V. Vaikuntanathan, “Homomorphic encryption security standard,” HomomorphicEn-
cryption.org, Toronto, Canada, Tech. Rep., Nov. 2018.
[15] S. Nakamoto, “Bitcoin: A peer-to-peer electronic cash system,” Bitcoin, Tech. Rep., 2019.
[16] J. Garay, A. Kiayias, and N. Leonardos, “The bitcoin backbone protocol: Analysis and ap-
plications,” in Annual International Conference on the Theory and Applications of Crypto-
graphic Techniques, Springer, 2015, pp. 281–310.
[17] G. Wood et al., “Ethereum: A secure decentralised generalised transaction ledger,” Ethereum
project yellow paper, vol. 151, no. 2014, pp. 1–32, 2014.
[18] C. Dannen, Introducing Ethereum and solidity. Springer, 2017, vol. 1.
[19] H. Chen, K. Laine, and R. Player, “Simple encrypted arithmetic library - SEAL v2.1,” in
International Conference on Financial Cryptography and Data Security, Springer, 2017,
pp. 3–18.
[20] A. Benaissa, B. Retiat, B. Cebere, and A. E. Belfedhal, Tenseal: A library for encrypted
tensor operations using homomorphic encryption, 2021. arXiv: 2104.03152 [cs.CR].
[21] M. Kaleem, A. Mavridou, and A. Laszka, “Vyper: A security comparison with solidity based
on common vulnerabilities,” in 2020 2nd Conference on Blockchain Research & Applications
for Innovative Networks and Services (BRAINS), IEEE, 2020, pp. 107–111.
[22] M. Grinberg, Flask web development: developing web applications with python. O’Reilly
Media, Inc., 2018.
[23] F. Aurenhammer and R. Klein, “Voronoi diagrams,” Handbook of computational geometry,
vol. 5, no. 10, pp. 201–290, 2000.
[24] W. Du and M. J. Atallah, “Secure multi-party computation problems and their applica-
tions: A review and open problems,” in Proceedings of the 2001 workshop on New security
paradigms, 2001, pp. 13–22.
[25] F. Boemer, S. Kim, G. Seifu, F. D. M. de Souza, and V. Gopal, “Intel HEXL: accelerating
homomorphic encryption with intel AVX512-IFMA52,” CoRR, vol. abs/2103.16400, 2021.
arXiv: 2103.16400. [Online]. Available: https://arxiv.org/abs/2103.16400.
[26] D. A. Keim, “Visual exploration of large data sets,” Communications of the ACM, vol. 44,
no. 8, pp. 38–44, 2001.
[27] L. De Branges, “The stone-weierstrass theorem,” Proceedings of the American Mathematical
Society, vol. 10, no. 5, pp. 822–824, 1959.
[28] S. Carpov, P. Dubrulle, and R. Sirdey, “Armadillo: A compilation chain for privacy pre-
serving applications,” in Proceedings of the 3rd International Workshop on Security in Cloud
Computing, 2015, pp. 13–19.
[29] A. Narayanan and V. Shmatikov, “Robust de-anonymization of large sparse datasets,” in
2008 IEEE Symposium on Security and Privacy (sp 2008), IEEE, 2008, pp. 111–125.
[30] F. McSherry and I. Mironov, “Differentially private recommender systems: Building privacy
into the netflix prize contenders,” in Proceedings of the 15th ACM SIGKDD international
conference on Knowledge discovery and data mining, 2009, pp. 627–636.
[31] P. Miller, R. Styles, and T. Heath, “Open data commons, a license for open data,” LDOW,
vol. 369, 2008.
[32] C. Gentry, “A fully homomorphic encryption scheme,” crypto.stanford.edu/craig, Ph.D.
dissertation, Stanford University, 2009.
