SPDL: Blockchain-Secured and Privacy-Preserving Decentralized Learning

Minghui Xu, Member, IEEE, Zongrui Zou, Ye Cheng, Qin Hu, Member, IEEE,
Dongxiao Yu, Senior Member, IEEE, Xiuzhen Cheng, Fellow, IEEE

Abstract—Decentralized learning involves training machine learning models over remote mobile devices, edge servers, or cloud servers while keeping data localized. Although many studies have shown the feasibility of preserving privacy, enhancing training performance, or introducing Byzantine resilience, none of them considers all three simultaneously. We therefore face the following problem: how can we efficiently coordinate the decentralized learning process while simultaneously maintaining learning security and data privacy? To address this issue, in this paper we propose SPDL, a blockchain-secured and privacy-preserving decentralized learning scheme. SPDL integrates blockchain, Byzantine Fault-Tolerant (BFT) consensus, a BFT Gradient Aggregation Rule (GAR), and differential privacy seamlessly into one system, ensuring efficient machine learning while maintaining data privacy, Byzantine fault tolerance, transparency, and traceability. To validate our scheme, we provide rigorous analysis on convergence and regret in the presence of Byzantine nodes. We also build an SPDL prototype and conduct extensive experiments to demonstrate that SPDL is effective and efficient with strong security and privacy guarantees.

Index Terms—Decentralized Learning; Byzantine resilience; Blockchain; Privacy preservation

1 INTRODUCTION
With the increasing amount of data and growing complexity
of machine learning models, there is a pressing demand for
utilizing computational hardware and storage owned by
various entities in a distributed network. State-of-the-art
distributed machine learning schemes adopt three major
network topologies shown in Fig. 1. Federated learning
[1] utilizes an efficient centralized network illustrated in
Fig. 1(a), where a parameter server aggregates gradients computed by distributed devices and updates the global model for them, while preserving privacy since devices compute locally without communicating with each other.

Fig. 1. Three major network topologies adopted in distributed learning.

The fragility of the centralized network topology lies in that a centralized parameter server suffers from the single-point-of-failure problem (a server might crash or be Byzantine). To solve this issue, El-Mhamdi et al. [2] proposed a Byzantine-resilient learning network shown in Fig. 1(b), which substitutes the centralized server with a server group in which no more than 1/3 of the servers can be Byzantine.

In this paper, we make a step further to break the barriers between the parameter servers and the computational devices, and allow all nodes to train models in a decentralized network, as shown in Fig. 1(c). Such a decentralized network is frequently adopted in ad hoc networks, edge computing, the Internet of Things (IoT), decentralized applications (DApps), etc. It can greatly unleash the potential for building large-scale (even worldwide) machine learning models that reasonably and fully maximize the utilization of computational resources [3]. Besides, devices such as mobile phones, IoT sensors, and vehicles are generating a large amount of data nowadays. A decentralized network filled with real-time big data can lead to tremendous improvements in large-scale applications, e.g., illness detection, outbreak discovery, and disaster warning, which involve a large number of decentralized edge and cloud servers from different regions and/or countries. However, this poses a new critical challenge: How can we efficiently coordinate the decentralized learning process while simultaneously maintaining learning security and data privacy?

Specifically, decentralized learning confronts the following challenges. 1) Without a fully trusted centralized custodian, users have little incentive and are reluctant to participate in the learning process due to the lack of trust in a decentralized network. Therefore, it is highly possible that the volume of data is insufficient to train a reliable model. 2) In decentralized learning, a Byzantine node that can behave arbitrarily (e.g., crash or launch attacks) might prevent the model from converging or interrupt the training process. 3) It is challenging to trade off privacy and security against efficiency, and to make data sharing frictionless with privacy and transparency guarantees.

M. Xu, Z. Zou, Y. Cheng, D. Yu, and X. Cheng are with the School of Computer Science and Technology, Shandong University, Qingdao, 266510, P. R. China. E-mail: mhxu@sdu.edu.cn; zou.zongrui@mail.sdu.edu.cn; chengye0311@163.com; {xzcheng, dxyu}@sdu.edu.cn

Q. Hu is with the Department of Computer and Information Science, Indiana University-Purdue University Indianapolis, USA. E-mail: qinhu@iu.edu

This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.

To overcome the above challenges, we propose SPDL, a decentralized learning framework that simultaneously ensures strong security using blockchain (as an immutable distributed ledger), BFT consensus, and a BFT GAR, and preserves privacy by utilizing local gradient computation and differential privacy (DP). Blockchain, as a key component, maintains a complete, immutable, and traceable record of the machine learning process, covering user registration, gradients by rounds, and model parameters. With blockchain, a user can identify illegal and Byzantine peers, update its local model without concerns, and ultimately trust the SPDL scheme. The BFT consensus algorithm and the BFT GAR are embedded into the blockchain. Concretely, the BFT consensus algorithm ensures consistency of the model transition across rounds, while the BFT GAR offers an effective method of detecting and filtering Byzantine gradients at each round. Concerning privacy, we let nodes compute gradients with their local training data and only share perturbed gradients with peers, which provides strong privacy protection.

Our contributions are summarized as follows:

1) To the best of our knowledge, this is the first secure and privacy-preserving machine learning scheme for decentralized networks in which learning processes are free from trusted parameter servers.
2) SPDL makes use of DP for data privacy protection and seamlessly embeds BFT consensus and a BFT GAR into a blockchain system so that model training benefits from Byzantine fault tolerance, transparency, and traceability while retaining high efficiency.
3) We conduct rigorous convergence and regret analyses on SPDL in the presence of Byzantine nodes, build a prototype, and carry out extensive experiments to demonstrate the feasibility and effectiveness of SPDL.

This paper is organized as follows. Section 3 outlines the preliminary knowledge needed for the development of SPDL. Section 4 details the SPDL protocol. The analyses on convergence and regret are presented in Section 5. Evaluation results are reported in Section 6. We summarize the most related work in Section 2 and conclude this paper in Section 7.

2 RELATED WORK

2.1 Privacy and Byzantine Resilience in Distributed Learning

Private learning schemes include secure multiparty computation, encryption, homomorphic encryption, differential privacy, and aggregation models. For details, we recommend two comprehensive surveys [4], [5] to the interested reader. Su and Vaidya [6] introduced the distributed optimization problem in the presence of Byzantine failures. The problem was formulated as one in which each node has a local cost function and aims to optimize the global cost function. The proposed method, the synchronous Byzantine gradient method (SBG), first trims the largest f gradients and the smallest f gradients, and then computes the average of the minimum and the maximum of the remaining N − 2f values. This approach sheds light on providing Byzantine resilience for distributed learning. Following this idea, many Byzantine-resilient aggregation rules were proposed; they all work towards a common goal of trimming the Byzantine values more precisely and efficiently. Blanchard et al. were the earliest to tackle the Byzantine resilience problem in distributed learning with the Krum algorithm, which can guarantee convergence despite f Byzantine workers (in a server-worker architecture). Krum stimulated the arrival of many BFT aggregation rules, including Median [7], Bulyan [8], and MDA [2]. A recent work [9] demonstrates that these aggregation rules can function together with the DP technique under proper assumptions.
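To make the trimming idea concrete, the following is a minimal NumPy sketch of one possible reading of the rule described above: per coordinate, drop the f largest and f smallest values and then average the minimum and maximum of what remains. The function name, shapes, and the example are our own illustrative assumptions, not the original SBG implementation.

```python
import numpy as np

def sbg_aggregate(grads: np.ndarray, f: int) -> np.ndarray:
    """Coordinate-wise aggregation in the spirit of the rule described above:
    discard the f largest and f smallest values, then average the minimum and
    maximum of the remaining N - 2f values. Illustrative sketch only."""
    n, d = grads.shape                              # n gradients of dimension d
    assert n > 2 * f, "need more than 2f gradients"
    sorted_grads = np.sort(grads, axis=0)           # sort each coordinate
    trimmed = sorted_grads[f:n - f, :]              # keep the middle n - 2f values
    return 0.5 * (trimmed[0, :] + trimmed[-1, :])   # average of min and max

# Example: 7 nodes, up to f = 2 Byzantine, 3-dimensional gradients.
rng = np.random.default_rng(0)
grads = rng.normal(size=(7, 3))
grads[0] += 100.0                                   # a wildly corrupted gradient
print(sbg_aggregate(grads, f=2))
```

The corrupted gradient is discarded by the trimming step, so the output stays close to the honest values.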
2.2 Blockchain-Enhanced Distributed Learning

The BinDaaS [10] framework provides a blockchain-based deep learning service, ensuring data privacy and confidentiality in sharing Electronic Health Records (EHRs). With BinDaaS, a patient can mine a block filled with a private health record, which can be accessed by legitimate doctors. BinDaaS trains each model based on a patient's private EHRs independently of others' data. In contrast, decentralized learning intends to train a global model using the data owned by individual nodes, raising more security and privacy concerns. Hu et al. [11] utilized a blockchain and a game-theoretic approach to protect user privacy for federated learning in mobile crowdsensing. The collective extortion (CE) strategy was proposed in [?] as an incentive mechanism that can regulate workers' behavior. These two game-based approaches cannot strictly guarantee the safety of model training against Byzantine nodes. Lu et al. [12] developed a learning scheme that protects privacy by differential privacy and proposed the Proof of Training Quality (PoQ) consensus algorithm for model convergence. This scheme does not consider Byzantine users who might disturb the training process by proposing erroneous model parameters.

FL-Block [13] allows end devices to train a global model secured by a PoW-based blockchain. LearningChain [14] is a differential-privacy-based scheme that protects each party's data privacy, also resting on a PoW-based blockchain. Warnat-Herresthal et al. [15] proposed the concept of swarm learning, which is analogous to decentralized learning, in which each node can join a learning process managed by Ethereum (again, PoW-based) and smart contracts. Compared to a pure adoption of centralized federated learning schemes, PoW-based blockchains can help avoid the single point of failure and mitigate poisoning attacks caused by a central parameter server. With PoW, a miner can propose a block containing model parameters used for a global model update. However, this method cannot rigorously prevent malicious miners from harming the training process by proposing wrong updates. The PoW consensus itself is not sufficient to judge whether given parameters are Byzantine or not. Therefore, PoW-based blockchains are vulnerable to poisoning attacks launched by Byzantine nodes. Besides, PoW-based blockchains are hard to scale since PoW incurs much overhead and results in heavy consumption of computational resources. Another noteworthy system is Biscotti [16], which utilizes a commitment scheme to protect data privacy and adopts a novel Proof-of-Federation-based blockchain for security guarantees in decentralized learning. Using a commitment scheme, the aggregation of parameters can be made verifiable and tamper-proof.
In this paper, we leverage the DP technique for privacy protection, since adding noise (which requires only an addition operation using pre-calculated noise) is more efficient than employing complicated cryptographic tools. In addition, by using DP together with a BFT consensus, we can ensure Byzantine fault tolerance for the whole training process, which cannot be achieved by existing works. Our unique contributions lie in two aspects: 1) providing security and privacy guarantees for the complete training process by seamlessly integrating DP, BFT aggregation, and blockchain in one system while considering Byzantine nodes; and 2) offering rigorous analysis of Byzantine fault tolerance, convergence, and regret for decentralized learning.
3 PRELIMINARIES

3.1 Decentralized Learning

With the prosperity of machine learning tasks that need to process massive data, distributed learning was proposed to coordinate a large number of devices to complete a training process so as to achieve rapid convergence. In recent years, research on distributed learning has mainly been carried out along two directions: centralized learning and decentralized learning, whose schematic diagrams are shown in Fig. 1. A centralized topology uses a parameter server (PS) to coordinate all workers by gathering gradients, performing local updates, and broadcasting new parameters, while in a decentralized topology all nodes are considered equal and exchange information without the intervention of a PS.

Decentralized learning has many advantages over centralized learning. In [3], Lian et al. rigorously proved that a decentralized algorithm has lower communication complexity and possesses the same convergence rate as those under a centralized parameter-server model. Furthermore, a centralized topology might not hold in a decentralized network where no one can be trusted enough to act as a parameter server. Therefore, we consider the following decentralized optimization during a learning process:

\[
\min_{x \in \mathbb{R}^N} f(x) = \frac{1}{N} \sum_{i=1}^{N} \mathbb{E}_{\xi \sim D_i} F(x; \xi),
\]

where N is the network size, D_i is the local data distribution for node i, and F(x; ξ) denotes the loss function given model parameter x and data sample ξ. Let f_i(x) = E_{ξ∼D_i} F(x; ξ). In a fully connected graph, during any synchronous round, each node performs a deterministic aggregation function (e.g., averaging) K on the perturbed gradients received from all other peers to update its local parameter, i.e.,

\[
x_i^{(t+1)} = x_i^{(t)} - \gamma \cdot \mathcal{K}\big(g_1^{(t)}, g_2^{(t)}, \cdots, g_n^{(t)}\big),
\]

where γ is the learning rate. Notice that since K is deterministic, all nodes should have exactly the same parameter and should initialize it with the same value. In this case, we denote by A an arbitrary synchronous distributed learning algorithm that updates the mutual parameter x, and use A(D_1, D_2, · · · , D_n; t) to denote the output parameter x at round t. In the rest of this paper we omit the subscript if all peers have the same local parameter.
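As a concrete illustration of the update rule above, the sketch below runs one synchronous round in NumPy under simplifying assumptions: every node holds the same parameter x, K is plain averaging, and gradients are exchanged through a shared list rather than a real network. The function names and the toy objective are our own illustrative choices, not the SPDL implementation.

```python
import numpy as np

def aggregate_mean(grads):
    """A deterministic aggregation function K; plain averaging for illustration."""
    return np.mean(grads, axis=0)

def run_round(x, local_grad_fns, gamma=0.1):
    """One synchronous round: every node computes a gradient on its local data,
    all gradients are exchanged, and each node applies the same update
    x <- x - gamma * K(g_1, ..., g_n), so parameters stay identical."""
    grads = [g(x) for g in local_grad_fns]   # local computation on D_i
    update = aggregate_mean(grads)           # deterministic K shared by all nodes
    return x - gamma * update

# Toy example: each node minimizes ||x - c_i||^2 for a private target c_i.
targets = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 1.0])]
grad_fns = [lambda x, c=c: 2 * (x - c) for c in targets]
x = np.zeros(2)
for _ in range(100):
    x = run_round(x, grad_fns)
print(x)   # approaches the average of the targets, the global minimizer
```

Because K is deterministic and all nodes start from the same value, every node ends each round with an identical parameter, matching the assumption stated above.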

3.2 Blockchain Basics

Blockchain, as a distributed ledger, refers to a chain of blocks linked by hashes and spread over all the nodes in a peer-to-peer network, namely the blockchain network. A full node stores a full copy of the blockchain in its local database. A blockchain starts from a genesis block, and each block except the genesis block is chained to the previous block by referencing its hash. Typically, there are two categories of blockchain systems based on scale and openness: permissioned and permissionless. In this paper, we adopt a permissioned blockchain since it provides faster speed and more restricted registration control than permissionless ones.

Blockchain consists of three major components: the blockchain network, the distributed ledger, and the consensus algorithm. A blockchain system organizes registered nodes into a P2P network, forming a complete graph. A distributed ledger is immutable and can be organized as a chain, a Directed Acyclic Graph (DAG), or a mesh. In this paper, we use a chain as the data structure of our ledger. As the core of a blockchain system, the consensus process determines how to append a new block to the chain. Two types of consensus algorithms are commonly adopted: proof-of-resources and message passing. Proof-of-resources means that nodes compete for proposing blocks by demonstrating their utilization of resources, e.g., computational resources, stake, storage, memory, and specific trusted hardware. On the other hand, message-passing-based consensus has been widely researched in the area of distributed computing. Such algorithms always provide clear assumptions on nodes' faulty behaviors, such as fail-stop and Byzantine attacks. In this paper, we leverage a Byzantine fault tolerance (BFT) consensus algorithm, which can address Byzantine nodes that may launch arbitrary attacks.

3.3 Gradient Aggregation Rule (GAR)

A Gradient Aggregation Rule (GAR) is used to aggregate the gradients received from peers during each round. A traditional GAR averages gradients to eliminate errors. To handle gradients generated by Byzantine nodes, a GAR can be designed more elaborately to inject robustness. For example, Krum and Multi-Krum are the pioneering GARs that satisfy Byzantine resilience [17]. The essence of these two GARs is to choose the gradient with the closest (N − f) neighbors (N denotes the network size and f the number of Byzantine nodes), based on the assumption that the honest majority should have similar gradients. Median [7] and MDA [2] are two further GARs that adopt analogous ideas to ensure BFT gradient aggregation. In this paper, SPDL leverages Krum as the BFT GAR but is not limited to it.

3.4 Differential Privacy

A privacy guarantee is of vital importance if the nodes carrying out decentralized learning do not allow their local training data to be shared. Although each node communicates with its neighbors by transmitting parameters instead of sending raw data, the risk of leaking information still exists [18]. Differential privacy is an effective method to avoid leaking any information of a single individual by adjusting the feedback of query operations, no matter what auxiliary information the adversary node might have. In decentralized learning, the process of exchanging parameters involves a sequence of queries. Differential privacy in our setup can be formally defined as follows:

Definition 1. ((ε, δ)-differential privacy) Denote by A an arbitrary synchronous distributed learning algorithm that updates model parameter x. For any node i in a decentralized system and any two possible local datasets D_i and D_i' with D_i differing from D_i' by at most one record, if for any round t and any S ⊆ Range(A) it holds that

\[
\Pr\big(\mathcal{A}(D, D_i; t) \in S\big) \le e^{\epsilon} \Pr\big(\mathcal{A}(D, D_i'; t) \in S\big) + \delta,
\]

we then claim that A preserves (ε, δ)-differential privacy, where we pack all D_j (j ≠ i) into D.

Definition 2. For any function f : D → R^N, the L2-sensitivity of f is defined as

\[
\Delta_2 f = \max_{d_1, d_2} \| f(d_1) - f(d_2) \|,
\]

for all d_1, d_2 differing in at most one element.

Differential privacy can be realized by adding Gaussian noise to the query results [19]. The following lemma, proved in [20], demonstrates how to properly choose the Gaussian noise.

Lemma 1. For each node transmitting gradients perturbed by Gaussian noise with distribution N(0, σ²), the gradient exchanges within T successive rounds preserve (ε, δ)-differential privacy as long as σ ≥ C T γ √(2 ln(1.25/δ)) / ε, where C = ∆₂ g^(t) and γ is the learning rate.

In brief, the differentially private scheme we employ in this paper is sketched as follows. For any node i, we use random Gaussian noise to perturb its gradients before transmitting them to other nodes. When nodes obtain the result of the aggregation function K whose inputs are their perturbed gradients, differential privacy for node i is guaranteed.

TABLE 1
Summary of Notations

g̃_i^(t): the gradient computed by node i in round t
G_i^(t): the random noise added to i's gradient in round t
g_i^(t): the perturbed gradient to be transmitted in round t
η: the learning rate
x^(t): model parameters at the end of round t
ξ_i^(t): dataset randomly sampled from the local dataset D_i
n, f: the number of nodes and of Byzantine nodes
σ²: the variance of the Gaussian noise
σ_f²: the upper bound of E‖E(g_i^(t)) − g_i^(t)‖²
∆^(t): the output of the BFT GAR
B_k^(t), BC_k^(t): block and blockchain in the t-th round of the k-th epoch
ε, δ: budgets of differential privacy
d: the dimension of gradients
N(i): the neighbors of node i
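For illustration, a minimal sketch of the noise calibration suggested by Lemma 1 above, assuming the L2-sensitivity C of the shared gradient is known (e.g., enforced by clipping). The function names and the clipping step are our own assumptions, not the paper's released code.

```python
import numpy as np

def gaussian_sigma(C, T, gamma, eps, delta):
    """Noise scale per Lemma 1: sigma >= C * T * gamma * sqrt(2 ln(1.25/delta)) / eps."""
    return C * T * gamma * np.sqrt(2.0 * np.log(1.25 / delta)) / eps

def perturb(gradient, C, sigma, rng):
    """Clip the gradient to L2 norm C (so the sensitivity bound holds),
    then add coordinate-wise Gaussian noise N(0, sigma^2)."""
    norm = np.linalg.norm(gradient)
    clipped = gradient * min(1.0, C / (norm + 1e-12))
    return clipped + rng.normal(0.0, sigma, size=clipped.shape)

rng = np.random.default_rng(0)
sigma = gaussian_sigma(C=1.0, T=100, gamma=0.1, eps=0.02, delta=1e-6)
g = rng.normal(size=10)
print(sigma, perturb(g, C=1.0, sigma=sigma, rng=rng)[:3])
```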
4 THE PROTOCOL

4.1 Model and Assumptions

We focus on scenarios where nodes are able to communicate with each other in a decentralized network. Specifically, we formalize the decentralized communication topology as a directed and fully connected graph G = (V, E), where V (|V| = N) denotes the set of all peers and for any i, j ∈ V we have (i, j) ∈ E. Time is divided into epochs (denoted by k), each consisting of synchronous rounds (denoted by t), and a model can be trained within each epoch. We denote the frequently used notations of transaction, block, blockchain, and chain of block headers by tx, B_k^(t), BC_k^(t), and BH_k^(t), respectively.

We assume the network is unreliable with at most f possible Byzantine nodes, and N = 3f + 1. A Byzantine node i can behave arbitrarily. For example, it may refuse to compute gradients, transmit arbitrary but unlawful gradients to mislead correct nodes, or cast incorrect votes during the consensus process. When i chooses not to send any data in a synchronous round t, its neighbors act as if they received g_i^(t) = 0.

4.2 Design Objectives

In this subsection, we briefly summarize our design goals.

1) Decentralization: SPDL should work in a decentralized network setting without the intervention of any centralized party such as a parameter server.
2) Differential Privacy: SPDL should guarantee (ε, δ)-DP by adding random Gaussian noise to perturb gradients. Meanwhile, we aim to reach a balance between privacy leakage and convergence rate.
3) Byzantine Fault Tolerance: SPDL should ensure convergence against at most f (N = 3f + 1) Byzantine nodes that can behave arbitrarily.
4) Immutability, Transparency, and Traceability: the full record of the machine learning process should be immutable and transparent, and provide traceability enabling Byzantine node detection.

4.3 Protocol Details

4.3.1 Initialization

Initially, each node creates a pair of private key sk and public key pk, and generates its unique 256-bit identity id based on pk. Public keys and identities are broadcast through the network to be publicly known by all nodes. Then a genesis block B_0 is created, which records the information of the nodes that initially participate in the blockchain network. A newly arriving node should have a related tx added to the blockchain before joining the permissioned network, where tx contains the necessary information including its pk, id, IP address, etc. Initially, all nodes have an identical reputation value, that is, w_i = ŵ. Each node initializes itself with a learning rate γ, the number of total rounds T, and the variance of the Gaussian noise for perturbing the gradients. For simplicity and consistency of the iterative process, we assume all nodes start the learning procedure with the same initial parameter value x_i^(0). After initialization, nodes undergo a leader election process to determine who is in charge of the training process.
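The initialization step can be sketched in Python as below, using the third-party `cryptography` package for ECDSA keys and SHA-256 to derive a 256-bit identity from the public key; the curve choice, serialization format, and the registration transaction layout are assumptions for illustration, not SPDL's actual wire format.

```python
import hashlib
from cryptography.hazmat.primitives.asymmetric import ec
from cryptography.hazmat.primitives import serialization

def init_node():
    """Create an ECDSA key pair (sk, pk) and derive a 256-bit node identity from pk."""
    sk = ec.generate_private_key(ec.SECP256K1())
    pk = sk.public_key()
    pk_bytes = pk.public_bytes(
        encoding=serialization.Encoding.X962,
        format=serialization.PublicFormat.CompressedPoint,
    )
    node_id = hashlib.sha256(pk_bytes).hexdigest()   # 256-bit id derived from pk
    return sk, pk_bytes, node_id

sk, pk_bytes, node_id = init_node()
# A hypothetical registration transaction carrying pk, id, and the node's IP address.
tx = {"pk": pk_bytes.hex(), "id": node_id, "ip": "10.0.0.1"}
print(tx["id"])
```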

Fig. 2. SPDL workflow (t-th round).

4.3.2 Leader Election

Algorithm 1: Leader Election
1: (h_i, π_i) = VRF(sk_i, seed)
2: L_i = (h_i, π_i, id_i)
3: Broadcast (h_i, π_i) to the network and receive (h, π) from peers
4: while TRUE do
5:     if receive (h_j, π_j) from id_j && VerifyVRF(pk, h_j, π_j, seed) = 1 then
6:         add (h_j, π_j, id_j) to L_i
7:     if Time() > start + δ_1 then
8:         select the largest h_max from L_i and obtain id_max
Output: id_max

Each node executes the leader election algorithm shown in Algorithm 1. The algorithm is based on a verifiable random function (VRF), which takes as inputs a private key sk and a random seed, and outputs a hash string h as well as the corresponding proof π. Each contender broadcasts (h_i, π_i) to the network and receives (h, π) from its peers. Note that each node is assigned a reputation variable r ∈ [0, 1]. A node i with r_i = 0 is prohibited from being a leader. Then the node with the largest h and r > 0 is recognized as the leader, who is responsible for the blockchain consensus. The probability that more than one leader is selected is extremely small since h has a large space of 2^256 if we adopt the commonly used SHA-256; if this extreme case does happen, all nodes relaunch the leader election process to ensure that only one leader is finally selected. We also set a timeout for the leader election process as Time() > start + δ_1, where Time() extracts the current UNIX time. To summarize, our leader election algorithm achieves the following three basic functionalities:

• A node with a zero reputation value has no right to be a leader.
• The leader election process possesses full randomness and unpredictability.
• A Byzantine node cannot disguise itself as a leader since the VRF ensures that the proof π is unforgeable.

After leader election, nodes start the round-based training process, with each round consisting of the gradient computation and blockchain consensus processes.
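A minimal sketch of the election loop follows. Because we cannot assume a specific VRF library here, a keyed hash stands in for VRF/VerifyVRF (it is not a real verifiable random function, only a placeholder with the same interface), and reputation handling and the timeout are simplified.

```python
import hashlib, hmac

def vrf_eval(sk: bytes, seed: bytes):
    """Placeholder for VRF(sk, seed) -> (h, proof). A keyed hash keeps the sketch
    self-contained; it is NOT a verifiable random function."""
    h = hmac.new(sk, seed, hashlib.sha256).digest()
    return h, h          # the proof is omitted in this stand-in

def elect_leader(outputs, reputation):
    """Pick the node with the largest hash h among nodes with reputation > 0,
    mirroring the selection step of Algorithm 1."""
    eligible = {nid: h for nid, h in outputs.items() if reputation.get(nid, 0) > 0}
    return max(eligible, key=lambda nid: eligible[nid]) if eligible else None

seed = b"round-42"
sks = {f"node{i}": bytes([i]) * 32 for i in range(4)}
outputs = {nid: vrf_eval(sk, seed)[0] for nid, sk in sks.items()}
reputation = {nid: 1.0 for nid in sks}
reputation["node3"] = 0.0            # a node with zero reputation cannot lead
print(elect_leader(outputs, reputation))
```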
4.3.3 Gradient Computation

At each round, node i exchanges its perturbed gradients with all other nodes in the blockchain network. Specifically, each node keeps a true stochastic gradient g̃_i^(t) and a perturbed one g_i^(t) to be shared. The whole exchange process, summarized in Algorithm 2, consists of the following steps:

• Local gradient computation: compute the local stochastic gradient g̃_i^(t) = ∇F_i(x^(t), ξ_i^(t)), where ξ_i^(t) is randomly sampled from the local dataset D_i.
• Adding noise: add random Gaussian noise to the local gradient to be shared. The variance of the noise is denoted by the input variable σ.
• Broadcast gradients: send the perturbed local gradient to all other nodes, and receive gradients from the others at the same time.

Algorithm 2: Gradient Computation
1: Initialize: x_i^(0), learning rate γ, number of total rounds T, and variance of noise σ
2: for t = 0 to T − 1 do
3:     ▷ Local Computation
4:     Randomly sample ξ_i^(t) and compute the local stochastic gradient g̃_i^(t) = ∇F_i(x^(t), ξ_i^(t))
5:     ▷ Adding Noise
6:     Randomly generate Gaussian noise G_i^(t) ∼ N(0, σ²) and obtain g_i^(t) = g̃_i^(t) + G_i^(t)
7:     ▷ Broadcast Gradients
8:     Broadcast g_i^(t) to the network and receive g_j^(t) from each peer j
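A minimal PyTorch sketch of the per-round local computation in Algorithm 2 (sample a batch, compute a stochastic gradient, perturb it with Gaussian noise) is given below; the toy model, the in-memory dataset, and the omitted broadcast step are illustrative assumptions rather than the SPDL prototype.

```python
import torch

def local_perturbed_gradient(model, loss_fn, dataset, batch_size, sigma):
    """Steps 4-6 of Algorithm 2: sample xi_i^(t), compute the stochastic gradient,
    then add N(0, sigma^2) noise to the flattened gradient before sharing."""
    idx = torch.randint(len(dataset[0]), (batch_size,))
    x, y = dataset[0][idx], dataset[1][idx]
    model.zero_grad()
    loss_fn(model(x), y).backward()
    grad = torch.cat([p.grad.flatten() for p in model.parameters()])
    return grad + sigma * torch.randn_like(grad)   # perturbed gradient g_i^(t)

# Toy usage: a linear model on random data standing in for a node's local dataset D_i.
torch.manual_seed(0)
model = torch.nn.Linear(20, 10)
data = (torch.randn(256, 20), torch.randint(0, 10, (256,)))
g = local_perturbed_gradient(model, torch.nn.CrossEntropyLoss(), data,
                             batch_size=32, sigma=0.1)
print(g.shape)
```

The perturbed vector g is what would be broadcast in step 8, while the unperturbed gradient never leaves the node.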
4.3.4 Blockchain Consensus

The blockchain consensus process deeply integrates a blockchain, a BFT consensus protocol (e.g., PBFT, Tendermint), and a BFT aggregation function (e.g., Krum, Median). In this paper, we adopt the Practical Byzantine Fault Tolerant (PBFT) protocol as our consensus backbone due to its effectiveness, as validated by Hyperledger Sawtooth. The aggregation rule used in the blockchain consensus is Krum. Concretely, the blockchain consensus consists of four phases, shown in Algorithm 3: PRE-PREPARE, PREPARE, COMMIT, and DECIDE.

Algorithm 3: Blockchain Consensus
1: ▷ To prevent deadlock, each node starts a view change if Time() > start + δ_2
2: ▷ PRE-PREPARE
3: if role is leader then
4:     ∆^(t) = K(g_1^(t), g_2^(t), · · · , g_n^(t))
5:     B_k^(t) ← MSGB(∆^(t))
6:     broadcast ⟨PRE-PREPARE, id, B_k^(t), h⟩_σ̂
7: ▷ PREPARE
8: if role is follower then
9:     compute ∆̃^(t) = K(g_1^(t), g_2^(t), · · · , g_n^(t))
10:    while receive ⟨PRE-PREPARE, id, B_k^(t), h⟩_σ̂ do
11:        if σ and B_k^(t) are valid and ∆̃^(t) ≈ B_k^(t).∆^(t) then
12:            broadcast ⟨PREPARE, id, h, vote⟩_σ̂
13: ▷ COMMIT
14: while receive 2f + 1 ⟨PREPARE, id, h, vote⟩_σ̂ do
15:     broadcast ⟨COMMIT, id, vote⟩_σ̂
16: ▷ DECIDE
17: while receive 2f + 1 ⟨COMMIT, id, h, vote⟩_σ̂ do
18:     Append(BC_k^(t), B_k^(t))
19:     x_i^(t+1) = x_i^(t) − γ∆^(t)
20:     Update reputation

In the PRE-PREPARE phase, the leader computes an aggregated gradient using an aggregation function ∆^(t) = K(g_1^(t), g_2^(t), · · · , g_n^(t)), where K is a (b, α)-Byzantine resilient Krum function. The core idea of Krum is to eliminate the gradients that are too far away from the others. We use the Euclidean distance ‖g_i^(t) − g_j^(t)‖² to measure how far two gradients are separated. Then we define near(i) to be the set of the n − f − 2 gradients closest to g_i^(t). We expect that the Krum function chooses the one gradient g_i^(t) that is the "closest" to its surrounding gradients. More precisely, the output of Krum is one of its input gradients, and the index of this gradient is

\[
\arg\min_i \sum_{j \in \mathrm{near}(i)} \big\| g_i^{(t)} - g_j^{(t)} \big\| .
\]
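To make the selection rule concrete, here is a small NumPy sketch of a Krum-style score: each gradient is scored by the sum of squared distances to its n − f − 2 nearest neighbours, and the gradient with the smallest score is returned. This is a simplified illustration under our own variable names, not the exact aggregation code used in SPDL.

```python
import numpy as np

def krum(grads: np.ndarray, f: int) -> np.ndarray:
    """Return the gradient whose summed squared distance to its n - f - 2
    closest neighbours is smallest (the Krum selection rule sketched above)."""
    n = len(grads)
    d2 = ((grads[:, None, :] - grads[None, :, :]) ** 2).sum(-1)  # pairwise ||g_i - g_j||^2
    scores = []
    for i in range(n):
        others = np.delete(d2[i], i)               # distances to the other nodes
        scores.append(np.sort(others)[: n - f - 2].sum())
    return grads[int(np.argmin(scores))]

rng = np.random.default_rng(0)
grads = rng.normal(0.0, 0.1, size=(10, 5)) + 1.0   # honest gradients near a common mean
grads[:3] = rng.normal(50.0, 1.0, size=(3, 5))     # f = 3 Byzantine outliers
print(krum(grads, f=3))                            # an honest gradient, close to the mean
```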
MSGB takes as input ∆^(t) and forms a new block B_k^(t) which records ∆^(t). Then the leader broadcasts a signed pre-prepare message ⟨PRE-PREPARE, id, B_k^(t), h⟩_σ̂.

In the PREPARE phase, each follower computes ∆̃^(t) = K(g_1^(t), g_2^(t), · · · , g_n^(t)) based on its local perturbed gradients, then waits for pre-prepare messages. If a pre-prepare message is received, the follower first verifies the digital signature σ and the block (height, block hash, etc.). Then it compares ∆̃^(t) with B_k^(t).∆^(t). The requirement ∆̃^(t) ≈ B_k^(t).∆^(t) means ∆̃^(t) ∈ (B_k^(t).∆^(t) − δ, B_k^(t).∆^(t) + δ), where δ is a small tolerance. This condition indicates that each follower should have a view on the non-Byzantine gradients similar to the leader's. If verification passes, the follower broadcasts a prepare message ⟨PREPARE, id, h, vote⟩_σ̂.

In the COMMIT phase, if a node receives 2f + 1 valid prepare messages, it can broadcast a decision ⟨COMMIT, id, vote⟩_σ̂ and enter the following DECIDE phase.

In the DECIDE phase, upon receiving 2f + 1 valid commit messages, a node can append a new block B_k^(t) to its local blockchain BC_k^(t), update its local parameter as x_i^(t+1) = x_i^(t) − γ∆^(t), and finally update its reputation. The reputation of a certain node i can be reduced if its gradient deviates from the aggregated gradient by an angle of more than π/2. Even though we do not explicitly introduce the view change, we do have such a process to address the case when the leader is a Byzantine node. To avoid the occurrence of a deadlock, we set a timeout for the blockchain consensus process. If Time() > start + δ_2, each node broadcasts a view change message ⟨VIEW-CHANGE, id, h⟩_σ and waits for other peers' responses. Upon receiving 2f + 1 view change messages, a node can abandon the current round.
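The PREPARE-phase check can be illustrated as follows: the follower recomputes its own aggregate and votes only if the leader's proposed aggregate lies within a small tolerance δ in every coordinate (one possible reading of ∆̃^(t) ≈ B_k^(t).∆^(t)). The message format is a plain dictionary and the signature check is stubbed out; both are assumptions, not SPDL's actual message encoding.

```python
import numpy as np

def verify_pre_prepare(local_agg: np.ndarray, block_agg: np.ndarray, delta: float) -> bool:
    """Accept the leader's block only if its aggregate is within delta of the
    follower's own aggregate in every coordinate."""
    return bool(np.all(np.abs(local_agg - block_agg) < delta))

def prepare_vote(local_agg, block, delta=1e-2, signature_ok=True):
    """Sketch of the follower's PREPARE step: check the signature and block fields,
    compare aggregates, and emit a vote message (a plain dict here)."""
    if signature_ok and verify_pre_prepare(local_agg, block["delta_t"], delta):
        return {"type": "PREPARE", "id": "node-i", "h": block["hash"], "vote": True}
    return None

block = {"hash": "abc123", "delta_t": np.array([0.10, -0.20, 0.05])}
print(prepare_vote(np.array([0.101, -0.199, 0.049]), block))
```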

5 THEORETICAL ANALYSIS

In this section, we provide both a convergence analysis and a regret analysis of SPDL in the presence of Byzantine nodes. Without loss of generality, let g_1^(t), g_2^(t), · · · , g_{n−f}^(t) be the perturbed gradients sent out by honest nodes in round t, and g_{n−f+1}^(t), g_{n−f+2}^(t), · · · , g_n^(t) be the gradients sent out by possible Byzantine nodes in round t. We assume that the gradients derived by correct nodes are independently sampled from the random variable G̃ and that E(G̃) = g. By adding Gaussian noise, a perturbed gradient can then be considered as an instance sampled from the random variable G = G̃ + 𝒢, where 𝒢 follows the Gaussian distribution with mean 0 and variance σ². Therefore we also have E(G) = g. In this section, if we only concentrate on a certain round t, we omit the superscript on variables when ambiguity can be avoided from context.

Definition 3. ((k, f)-Byzantine Resilience) Let 0 < k ≤ 1 and f be the number of Byzantine nodes in a distributed system; then our anti-Byzantine mechanism K is said to be (k, f)-Byzantine resilient if

\[
h^{(t)} = \mathcal{K}\big(g_1^{(t)}, g_2^{(t)}, \cdots, g_n^{(t)}\big)
\]

satisfies ⟨E h^(t), g^(t)⟩ ≥ k ‖g^(t)‖².

Theorem 1. In round t, if ‖g‖² f^{−3/4} > 18 d σ_f² and

\[
\epsilon > \frac{C\sqrt{2\ln(1.25/\delta)}}{C_2}, \quad \text{where} \quad
C_2 = \Big( \frac{\|g\|^2}{18d}\, f^{-\frac{3}{4}} - \sigma_f^2 \Big)^{\frac{1}{2}}, \tag{1}
\]

our anti-Byzantine mechanism K achieves (k, f)-Byzantine resilience with

\[
k = 1 - \frac{3\sqrt{2}\, f^{\frac{3}{2}} \sqrt{d}}{\|g\|} \Big( \sigma_f^2 + \frac{2C^2 \ln(1.25/\delta)}{\epsilon^2} \Big)^{\frac{1}{2}} .
\]
Proof. Denote by N(i) the set of the n − f − 2 gradients closest to the i-th gradient, by N_c(i) the collection of the perturbed gradients in N(i) sent by correct nodes, and by N_f(i) the gradients in N(i) sent by Byzantine nodes. Let i* be the index of the gradient chosen by K. We then have

\[
\|\mathbb{E}h^{(t)} - g\|^2
= \Big\| \mathbb{E}\Big[ h^{(t)} - \frac{1}{|N_c(i^*)|}\sum_{j\in N_c(i^*)} (g_j + r_j) \Big] \Big\|^2
\le \mathbb{E}\Big\| h^{(t)} - \frac{1}{|N_c(i^*)|}\sum_{j\in N_c(i^*)} (g_j + r_j) \Big\|^2,
\]

where the inequality holds since ‖·‖² is convex. Hence

\[
\|\mathbb{E}h^{(t)} - g\|^2
\le \sum_{i\notin B} \mathbb{E}\Big\| g_i + r_i - \frac{1}{|N_c(i)|}\sum_{j\in N_c(i)} (g_j + r_j) \Big\|^2
+ \sum_{i\in B} \mathbb{E}\Big\| B_i - \frac{1}{|N_c(i)|}\sum_{j\in N_c(i)} (g_j + r_j) \Big\|^2 .
\]

If i* = i is one of the correct nodes, i.e., i* ∉ B,

\[
\begin{aligned}
\mathbb{E}\Big\| g_i + r_i - \frac{1}{|N_c(i)|}\sum_{j\in N_c(i)} (g_j + r_j) \Big\|^2
&= \mathbb{E}\Big\| \frac{1}{|N_c(i)|}\sum_{j\in N_c(i)} \big( (g_i - g_j) + (r_i - r_j) \big) \Big\|^2 \\
&= \frac{1}{|N_c(i)|^2}\, \mathbb{E}\Big\| \sum_{j\in N_c(i)} \big( (g_i - g_j) + (r_i - r_j) \big) \Big\|^2 \\
&\le \frac{1}{|N_c(i)|}\sum_{j\in N_c(i)} \big( \mathbb{E}\|g_i - g_j\|^2 + \mathbb{E}\|r_i - r_j\|^2 \big)
\le 2d(\sigma^2 + \sigma_f^2) .
\end{aligned}
\]

The last inequality holds since E(r_i − r_j) and E(g_i − g_j) are both 0. Because there are exactly n − f correct nodes, we have

\[
\sum_{i\notin B} \mathbb{E}\Big\| g_i + r_i - \frac{1}{|N_c(i)|}\sum_{j\in N_c(i)} (g_j + r_j) \Big\|^2
\le 2d(n-f)(\sigma^2 + \sigma_f^2) .
\]

If i* = k is one of the Byzantine nodes, i.e., i* ∈ B,

\[
\begin{aligned}
\mathbb{E}\Big\| B_k - \frac{1}{|N_c(k)|}\sum_{j\in N_c(k)} (g_j + r_j) \Big\|^2
&\le \frac{1}{|N_c(k)|}\sum_{j\in N_c(k)} \mathbb{E}\| B_k - g_j - r_j \|^2 \\
&\le \frac{1}{|N_c(k)|}\sum_{j\in N_c(i)} \mathbb{E}\| g_i + r_i - g_j - r_j \|^2
 + \frac{1}{|N_c(k)|}\sum_{j\in N_f(i)} \mathbb{E}\| g_i + r_i - g_j - r_j \|^2,
\end{aligned}
\]

where i ∉ B is any correct node. By the definition of N(i), there is a correct node, labeled ζ(i), that is farther away from g_i than any node in N(i). Therefore,

\[
\begin{aligned}
\mathbb{E}\Big\| B_k - \frac{1}{|N_c(k)|}\sum_{j\in N_c(k)} (g_j + r_j) \Big\|^2
&\le \frac{1}{|N_c(k)|}\sum_{j\in N_c(i)} \mathbb{E}\| g_i + r_i - g_j - r_j \|^2
 + \frac{|N_f(i)|}{|N_c(k)|}\, \mathbb{E}\| g_i + r_i - g_{\zeta(i)} - r_{\zeta(i)} \|^2 \\
&\le \frac{|N_c(i)|}{|N_c(k)|}\, 2d(\sigma^2 + \sigma_f^2)
 + \frac{|N_f(i)|}{|N_c(k)|} \sum_{j\notin B,\, j\ne i} \mathbb{E}\| g_i - g_j + r_i - r_j \|^2 \\
&\le 2d(\sigma^2 + \sigma_f^2)\Big( \frac{|N_c(i)|}{|N_c(k)|} + \frac{|N_f(i)|}{|N_c(k)|}(n-f-1) \Big) \\
&\le 2d(\sigma^2 + \sigma_f^2)\Big( \frac{n-f-2}{n-2f-2} + \frac{f}{n-2f-2}(n-f-1) \Big) .
\end{aligned}
\]

Combining the two cases in which i* is a correct node or a Byzantine node, we obtain

\[
\begin{aligned}
\|\mathbb{E}h^{(t)} - g\|^2
&\le 2d(\sigma^2 + \sigma_f^2)\Big( n-f + \frac{n-f-2}{n-2f-2} \Big)
 + 2d(\sigma^2 + \sigma_f^2)\,\frac{f}{n-2f-2}\,(n-f-1) \\
&\le 2d(\sigma^2 + \sigma_f^2)\big( f + 3 + f(f-1) + f^2(f+2) \big)
\le 36\, d f^3 (\sigma^2 + \sigma_f^2) .
\end{aligned}
\]

Since ε > C√(2 ln(1.25/δ)) / C₂, we have

\[
\frac{\|g\|^2}{18d}\, f^{-\frac{3}{4}} - \sigma_f^2 > \frac{2C^2 \ln(1.25/\delta)}{\epsilon^2} .
\]

Therefore

\[
\|g\| > 3\sqrt{2}\, f^{\frac{3}{2}} \sqrt{d}\, \big( \sigma_f^2 + \sigma^2 \big)^{\frac{1}{2}} .
\]

Finally, we have

\[
\langle \mathbb{E}h^{(t)}, g \rangle
\ge \Big( \|g\| - 3\sqrt{2}\, f^{\frac{3}{2}} \sqrt{d}\, (\sigma_f^2 + \sigma^2)^{\frac{1}{2}} \Big) \|g\|
= k \|g\|^2 .
\]

Regret analysis is commonly used in online learning to investigate the loss difference caused by two learning methods. We therefore present a regret analysis of SPDL to show the loss difference incurred with and without Byzantine nodes.

In a decentralized system with Byzantine nodes, let A_t ∈ R^{d×1} be the parameter model of a node in round t, and X_t ∈ R^{d×n} be the raw data randomly sampled by the nodes in round t. The loss function in round t can be written as F(A_t, X_t). Considering the destructive effect on the training process caused by Byzantine nodes, we denote by Ã_t the parameter learned by the system per round in the absence of Byzantine nodes. It is meaningful to compare the gap between the losses computed with the two different parameters A_t and Ã_t. If there is no Byzantine node, we assume Ã_t is updated by any correct gradient g_{ξ(t)}^(t) given by a random node ξ(t) ∈ {1, 2, · · · , n}.

Definition 4. (Regret)

\[
R(A, \tilde{A}) = \sum_{t=0}^{T-1} F(\mathbb{E}A_t, X_t) - F(\mathbb{E}\tilde{A}_t, X_t) .
\]

Theorem 2. If the loss function satisfies L_1-Lipschitz continuity and L_1 < 1, then R(A, Ã) ≤ ρ√T + o(√T), where

\[
\rho = \frac{6 L_1 f^{\frac{3}{2}} \sqrt{d(\sigma^2 + \sigma_f^2)}}{1 - L_1} .
\]
Proof. By the property of L_1-Lipschitz continuity, we have

\[
\begin{aligned}
\sum_{t=0}^{T-1} F(\mathbb{E}A_t, X_t) - F(\mathbb{E}\tilde{A}_t, X_t)
&\le L_1 \sum_{t=0}^{T-1} \| \mathbb{E}A_t - \mathbb{E}\tilde{A}_t \| \\
&\le L_1 \sum_{t=1}^{T-1} \Big\| \mathbb{E}A_{t-1} - \mathbb{E}\tilde{A}_{t-1} + \eta\, \mathbb{E}\big( h^{(t-1)} - g_{\xi(t)}^{(t-1)} \big) \Big\| \\
&\le L_1 \sum_{t=1}^{T-1} \mathbb{E}\| A_{t-1} - \tilde{A}_{t-1} \| + \eta L_1 \| \mathbb{E}h^{(t-1)} - \mathbb{E}g_{\xi(t)}^{(t-1)} \| .
\end{aligned}
\]

Since

\[
\| \mathbb{E}h^{(t)} - g^{(t)} \| \le 6 f^{\frac{3}{2}} \sqrt{d(\sigma^2 + \sigma_f^2)},
\]

we have

\[
\eta \| \mathbb{E}h^{(t-1)} - \mathbb{E}g_{\xi(t)}^{(t-1)} \| \le 6 \eta f^{\frac{3}{2}} \sqrt{d(\sigma^2 + \sigma_f^2)} .
\]

Then we obtain

\[
\sum_{t=0}^{T-1} F(\mathbb{E}A_t, X_t) - F(\mathbb{E}\tilde{A}_t, X_t)
\le L_1 \sum_{t=1}^{T-1} \mathbb{E}\| A_{t-1} - \tilde{A}_{t-1} \| + 6 L_1 \eta f^{\frac{3}{2}} \sqrt{d(\sigma^2 + \sigma_f^2)}
\le 6 L_1 \eta f^{\frac{3}{2}} \sqrt{d(\sigma^2 + \sigma_f^2)} + \frac{6 T L_1 \eta f^{\frac{3}{2}} \sqrt{d(\sigma^2 + \sigma_f^2)}}{1 - L_1} .
\]

Thus the theorem can be immediately proved by setting η = 1/√T.

6 EVALUATION

6.1 Configuration

We implement SPDL with 3,500 lines of Python code and conduct the experiments on a DELL PowerEdge R740 server which has 2 CPUs (Intel Xeon 4214R) with 24 cores (2.40 GHz) and 128 GB of RAM. SPDL adopts the gRPC framework and a P2P network for the underlying communications, PyTorch for the machine learning libraries, and a blockchain system with the PBFT consensus algorithm. We make SPDL open source on GitHub¹. Nodes bootstrap by generating key pairs using ECDSA, initializing the genesis block, establishing the gRPC connections, exchanging the node list, and joining the P2P network. We evaluate SPDL on image classification over the MNIST dataset, which consists of 70,000 28 × 28 images of handwritten digits in 10 classes. The dataset is equally divided into N groups, with each group assigned to one node. Each node can add Gaussian noise to its local gradients with the setting ε = 0.02 (if not stated otherwise) and δ = 10^−6. We evaluate the performance of SPDL using the following standard metrics. 1) Test error: the fraction of wrong predictions among all predictions, using the test dataset. We measure the test error with respect to rounds, network size, batch size, privacy budget, and Byzantine ratio. 2) Latency: the latency of each round.

1. https://github.com/isSPDL/SPDL
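As an illustration of this data setup, the sketch below splits MNIST evenly across N nodes with torchvision. The download path and the equal random split are assumptions consistent with the description above, not the released SPDL code.

```python
import torch
from torchvision import datasets, transforms

def partition_mnist(num_nodes: int, data_dir: str = "./data"):
    """Download MNIST and split the training set into num_nodes equal shards,
    one per node, as described in the evaluation setup."""
    train = datasets.MNIST(data_dir, train=True, download=True,
                           transform=transforms.ToTensor())
    shard = len(train) // num_nodes
    lengths = [shard] * num_nodes
    lengths[-1] += len(train) - sum(lengths)        # absorb any remainder
    return torch.utils.data.random_split(train, lengths,
                                         generator=torch.Generator().manual_seed(0))

shards = partition_mnist(num_nodes=20)
print([len(s) for s in shards][:3])
```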

6.2 Evaluation Results

Convergence with Network Size: For simplicity, we denote by "PURE" the decentralized learning scheme that does not leverage any DP technique, BFT GAR, or blockchain system, and use "DP" to represent a decentralized learning scheme that adds only the DP technique on top of "PURE". We first compare SPDL with the PURE and DP schemes in a non-Byzantine environment. As shown in Fig. 3, the test error nearly converges after 20 rounds, but fluctuates a lot when N is as small as four. When N = 30, all schemes achieve almost the same convergence. The SPDL and DP schemes sometimes (e.g., N = 20 or N = 30 in our experiments) have lower test error than PURE because adding noise can prevent the training process from over-fitting. Besides, the network size does not impact the convergence rate, and a large network size contributes to stable convergence.

Fig. 3. Test error evolution with various network sizes N = 4, 10, 20, 30.

Latency: To better illustrate the latency of each round, we divide a round into three stages: local gradient computation plus adding noise, whose overall time overhead is denoted by T_LGC; gradient exchange (T_GE); and blockchain consensus (T_BC). As Fig. 4 shows, T_LGC, T_GE, and T_BC are in the same order of magnitude. T_GE grows with N simply because more nodes contend for computational resources. The MNIST classification task can be finished quickly (<0.1 s/round), so T_LGC is lower than T_BC in our experiments. When the machine learning task becomes more difficult (e.g., 10 min/round), T_BC ≈ 1 s is an acceptable overhead and could even be ignored.

Fig. 4. Latency of the different stages (T_LGC, T_GE, and T_BC) for N = 4, 10, 20, 30.

Convergence in the Presence of Byzantine Nodes: We then compare the three schemes PURE, DP, and SPDL with respect to the Byzantine Ratio (BR), which is the number of Byzantine nodes over N. We set N = 20 and BR ∈ {0%, 10%, 20%, 30%}, considering f = 33% × N, where BR = 0 represents the non-Byzantine case. As shown in Fig. 5, the results of the non-Byzantine experiments indicate that the three schemes achieve similar convergence. However, the DP and PURE schemes have high test error when BR > 0%, and fail to ensure model convergence even in the presence of 10% × N Byzantine nodes. It is clearly shown that SPDL can still guarantee the same convergence under different levels of Byzantine attacks.

Fig. 5. Test error evolution with various Byzantine ratios BR ∈ {0%, 10%, 20%, 30%}.

Batch Size: We then present the performance of the three schemes PURE, DP, and SPDL with two different batch sizes (abbreviated as "BS") in Fig. 6. When BS = 10, the test error in all deployments fluctuates a lot, with the PURE scheme outperforming the others because adding noise can perturb the model convergence. However, Fig. 6(b) indicates that we can increase the batch size to ensure stable convergence and make SPDL perform as well as the PURE scheme.

Fig. 6. Test error evolution with batch sizes BS = 10, 100.
Privacy Budget: We finally test our SPDL scheme by setting δ = 10^−6 with two values of ε ∈ {0.4, 0.04}. A smaller ε represents a stronger privacy guarantee. The results presented in Fig. 7 demonstrate that when N = 10, adding noise with ε = 0.4 or ε = 0.04 yields similar convergence. However, when N = 20, a smaller ε can cause a larger test error. This implies that the trade-off between accuracy and privacy preservation should be carefully adjusted according to the specific demands on privacy protection and model accuracy.

Fig. 7. Test error evolution with ε = 0.4, 0.04 and N = 10, 20.

7 CONCLUSION

SPDL is a new decentralized machine learning scheme that ensures efficiency while achieving strong security and privacy guarantees. In particular, SPDL utilizes BFT consensus and a BFT GAR to protect model updates from harsh Byzantine behaviors, leverages blockchain to enjoy the benefits of transparency and traceability, and adopts the DP technique for privacy protection. We provide rigorous theoretical analysis of the effectiveness of our scheme and conduct extensive studies on the performance of SPDL with variations of network size, batch size, privacy budget, and Byzantine ratio.

REFERENCES

[1] K. Bonawitz, V. Ivanov, B. Kreuter, A. Marcedone, H. B. McMahan, S. Patel, D. Ramage, A. Segal, and K. Seth, "Practical secure aggregation for privacy-preserving machine learning," in Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, 2017, pp. 1175–1191.
[2] E.-M. El-Mhamdi, R. Guerraoui, A. Guirguis, L. N. Hoang, and S. Rouault, "Genuinely distributed Byzantine machine learning," in Proceedings of the 39th Symposium on Principles of Distributed Computing, 2020, pp. 355–364.
[3] X. Lian, C. Zhang, H. Zhang, C.-J. Hsieh, W. Zhang, and J. Liu, "Can decentralized algorithms outperform centralized algorithms? A case study for decentralized parallel stochastic gradient descent," arXiv preprint arXiv:1705.09056, 2017.
[4] B. Liu, M. Ding, S. Shaham, W. Rahayu, F. Farokhi, and Z. Lin, "When machine learning meets privacy: A survey and outlook," ACM Computing Surveys (CSUR), vol. 54, no. 2, pp. 1–36, 2021.
[5] Q. Yang, Y. Liu, T. Chen, and Y. Tong, "Federated machine learning: Concept and applications," ACM Transactions on Intelligent Systems and Technology (TIST), vol. 10, no. 2, pp. 1–19, 2019.
[6] L. Su and N. H. Vaidya, "Fault-tolerant multi-agent optimization: Optimal iterative distributed algorithms," in Proceedings of the 2016 ACM Symposium on Principles of Distributed Computing, 2016, pp. 425–434.
[7] D. Yin, Y. Chen, R. Kannan, and P. Bartlett, "Byzantine-robust distributed learning: Towards optimal statistical rates," in International Conference on Machine Learning. PMLR, 2018, pp. 5650–5659.
[8] R. Guerraoui, S. Rouault et al., "The hidden vulnerability of distributed learning in Byzantium," in International Conference on Machine Learning. PMLR, 2018, pp. 3521–3530.
[9] R. Guerraoui, N. Gupta, R. Pinot, S. Rouault, and J. Stephan, "Differential privacy and Byzantine resilience in SGD: Do they add up?" arXiv preprint arXiv:2102.08166, 2021.
[10] P. Bhattacharya, S. Tanwar, U. Bodke, S. Tyagi, and N. Kumar, "BinDaaS: Blockchain-based deep-learning as-a-service in healthcare 4.0 applications," IEEE Transactions on Network Science and Engineering, 2019.
[11] Q. Hu, Z. Wang, M. Xu, and X. Cheng, "Blockchain and federated edge learning for privacy-preserving mobile crowdsensing," IEEE Internet of Things Journal, pp. 1–1, 2021.
[12] Y. Lu, X. Huang, Y. Dai, S. Maharjan, and Y. Zhang, "Blockchain and federated learning for privacy-preserved data sharing in industrial IoT," IEEE Transactions on Industrial Informatics, vol. 16, no. 6, pp. 4177–4186, 2019.
[13] Y. Qu, L. Gao, T. H. Luan, Y. Xiang, S. Yu, B. Li, and G. Zheng, "Decentralized privacy using blockchain-enabled federated learning in fog computing," IEEE Internet of Things Journal, vol. 7, no. 6, pp. 5171–5183, 2020.
[14] X. Chen, J. Ji, C. Luo, W. Liao, and P. Li, "When machine learning meets blockchain: A decentralized, privacy-preserving and secure design," in 2018 IEEE International Conference on Big Data (Big Data). IEEE, 2018, pp. 1178–1187.
[15] S. Warnat-Herresthal, H. Schultze, K. L. Shastry, S. Manamohan, S. Mukherjee, V. Garg, R. Sarveswara, K. Händler, P. Pickkers, N. A. Aziz et al., "Swarm learning for decentralized and confidential clinical machine learning," Nature, vol. 594, no. 7862, pp. 265–270, 2021.
[16] M. Shayan, C. Fung, C. J. Yoon, and I. Beschastnikh, "Biscotti: A blockchain system for private and secure federated learning," IEEE Transactions on Parallel and Distributed Systems, vol. 32, no. 7, pp. 1513–1525, 2020.
[17] P. Blanchard, E. M. El Mhamdi, R. Guerraoui, and J. Stainer, "Machine learning with adversaries: Byzantine tolerant gradient descent," in Proceedings of the 31st International Conference on Neural Information Processing Systems, 2017, pp. 118–128.
[18] Z. Wang, M. Song, Z. Zhang, Y. Song, Q. Wang, and H. Qi, "Beyond inferring class representatives: User-level privacy leakage from federated learning," in IEEE INFOCOM 2019 - IEEE Conference on Computer Communications. IEEE, 2019, pp. 2512–2520.
[19] H. Jiang, J. Pei, D. Yu, J. Yu, B. Gong, and X. Cheng, "Applications of differential privacy in social network analysis: A survey," IEEE Transactions on Knowledge & Data Engineering, no. 01, pp. 1–1.
[20] D. Yu, Z. Zou, S. Chen, Y. Tao, B. Tian, W. Lv, and X. Cheng, "Decentralized parallel SGD with privacy preservation in vehicular networks," IEEE Transactions on Vehicular Technology, 2021.

Minghui Xu received his PhD degree in Computer Science from The George Washington University in 2021, and received the BS degree in Physics from Beijing Normal University in 2018. He is currently an Assistant Professor in the School of Computer Science and Technology, Shandong University, China. His current research focuses on blockchain, distributed computing, and applied cryptography.

Zongrui Zou is currently working toward the undergraduate degree with the School of Computer Science and Technology, Shandong University, Qingdao, China. His research interests mainly include theoretical aspects of private data analysis and machine learning.

Ye Cheng received his bachelor's degree in mechanical engineering from Wuhan University of Technology in 2018. He is working toward a master's degree in Computer Science and Technology at Shandong University in China. His current research direction is blockchain and privacy protection.

Qin Hu received her Ph.D. degree in Computer Science from the George Washington University in 2019. She is currently an Assistant Professor with the Department of Computer and Information Science, Indiana University-Purdue University Indianapolis (IUPUI). Her research interests include wireless and mobile security, edge computing, blockchain, and crowdsourcing/crowdsensing.

Dongxiao Yu received his BS degree in Mathematics in 2006 from Shandong University, and PhD degree in Computer Science in 2014 from The University of Hong Kong. He became an associate professor in the School of Computer Science and Technology, Huazhong University of Science and Technology, in 2016. Currently he is a professor at the School of Computer Science and Technology, Shandong University. His research interests include wireless networking, distributed computing, and graph algorithms.

Xiuzhen Cheng received her MS and PhD degrees in computer science from University of Minnesota, Twin Cities, in 2000 and 2002, respectively. She was a faculty member at the Department of Computer Science, The George Washington University, from 2002-2020. Currently she is a professor of computer science at Shandong University, Qingdao, China. Her research focuses on blockchain computing, security and privacy, and Internet of Things. She is a Fellow of IEEE.
