
Improved Raft Algorithm exploiting Federated Learning

for Private Blockchain performance enhancement

Donghee Kim, Inshil Doh, Kijoon Chae



Dept. of Computer Science and Engineering, Ewha Womans University, Seoul, Korea (ddonghe.kim@ewhain.net)
Dept. of Cyber Security, Ewha Womans University, Seoul, Korea (isdoh1@ewha.ac.kr)
Dept. of Computer Science and Engineering, Ewha Womans University, Seoul, Korea (kjchae@ewha.ac.kr)

Abstract—According to a recent article published by Forbes, the use of enterprise blockchain applications by companies is expanding. Private blockchains, such as enterprise blockchains, usually use the Raft algorithm to achieve consensus. However, the Raft algorithm can cause network splits in unstable networks. When a network applying Raft splits, the TPS (Transactions Per Second) decreases, which results in decreased performance for the entire blockchain system. To reduce the probability of network splits, we select a more stable node as the next leader. To select a better leader, we propose three criteria and suggest exploiting federated learning to evaluate them for network stability. As a result, we show that blockchain consensus performance is improved by lowering the probability of network splits.

Keywords—blockchain, consensus algorithm, Raft, leader election, federated learning

I. INTRODUCTION

Forbes recently announced that there is an increasing number of companies using blockchain technology in several fields, including software, energy, financial services, and health care [1]. Forbes predicted that enterprise blockchains, the private blockchains used by companies, will continue to evolve and lead innovation in business processes [2].

In blockchain technology, all nodes retain the same chain connecting blocks containing data in a peer-to-peer distributed network system. Blockchain encompasses ledger and consensus technology. Consensus algorithms maintain the consistency of a ledger shared by all nodes, and an appropriate algorithm is used depending on the blockchain environment. There are two types of blockchain environments: public blockchain, where anyone can participate as a node in the network, and private blockchain, where only limited users can participate as nodes. The conventional consensus algorithms used in private blockchains are PBFT (Practical Byzantine Fault Tolerance) and Raft. Because PBFT nodes send and receive commit messages among themselves at each stage of the consensus, PBFT has complex communication messages [3]. This causes a scalability problem for PBFT. In contrast, the Raft algorithm is a simple, leader-based consensus algorithm wherein only the leader can handle all client requests, replicate logs, and transmit them to the nodes. In the Raft algorithm, a newly created block is directly connected to the chain without a separate consensus process. This enables Raft to achieve consensus at a lower communication cost than PBFT. Thus, many private blockchains such as Hyperledger Fabric and Aergo use Raft as their consensus algorithm.

Unfortunately, if the leader has network instability issues such as packet loss, using the Raft algorithm increases the probability of a network split, in which more than half of the nodes are out of the control of the current leader. When a network split occurs in Raft, a leader election period is started, during which the network stops processing requests from clients. This causes increased transaction latency and reduced TPS throughput. Transaction latency and TPS are useful consensus performance evaluation factors affecting the overall consensus time, so the block confirmation time is also delayed. Thus, Raft has a vulnerability in that consensus performance is affected by the stability of the nodes in the network [4].

In this work, we propose applying federated learning to the Raft algorithm to learn the factors that affect network stability and to select a new leader so as to minimize network splits. The proposed algorithm is expected to improve the performance of the Raft algorithm by shortening block decision time and electing a node that results in better performance based on the current state of the network environment.

The rest of this paper is organized as follows. Section II introduces the Raft algorithm and related studies. Section III examines the vulnerability of the Raft algorithm in blockchain systems, and Section IV proposes a mechanism to improve the performance of blockchain consensus. We present its experimental evaluation in Section V, and we discuss future directions and conclude the paper in Section VI.

II. RELATED WORK

A. Blockchain

Blockchain technology stores necessary data in the form of blocks and manages them as a connected chain based on encryption technology in a P2P (Peer-to-Peer) network. Since all the nodes share information, reliability is guaranteed with the features of integrity and transparency [4]. We review ledgers and the consensus algorithm mechanism by which a blockchain operates, as well as performance indicators.

1) Ledger: The ledger shares data between the nodes participating in a blockchain. The blockchain has a distributed ledger in which all participating nodes hold their own ledger data and are always synchronized [5]. Since a distributed ledger makes a central system unnecessary, the blockchain operates on a distributed network system.
2) Consensus algorithm: Consensus refers to maintaining the system even if there is a failed node in a distributed network environment. In a blockchain, the consensus algorithm maintains the consistency of the ledger shared by all nodes. PoW (Proof of Work), PoS (Proof of Stake), and DPoS (Delegated Proof of Stake) were used as the initial blockchain consensus algorithms. These algorithms are limited in that they require a prohibitively large amount of computation. The PBFT and Raft algorithms were proposed as a way to address this problem [6]. We explain the Raft algorithm in the next subsection.

3) The factors assessing the performance of the blockchain consensus algorithm: The indicators of throughput and latency are generally considered when assessing the performance of a blockchain consensus algorithm [7][8].

a) Transaction latency: Transaction latency refers to the time between the moment a transaction is requested and the moment it is finally committed. Transaction latency must be short for a blockchain to achieve high performance.

b) Transaction throughput: Transaction throughput represents the rate at which valid transactions are processed within a set period of time and is referred to as TPS. The more transactions that can be processed per second, the better, so the higher the TPS, the higher the performance.
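These two indicators can be written compactly. The notation below is not from the paper; it is only a hedged formalization of the definitions above, where t_request and t_commit are the request and commit times of a transaction and N_committed is the number of transactions committed during a measurement window of length \Delta t:

    \mathrm{latency} = t_{\mathrm{commit}} - t_{\mathrm{request}}, \qquad
    \mathrm{TPS} = \frac{N_{\mathrm{committed}}}{\Delta t}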
B. Raft algorithm

The types of fault in a network include a Byzantine fault caused by malicious nodes and a crash fault caused by problems in hardware. The Raft algorithm tolerates crash faults and maintains the data of the system as long as no more than half of the nodes participating in the network fail, even if a hardware crash occurs. The Raft algorithm is a leader-based consensus algorithm that accepts requests from clients and processes the requests of all nodes through the leader. However, because Byzantine faults are not considered, it is applied as the consensus algorithm in private blockchains that assume all nodes are reliable [9].

Raft incorporates three mechanisms, log replication, leader election, and safety, to ensure that all the nodes maintain logs consistently. We review log replication, leader election, and client interaction as they relate to blockchain performance.

1) Log replication: Raft divides nodes into three roles: leader, candidate, and follower. Raft is based on a replicated state machine and ensures that each node is in one of these three states. The leader receives a client request, records it in its log, and forwards it to the other nodes. Only the leader can direct the log and execute the commands contained in the log. The leader periodically sends RPC (Remote Procedure Call) messages with logs to the follower nodes. These RPC messages are called AppendEntries and contain the leader's term, the leader's id, and the index of the leader's log, as Table I shows. The term refers to the time order in Raft. When a follower receives an AppendEntries message, the leader's request is accepted, and the current term of the follower is sent to the leader. The follower then updates its state machine with the commands contained in the log. As a result, the follower keeps its log sequence and state the same as the leader's. The data structure of the log is a list, and each log index stores both the client request and the term. Sequential logs are maintained through the index and the term.

TABLE I.  APPENDENTRIES RPC MESSAGE

  Arguments:
    term          Leader's term
    leaderId      Leader's id
    prevLogIndex  Index of the entry preceding the new entry
    prevLogTerm   Term of the entry preceding the new entry
    entries[]     Set of commands requested by the client
    leaderCommit  Index of the leader's last committed log entry

  Results:
    term          Receiver's term
    success       Whether the receiver replicated the log entry completely

TABLE II.  REQUESTVOTE RPC MESSAGE

  Arguments:
    term          Candidate's term
    candidateId   Candidate's id
    lastLogIndex  Index of the candidate's last log entry
    lastLogTerm   Term of the candidate's last log entry

  Results:
    term          Receiver's term
    voteGranted   Whether the receiver approved the candidate
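To make the message layouts of Tables I and II concrete, the sketch below models them as plain Python dataclasses. This is only an illustrative rendering of the fields listed in the tables, not code from the authors' implementation; the class and field names simply mirror the table entries.

    from dataclasses import dataclass, field
    from typing import Any, List

    @dataclass
    class AppendEntries:
        """Arguments of the AppendEntries RPC (Table I)."""
        term: int            # leader's term
        leader_id: int       # leader's id
        prev_log_index: int  # index of the entry preceding the new entries
        prev_log_term: int   # term of the entry preceding the new entries
        entries: List[Any] = field(default_factory=list)  # commands requested by clients
        leader_commit: int = 0                             # leader's last committed index

    @dataclass
    class AppendEntriesResult:
        term: int       # receiver's term
        success: bool   # whether the receiver replicated the entries completely

    @dataclass
    class RequestVote:
        """Arguments of the RequestVote RPC (Table II)."""
        term: int            # candidate's term
        candidate_id: int    # candidate's id
        last_log_index: int  # index of the candidate's last log entry
        last_log_term: int   # term of the candidate's last log entry

    @dataclass
    class RequestVoteResult:
        term: int           # receiver's term
        vote_granted: bool  # whether the receiver approved the candidate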
2) Leader election: As shown in Fig. 1, a node changes state in response to RPC messages and time-outs. There are two types of time-outs: the heartbeat time-out, which the leader has, and the leader election time-out, which the followers have. The heartbeat time-out determines how often the leader sends an AppendEntries message. Each time the leader's heartbeat timer expires, the leader sends AppendEntries messages to notify the other nodes of its existence and to send logs. If a follower does not receive the leader's AppendEntries message before its leader election time-out expires, the follower decides there is a problem with the leader's network and transforms into a candidate. The Raft network then initiates a leader election period. The process of electing a leader is as follows.

a) When the follower becomes a candidate, the candidate increases its term by one and broadcasts a RequestVote RPC message to all nodes in the network asking for a vote.

b) The RequestVote RPC includes the candidate's current term, candidate id, and the index and term of the candidate's last log entry, as Table II shows. Upon receiving a RequestVote message, the followers verify whether the candidate meets the criteria for becoming a leader:

• Candidate has a term higher than the receiver's.

• VotedFor, the variable storing the node voted for, is null or holds the candidate's id.

• Candidate's log is at least as up-to-date as the receiver's log.

c) If the criteria for becoming a leader are satisfied, the receiver sends an approval message with the current term.

d) If more than half of the followers send an approval message, the candidate is elected as the leader. Otherwise, the candidate returns to being a follower.

At this time, the leader election timer of each node is set at random to reduce the probability that multiple follower nodes will convert to candidates at the same time.
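A minimal sketch of the vote-granting rules described in steps a) through d) is given below, using the RequestVote fields from Table II and the RequestVoteResult dataclass from the earlier sketch. The helper state (current_term, voted_for, and a log of (term, command) pairs) is hypothetical and not taken from the paper's implementation.

    import random

    class FollowerState:
        def __init__(self):
            self.current_term = 0
            self.voted_for = None    # candidate id voted for in the current term, or None
            self.log = []            # list of (term, command) pairs
            # randomized leader election time-out; 150-300 ms is the stable range noted in [4]
            self.election_timeout_ms = random.uniform(150, 300)

        def last_log_index(self):
            return len(self.log) - 1

        def last_log_term(self):
            return self.log[-1][0] if self.log else 0

        def handle_request_vote(self, req):
            """Decide whether to grant a vote for the candidate (criteria of Section II.B.2)."""
            term_ok = req.term > self.current_term
            not_voted = self.voted_for is None or self.voted_for == req.candidate_id
            # candidate's log must be at least as up-to-date as the receiver's log
            log_ok = (req.last_log_term, req.last_log_index) >= (self.last_log_term(),
                                                                 self.last_log_index())
            granted = term_ok and not_voted and log_ok
            if granted:
                self.current_term = req.term
                self.voted_for = req.candidate_id
            return RequestVoteResult(term=self.current_term, vote_granted=granted)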
Fig. 1. State transition in the Raft algorithm.

3) Client interaction: Only the leader can handle client requests in Raft. Therefore, the client must ask the leader to process its command. The leader logs the client request operation and replicates the log to the rest of the servers. The leader also sends the client a response indicating that the request was successful once the command in the log has been executed and completely committed. When the Raft algorithm starts, the client has no information about the leader of the current term, so it asks a randomly selected node within the network to handle the request. If the selected node is not the leader, it rejects the client request and responds with information about the leader of the current term. However, if there is no leader, such as during a leader election period, the request is again made to a random node, as at the start of the consensus algorithm. This process is repeated until the request is successful.
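The client-side behavior described above can be sketched as a simple retry loop. The node interface (a hypothetical handle_request method returning either a commit confirmation or a redirect to the current leader) is assumed for illustration only and is not part of the paper.

    import random

    def submit_command(nodes, command, max_retries=20):
        """Send a client command to the Raft cluster, following leader redirects.

        `nodes` maps a node id to a client stub exposing a hypothetical
        handle_request(command) -> (status, leader_id) method.
        """
        target = random.choice(list(nodes))        # no leader info at start: pick at random
        for _ in range(max_retries):
            status, leader_id = nodes[target].handle_request(command)
            if status == "committed":              # leader replicated and committed the entry
                return True
            if status == "redirect" and leader_id is not None:
                target = leader_id                 # non-leader node pointed us to the leader
            else:                                  # no leader known (election in progress)
                target = random.choice(list(nodes))
        return False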
C. Federated learning

Federated learning is a type of distributed machine learning that trains DNNs (Deep Neural Networks) using distributed training data [10]. Federated learning distributes the same learning model to multiple servers and gathers the results on one server to create a more sophisticated model and increase prediction accuracy. The learning process is shown in Fig. 2 [11].

Fig. 2. Architecture for a federated learning system.

First, all servers share the same initial learning model. Then, each server trains the model on its own data. The cloud updates the model with the average value of the parameters of the models that each server has learned, and the process is then repeated, i.e., the servers that receive the parameters of the updated model continue to train it separately on their local data.
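The aggregation step described above, averaging the parameters learned by each server, corresponds to federated averaging. The sketch below is a minimal, framework-agnostic illustration using plain NumPy; the paper's implementation uses PyTorch and TensorFlow Federated (Section V), so this is only a schematic of the idea, not the authors' code.

    import numpy as np

    def local_update(weights, local_data):
        """Placeholder for one client's local training step.

        In a real system this would run gradient steps on the client's own data;
        here we only indicate the interface: it returns updated parameter arrays.
        """
        return [w.copy() for w in weights]

    def federated_round(global_weights, clients_data):
        """One round of federated averaging across all participating nodes."""
        client_weights = [local_update(global_weights, data) for data in clients_data]
        # the server (or leader) averages the parameters reported by each client
        new_global = [np.mean([cw[i] for cw in client_weights], axis=0)
                      for i in range(len(global_weights))]
        return new_global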
III. PROBLEM ANALYSIS

In the Raft algorithm, packet loss or communication delay causes network node instability. A network split means that the leader and more than half of the nodes in the network are disconnected. In other words, a network split essentially corresponds to the leader election period in Raft. Thus, an unstable network of nodes generates leader election periods frequently in the Raft algorithm.

When the blockchain reaches agreement through Raft and a leader election period is triggered by a network split, the client cannot request a node to issue a command related to the creation of a block containing transactions, as explained in Section II. This delays the block confirmation time, resulting in longer latency and lower TPS. According to the blockchain consensus performance evaluation method described in Section II, frequent network splits degrade blockchain consensus performance.

To address this problem, we propose a mechanism that reduces the probability of network splits to prevent blockchain consensus performance from being degraded. Network splits occur more frequently as the network connections among the leader and the other nodes become unstable; thus, the probability of a network split can be reduced by electing a node with a stable network as the next leader. In the next section, we describe how to select a node with a stable network as the next leader.

IV. PROPOSED LEADER ELECTION MECHANISM FOR RAFT

The node connections in the Raft algorithm have different network stabilities. Therefore, each node holds different monitoring data for its connections with different nodes. All nodes conduct federated learning, training local models on their local data, and the leader synthesizes them into a global model. Nodes can delegate agreement to other nodes because the local model is reflected in the global model. Moreover, federated learning can prevent centralized network limitations such as a SPOF (Single Point of Failure) from occurring and has the advantage of reducing the storage burden of the server. Therefore, federated learning is more suitable for the blockchain network environment than a learning model on a central server.

We propose a mechanism exploiting federated learning to elect stable leaders. Except for this leader election mechanism, all the other mechanisms of operation are identical to those in the traditional Raft algorithm.

A. Network Stability Evaluation Method

We use the following three parameters for evaluating the network stability of nodes in Raft.

1) Number of nodes converted to candidates: If the network connection among nodes is unstable, the connection to the leader is lost and a follower is more likely to become a candidate. Thus, the number of candidates can be used to determine network stability.

2) Latest index of the logs: A node with unstable communication with the leader cannot keep its logs up to date. Thus, network stability can be determined by comparing the last index numbers of the logs.

3) Leader election time-out value: When Raft starts, the leader election time-out value of each node is initialized randomly. The time-out value affects the probability of a network split. The smaller the time-out value, the higher the probability of a network split [4], so the time-out value should also be included as a network performance evaluation parameter.

When a follower changes to a candidate, a RequestVote message is broadcast to all nodes in the network. The RequestVote RPC message already includes the term and the last log index number of the node. The leader election time-out value of the node is added to the message, enabling these parameters to be measured through the RequestVote message sent by the candidate node. The network cannot tell which of the two nodes on a link is unstable. However, we assume that the node measuring the data is not the problematic one, because the data will be aggregated through federated learning.

The parameter data are stored in a list together with the term, in the same way the logs are stored. Thus, the parameter measurements for each node are stored in two dimensions. The final data of each node are three-dimensional because each node monitors all the nodes participating in the network. Fig. 3 shows the data structure of the monitored data. All the nodes use these data to learn a federated learning model for the next leader election. In federated learning, the local model learned by each node uses the number of candidates in each term and its relationship with the parameters to calculate the influence of the parameters on network stability. Based on this value, the learning model predicts by binary classification whether the network of each node in the network is stable, and the predictions are reflected in the election of the next leader. In the proposed mechanism, federated learning ensures that the leader synthesizes the local models and updates the global model at regular intervals.

Fig. 3. Data structure of monitored data.
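As an illustration of the three-dimensional monitored data described above and sketched in Fig. 3, the structure below stores, for every term and every observed peer, the three proposed parameters. The field names are hypothetical; the paper does not prescribe a concrete schema.

    from collections import defaultdict

    class StabilityMonitor:
        """Per-node store of the three stability parameters, indexed by term and peer id."""

        def __init__(self):
            # data[term][peer_id] -> dict of the three parameters for that peer in that term
            self.data = defaultdict(dict)

        def record_request_vote(self, term, peer_id, last_log_index, election_timeout_ms):
            """Record the values carried by a candidate's RequestVote message."""
            entry = self.data[term].setdefault(peer_id, {
                "candidate_conversions": 0,   # how often this peer became a candidate
                "last_log_index": 0,          # freshness of the peer's log
                "election_timeout_ms": None,  # peer's randomized leader election time-out
            })
            entry["candidate_conversions"] += 1
            entry["last_log_index"] = last_log_index
            entry["election_timeout_ms"] = election_timeout_ms

        def training_rows(self):
            """Flatten the 3-D structure into (term, peer, features) rows for the local model."""
            for term, peers in self.data.items():
                for peer_id, feats in peers.items():
                    yield term, peer_id, feats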
B. Proposed Leader Election Mechanism

Through federated learning, all nodes participating in the network hold network stability estimates with respect to the other nodes. When a candidate emerges, followers only vote for it if the learning model predicts that the candidate's network is stable, ensuring that the candidate is elected as the leader only if its network is stable. As a result, the probability of a network split is decreased. The overall workflow of electing a leader by network stability is shown in Fig. 4 and explained below.

1) A client request is processed by replicating and committing logs around the leader prior to the leader election period. Simultaneously, all the nodes conduct federated learning by accumulating data on the network status.

2) If the leader is disconnected from a follower, the follower's status is converted to 'candidate'.

3) After voting for itself, the candidate increments its term by one and sends the RequestVote RPC to all nodes, including its term, last log index number, and its set leader election time-out value.

4) After receiving the RequestVote RPC, the follower records the monitored data and determines whether the candidate meets the criteria for the new leader as follows:

• Candidate has a term higher than the receiver's.

• VotedFor is the candidate's id or null.

• Candidate's log is at least as up-to-date as the receiver's log.

• The federated learning result for the candidate indicates a stable network status.

5) If all conditions are met, the follower sends a response voting for the candidate to be the leader.

6) If a candidate receives a majority of the votes, it is converted to 'leader'. Otherwise, the candidate becomes a follower again.

Fig. 4. Workflow of proposed leader election mechanism.
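The sketch below extends the standard vote check of Section II with the fourth criterion of step 4: the receiver consults its stability model, trained via federated learning, before granting a vote. It reuses the FollowerState and StabilityMonitor sketches from earlier sections, and the predict_stable wrapper stands in for the binary classifier described in Section IV.A; both are hypothetical interfaces, not the authors' implementation.

    def predict_stable(model, monitor, term, candidate_id):
        """Hypothetical wrapper: ask the trained binary classifier whether the
        candidate's network is stable, based on the monitored parameters."""
        features = monitor.data.get(term, {}).get(candidate_id)
        return features is not None and model.predict(features) == 1  # 1 = stable

    def grant_vote(follower, model, monitor, req):
        """Vote check of the proposed mechanism (step 4 of the workflow)."""
        term_ok = req.term > follower.current_term
        not_voted = follower.voted_for in (None, req.candidate_id)
        log_ok = (req.last_log_term, req.last_log_index) >= (follower.last_log_term(),
                                                             follower.last_log_index())
        stable_ok = predict_stable(model, monitor, req.term, req.candidate_id)
        return term_ok and not_voted and log_ok and stable_ok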

V. PERFORMANCE EVALUATION

In this section, we evaluate the performance of the proposed Raft and compare it with the original Raft. Tables III and IV show the environment for the experiment. The implementation was simulated on a Linux system using a Raft implementation in Python published on GitHub. Federated learning was implemented using PyTorch and TensorFlow Federated. In Raft, the leader election time-out value is stable within the range of 150-300 ms [4]. Therefore, the range of the leader election time-out value is set to 100-1200 ms to create an environment in which more network splits occur. The number of nodes used in the experiment is 9, and it is assumed that no membership change occurs, i.e., no new node joins the consensus algorithm during the simulation.

TABLE III.  HARDWARE SPECIFICATIONS

  Hardware  Specification
  CPU       Intel i7-7700
  RAM       8GB
  GPU       Nvidia GTX 1050 2GB
  OS        Linux Ubuntu 20.04.1 LTS

TABLE IV.  EXPERIMENTAL ENVIRONMENT

  Variables          Value
  Number of nodes    9
  Range of time-out  100 ms - 1200 ms
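The following is a minimal sketch of how the widened election time-out range of Table IV could be applied to each node in such a simulation; the configuration keys are illustrative and are not taken from the GitHub implementation the authors used.

    import random

    EXPERIMENT = {
        "num_nodes": 9,                     # Table IV: number of nodes
        "election_timeout_ms": (100, 1200)  # Table IV: widened range to provoke more splits
    }

    def draw_election_timeout(cfg=EXPERIMENT):
        low, high = cfg["election_timeout_ms"]
        return random.uniform(low, high)    # each node draws its time-out at random

    timeouts = [draw_election_timeout() for _ in range(EXPERIMENT["num_nodes"])]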
[2] "A 2020 And Post-Pandemic Outlook For Cryptocurrency And
Fig. 5 shows a comparison of the proposed mechanism and the Blockchain Industries", Forbes, last modified May 21, 2020, accessed
original Raft algorithm for cumulative operation period time Dec 12, 2020,
every 5 terms. The term consists of a leader election period for https://www.forbes.com/sites/forbesbusinessdevelopmentcouncil/2020/0
5/21/a-2020-and-post-pandemic-outlook-for-cryptocurrency-and-
electing a leader and an operation period for processing client blockchain-industries/#3c14973e63ab
requests. The longer the operation period time, the higher the
[3] D. Mingxiao, M. Xiaofeng, Z. Zhe, W. Xiangwei and C. Qijun, "A review
TPS in the blockchain because it can process more transactions on consensus algorithm of blockchain," 2017 IEEE International
for client requests. As can be shown in Fig. 5, the cumulative Conference on Systems, Man, and Cybernetics (SMC), Banff, AB, 2017,
operation period time of the proposed algorithm increases as the pp. 2567-2572, doi: 10.1109/SMC.2017.8123011.
term increases and performance improvement is particularly [4] D. Huang, X. Ma and S. Zhang, "Performance Analysis of Raft Consensus
noticeable from term 20 onwards. This is because federated Algorithm for Private Blockchains," in IEEE Transactions on Systems,
learning completes learning and Raft elects a stable node as the Man, and Cybernetics: Systems, vol. 50, no. 1, pp. 172-181, Jan. 2020,
doi: 10.1109/TSMC.2019.2895471.
leader. Therefore, the proposed mechanism improves the
[5] Deshpande, Advait, et al. "Distributed Ledger Technologies/Blockchain:
blockchain performance. Challenges, opportunities and the prospects for standards." Overview
report The British Standards Institution (BSI) 40 (2017): 40.
[6] S. Pahlajani, A. Kshirsagar and V. Pachghare, "Survey on Private
Blockchain Consensus Algorithms," 2019 1st International Conference
on Innovations in Information and Communication Technology (ICIICT),
CHENNAI, India, 2019, pp. 1-6, doi: 10.1109/ICIICT1.2019.8741353.
[7] C. Fan, S. Ghaemi, H. Khazaei and P. Musilek, "Performance Evaluation
of Blockchain Systems: A Systematic Survey," in IEEE Access, vol. 8,
pp. 126927-126950, 2020, doi: 10.1109/ACCESS.2020.3006078.
[8] H. Sukhwani, N. Wang, K. S. Trivedi and A. Rindos, "Performance
Modeling of Hyperledger Fabric (Permissioned Blockchain Network),"
2018 IEEE 17th International Symposium on Network Computing and
Applications (NCA), Cambridge, MA, 2018, pp. 1-8, doi:
10.1109/NCA.2018.8548070.
[9] Ongaro, Diego, and John Ousterhout. "In search of an understandable
consensus algorithm." 2014 {USENIX} Annual Technical Conference
({USENIX}{ATC} 14). 2014.
[10] Konený, Jakub, et al. "Federated learning: Strategies for improving
communication efficiency." arXiv preprint arXiv:1610.05492 (2016).
[11] Yang, Qiang, et al. "Federated machine learning: Concept and
applications." ACM Transactions on Intelligent Systems and Technology
Fig. 5. Cumulative operation period time in Raft (TIST) 10.2 (2019): 1-19
