Blockchain Benchmark Paper
Abstract

The plethora of blockchain proposals raises the question of selecting the ideal blockchain for a given decentralised application (DApp). Most blockchain performance evaluations are obtained through tailored environments under unknown experimental settings documented through whitepapers or blog posts. Unfortunately, the few blockchain benchmark efforts typically generate workloads either from a central location or purely based on synthetic request patterns.

We propose a benchmark framework, DIABLO, to reason about the performance of blockchains under realistic workloads. DIABLO is extensible, allowing DApp programmers to define new workloads, and blockchain designers to integrate and test their own blockchain. DIABLO currently supports 6 blockchain implementations and 4 built-in workload DApps. Our results indicate that blockchains with lower fault tolerance can achieve higher performance, confirming that speculative execution of transactions might drastically impact performance. These results also show that one implementation can offer significantly better performance than another implementation of the same blockchain protocol.

∗ The authors are listed in alphabetical order of their last names. Christopher Natoli is the primary author of this publication.

1 Introduction

With the growing adoption of blockchain technology, the number of readily available solutions has multiplied dramatically. As of March 2021, approximately five thousand distinct cryptocurrencies have been reported on a single website [3]. Each of these implementations aims at offering improvements through distinctive features, focused on the performance and application to various use cases. Although a number of these variants could, in theory, be running on multiple instances of the same blockchain, they are often packaged as their own standalone implementation with distinctive features. A recent survey [12] highlights the breadth of the blockchain landscape through classifications of blockchains, listing 8 different protocols to select nodes that are tasked with proposing blocks, 13 different consensus protocols and 9 data structures to store transaction information. This diversity illustrates a probably small subset of all possible blockchain implementations that exist today.

This plethora of blockchain proposals raises the question of which proposal is the ideal blockchain for a particular application. Unfortunately, most of these proposals are not reported in scientific publications. They are at best presented in the form of white papers that present a 10,000-foot view of their implementation details. As an example, the Ethereum yellow paper [16] presents the technicalities of the Ethereum Virtual Machine but does not explain how Ethereum participants can reach consensus on a unique block at a given index of the
DApp            Trace       Feature
Microblogging   Twitter     load burst
Exchange        Nasdaq      herd behavior
Supply chain    Aviation    low demand
Constant        Synthetic   tunable

Table 1: The decentralized applications (DApps) of DIABLO and their features

Blockchain           Consensus    Fault tolerance
Go Ethereum          PoA Clique   Byzantine
Open Ethereum        PoA Aura     Byzantine
CollaChain           DBFT         Byzantine
Quorum               IBFT         Byzantine
Quorum               Raft         Crash
Hyperledger Fabric   Raft         Crash

Table 2: The blockchains evaluated with DIABLO
chain. In order to analyse the underlying protocols of such blockchains, researchers typically had to look at the available source code before being able to reason about the correctness of the protocols [6]. Another approach is for researchers to evaluate blockchains as black boxes by generating workloads and measuring their performance. In comparison with the variety of blockchain proposals, very few evaluations have been made on blockchains, and most of these evaluations [5, 2, 11, 9, 13, 4] evaluate few blockchains on synthetic workloads that are not representative of realistic use cases.

In this paper, we propose the DIstributed Analytical BLOckchain benchmark framework, called DIABLO, to reason about the performance of blockchains under realistic workloads. DIABLO is extensible in that it allows users to define their own Decentralised Application (DApp) and offers built-in workloads as depicted in Table 1, including (i) Microblogging, featuring bursts of simple requests, (ii) Exchange, featuring herd behaviors in financial markets, (iii) Supply chain, featuring a Poisson distribution of requests, and (iv) Constant, featuring a tunable fixed rate of requests. DIABLO comes with the blockchains indicated in Table 2, including Go Ethereum (or Geth), Open Ethereum (formerly known as Parity), CollaChain, a new blockchain combining Red Belly Blockchain optimisations [4] with Ethereum smart contracts, Quorum, the blockchain originally designed by J.P. Morgan and maintained by Consensys, and Hyperledger Fabric [1], hosted by the Linux Foundation.

We show empirically that (i) tolerating Byzantine faults is typically more costly than tolerating exclusively crash faults, (ii) OpenEthereum is consistently slower than Go Ethereum and (iii) Hyperledger Fabric (HLF) suffers more from contention than blockchains based on traditional (non-speculative) executions.

The rest of the paper is organised as follows. In Section 2 we present DIABLO. In Section 3 we present the experimental evaluation of the existing blockchains. In Section 4 we present the related work and we conclude in Section 5.

2 Diablo

This section introduces DIABLO, the DIstributed Analytical BLOckchain benchmark framework, which makes it possible to compare different blockchains on the same ground under realistic and tunable workloads. DIABLO is extensible in that it allows users to define more workloads and plug in new blockchains easily.

Figure 1 provides a visual representation of the components of DIABLO. DIABLO includes two main elements: the Primary (Section 2.1), tasked with orchestrating the workload and performing the benchmark flow, and the Secondary (Section 2.2), tasked with interacting with the blockchains and collecting benchmark statistics.

2.1 DIABLO Primary

The Primary is the conductor machine of the benchmark, orchestrating the timing and operations. Its main function is to generate and distribute the workload data to the secondary machines, sending commands to start benchmarking, and once complete,
[Figure 1: components of DIABLO. Labels: Benchmark Configuration (Workload Name, Number of secondaries, Number of threads, Intervals, Transaction Type); Chain Configuration (Blockchain Name, Node information, Key information); Primary (Workload Generator, Result Handler, TCP Server); Secondary (TCP Client, Workload Handler (threads), Blockchain Interface); Comm. Response; Blockchain Network Nodes.]

mation, but also standardises the way information is represented across all chains, so as to facilitate the comparison between blockchain systems. The obtained results indicate both the performance of the whole system under consideration and the performance of each DIABLO Secondary. This fine-grained information about per-machine transaction latency and throughput is typically instrumental in identifying slow machines impacting the overall performance.
test. The client interface is key to the modularity of DIABLO: it integrates the different functionalities abstracted from the benchmark flow, making it possible to perform various actions, like connecting to a node, sending a transaction, etc. Another task handled by the client interface is to obtain the results, throughput and latency, and return these results when asked by the workload handler, for these results to be then forwarded to the Primary.

    bench:
      type: "simple"
      txs:
        0: 1
        10: 11
        20: 11
        28: 6

[Plot: tx/second made (y-axis, 0 to 12) against seconds (x-axis, 0 to 30).]

Figure 2: Benchmark Configuration Example
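
As a rough illustration of how the txs entries of Figure 2 could translate into a send rate over time, the Go sketch below treats each entry as a (second, tx/s) keyframe and interpolates linearly between keyframes. The linear interpolation, the Keyframe type and the RateAt function are assumptions made for this sketch; the text does not specify how DIABLO expands the configuration.

```go
package main

import (
	"fmt"
	"sort"
)

// Keyframe pairs a time offset in seconds with a target send rate (tx/s).
// The type and its field names are hypothetical, not DIABLO's own.
type Keyframe struct {
	Second int
	Rate   float64
}

// RateAt returns the send rate at second t, assuming linear interpolation
// between consecutive keyframes (an assumption of this sketch).
// frames must be sorted by Second. Before the first keyframe the first
// rate is returned; after the last keyframe the last rate is returned.
func RateAt(frames []Keyframe, t int) float64 {
	if len(frames) == 0 {
		return 0
	}
	// Index of the first keyframe strictly after t.
	i := sort.Search(len(frames), func(k int) bool { return frames[k].Second > t })
	if i == 0 {
		return frames[0].Rate
	}
	if i == len(frames) {
		return frames[len(frames)-1].Rate
	}
	a, b := frames[i-1], frames[i]
	frac := float64(t-a.Second) / float64(b.Second-a.Second)
	return a.Rate + frac*(b.Rate-a.Rate)
}

func main() {
	// Keyframes copied from the txs entries of Figure 2.
	frames := []Keyframe{{0, 1}, {10, 11}, {20, 11}, {28, 6}}
	for _, t := range []int{0, 5, 10, 15, 24, 28} {
		fmt.Printf("t=%2ds rate=%.1f tx/s\n", t, RateAt(frames, t))
	}
}
```

Under this reading, the rate ramps from 1 to 11 tx/s over the first 10 seconds, stays at 11 tx/s until second 20, and then decreases to 6 tx/s at second 28.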
    modifier checklen(string memory data) {
        require(bytes(data).length <= maxlen,
                "tweet too large");
        _;
    }
    function tweet(string memory data)
        public checklen(data) {
        tweets[msg.sender] = data;
        emit NewTweet(msg.sender, data);
    }

Twitter code snippet in Solidity for EVM-based blockchains

    func (s *SmartContract) Tweet(
        ctx contractapi.TransactionContextInterface,
        message string, owner string) error {
        if len(message) > MaxLen {
            return fmt.Errorf("tweet too large, %v",
                len(message))
        }
        b := []byte(message)
        return ctx.GetStub().PutState(owner, b)
    }

Twitter code snippet in Go for Hyperledger Fabric

Figure 3: The DIABLO Twitter DApp is written as a smart contract and as a chaincode
around curves indicate the minimum and maximum values.

3.2 Overall Performance Comparison

In this section, we present the throughput and latency of all the built-in blockchains. First, we depict their throughput as a time series on different applications. Then, we describe their latency as a Cumulative Distribution Function (CDF) on the same workloads. As mentioned before, we compare blockchains depending on their guarantees: CFT for the ones that tolerate crash failures, BFT for the ones that tolerate Byzantine failures and PoA for the ones that tolerate Byzantine failures only in synchronous networks.

Throughput. Figure 4 depicts the throughput of all blockchains, computed over a 1-second sliding window, as time elapses when running the Exchange, the Microblogging and the Synthetic applications. Each column includes the results obtained with a different application while each row includes the results for each group of blockchains (CFT, BFT, PoA).

The first observation is that all blockchains offer a varying throughput under the Exchange application, a burst throughput under the Microblogging application and a constant throughput under the Synthetic application. This follows from the nature of the rate at which these applications send requests to the blockchain systems: varying rate for Nasdaq, in bursts for Twitter and constant for Tunable. Note that the request rate of each application is represented as a purple dashed line on each of these graphs, which is followed closely by the throughput of most blockchain systems.

We observe that in all applications, the throughput of PoA-geth (Go Ethereum) is higher than the throughput of PoA-openethereum (Open Ethereum), which indicates that Go Ethereum generally performs better. Note that we will confirm this observation by investigating their respective latency later. Another interesting observation is that some blockchains start losing requests under high load. This is the case of Quorum-IBFT when running the Microblogging application: we can see that the throughput drops to zero, and we confirmed that some of the requests were lost, as we will discuss later. This is seemingly due to the contention induced by the burst of requests generated by the Twitter workload, which exceeds the capacity of Quorum-IBFT. Finally, Quorum-Raft and Hyperledger Fabric seem to offer comparable results, even if Quorum-Raft is faster at times. We will see later that this depends on the experimental settings and cannot be generalized to all settings.

Latency. Figure 5 depicts the CDF of the latencies of all blockchains when running the Exchange, the Microblogging and the Synthetic DApps. As before, each column includes the results obtained with a different application while each row includes the results for each group of blockchains (CFT, BFT, PoA).

We observe, at first glance, a large variation of transaction latencies across most settings, as they vary from seconds to minutes. In particular, we can see that the PoA blockchains have in general a larger latency than the CFT and BFT blockchains. We also note that PoA-openethereum and Quorum-IBFT lose some requests, which explains why their CDF never reaches 100%. In particular, the graphs in the Twitter column show that PoA-openethereum, PoA-geth and Quorum-IBFT lose requests due to contention bursts, which confirms the reason why the throughput of Quorum-IBFT drops rapidly in Figure 4. By contrast, CollaChain does not experience any losses even under bursts; however, its latency increases due to the Twitter load burst.

3.3 The Impact of Contention

Figure 6 depicts the throughput and latency of Hyperledger Fabric and Quorum-Raft. The workload curve is a low constant send rate of 20 transactions per second. The duration is 30 seconds. The workload is purposefully very simple to showcase that transaction conflicts may arise even in low arrival rate situations. The contention percentage denotes the percentage of transactions which modify
[Figure 4: throughput (Transactions) against time (seconds). Columns: Nasdaq, Twitter, Tunable. Rows: BFT (quorum-ibft, collachain), CFT (quorum-raft, HLFAB), PoA (poa-geth, poa-openethereum).]
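
The throughput series of Figure 4 is described as being computed over a 1-second sliding window. The Go sketch below shows one way such a series could be derived from a sorted list of commit timestamps. The Throughput function, its sampling step and the sample timestamps are hypothetical; the text does not state how DIABLO samples the window.

```go
package main

import (
	"fmt"
	"sort"
)

// Throughput samples the commit rate every step seconds up to end.
// The sample at time t counts the commits with a timestamp in
// (t-window, t] and divides that count by window.
// commits holds commit times in seconds, sorted in ascending order.
func Throughput(commits []float64, window, step, end float64) []float64 {
	var series []float64
	for k := 1; float64(k)*step <= end; k++ {
		t := float64(k) * step
		// Number of commits with timestamp <= t.
		hi := sort.Search(len(commits), func(i int) bool { return commits[i] > t })
		// Number of commits with timestamp <= t-window.
		lo := sort.Search(len(commits), func(i int) bool { return commits[i] > t-window })
		series = append(series, float64(hi-lo)/window)
	}
	return series
}

func main() {
	// Made-up commit times, for illustration only.
	commits := []float64{0.1, 0.2, 0.9, 1.4, 1.5, 2.8}
	// 1-second window sampled every half second over 3 seconds.
	fmt.Println(Throughput(commits, 1.0, 0.5, 3.0))
}
```

A step smaller than the window makes consecutive samples overlap, which smooths the curve; setting step equal to window yields disjoint 1-second buckets instead.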
[Figure 5: Cumulative Distribution (0.0 to 1.0) against latency (seconds). Columns: Nasdaq, Twitter, Tunable. Rows: BFT (quorum-ibft, collachain), CFT, PoA (poa-geth, poa-openethereum).]
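
To make the link between lost requests and a CDF that never reaches 100% concrete, the Go sketch below evaluates a latency CDF normalised by the number of submitted transactions rather than by the number of committed ones. The CDF function and the sample values are hypothetical and only illustrate the idea; they are not taken from DIABLO.

```go
package main

import (
	"fmt"
	"sort"
)

// CDF returns, for each threshold, the fraction of all submitted
// transactions that committed with a latency <= that threshold.
// Lost transactions have no entry in latencies, so the curve tops out
// below 1 whenever requests are lost.
func CDF(latencies []float64, submitted int, thresholds []float64) []float64 {
	sorted := append([]float64(nil), latencies...) // keep the input intact
	sort.Float64s(sorted)
	out := make([]float64, len(thresholds))
	if submitted == 0 {
		return out
	}
	for i, x := range thresholds {
		// Number of committed transactions with latency <= x.
		n := sort.Search(len(sorted), func(j int) bool { return sorted[j] > x })
		out[i] = float64(n) / float64(submitted)
	}
	return out
}

func main() {
	// Five transactions submitted, four committed, one lost (made-up values).
	latencies := []float64{5, 1, 2, 2}
	fmt.Println(CDF(latencies, 5, []float64{0, 2, 10}))
}
```

With one of five requests lost, the last value stays at 0.8 however large the threshold is, which is the plateau described in the text.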
the same asset.
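
One possible reading of the contention percentage is sketched below in Go: a generator directs a given share of transactions to one shared asset, and a checker measures the share of transactions whose asset is also written by another transaction. The function names, the key names and this exact interpretation are assumptions; the text only gives the one-sentence definition above.

```go
package main

import "fmt"

// AssignAssets returns one asset key per transaction. Roughly pct percent
// of the n transactions target the shared key "hot"; every other
// transaction gets its own key. The key names are hypothetical.
func AssignAssets(n int, pct float64) []string {
	keys := make([]string, n)
	shared := int(float64(n)*pct/100 + 0.5) // round to nearest
	for i := range keys {
		if i < shared {
			keys[i] = "hot"
		} else {
			keys[i] = fmt.Sprintf("asset-%d", i)
		}
	}
	return keys
}

// ContentionPercent returns the percentage of transactions whose asset
// is also modified by at least one other transaction.
func ContentionPercent(keys []string) float64 {
	if len(keys) == 0 {
		return 0
	}
	count := make(map[string]int)
	for _, k := range keys {
		count[k]++
	}
	contended := 0
	for _, k := range keys {
		if count[k] > 1 {
			contended++
		}
	}
	return 100 * float64(contended) / float64(len(keys))
}

func main() {
	// 20 tx/s for 30 s gives 600 transactions, as in the workload above.
	keys := AssignAssets(600, 25)
	fmt.Printf("contention: %.0f%%\n", ContentionPercent(keys))
}
```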
centage.

3.4 The Impact of Fault Tolerance

In order to assess the inherent cost of tolerating Byzantine faults we compared the performance of the two versions of Quorum: Quorum-Raft that only tolerates crash failures but cannot tolerate Byzantine (arbitrary) failures and Quorum-IBFT that also tolerates Byzantine failures. Figure 7 shows the dif-

the U.S. Federal Aviation Administration and other agencies. Each part must be documented with a complete history of its ownership, use, and repairs. Online buying of aircraft pieces in the style of Amazon thus requires trust and this is done by having a trusted ledger with data integrity which enables: tracking of the entire lifecycle of a part and all its previous owners and anti-counterfeit measures via certification and persistence. Because aircraft pieces are expensive and large, deals tend to be made using purchase orders and not card payments. For users to be confident in their purchases, every purchase order is stored indefinitely on the ledger to
Figure 8: Aviation supply chain workload
corruption. Blockbench’s complexity introduces difficulty when extending to other blockchains or introducing new workloads, as well as when moving to distributing the workload across multiple machines.

The variety of Ethereum-adapted blockchains motivated the development of Chainhammer [11], a benchmark tool focused on the performance of Ethereum-based blockchains under extremely high loads. Chainhammer, unlike others, does not follow a workload curve but provides continuous high load generation, aiming to measure the throughput in extreme situations. The design is extremely specialised, meaning that there is little flexibility in modifications to support other workloads. Conversely, as Chainhammer is exclusively for Ethereum, it can perform post-benchmark analysis and obtain metrics on information critical to the Ethereum infrastructure, such as transaction cost analysis and the structure of blocks.

Ad-hoc blockchain evaluations. Most evaluations that were made on blockchains are ad hoc and do not aim at comparing very different blockchain designs on the same ground. Previous works have, for example, focused on permissioned blockchains as one can find in the context of Internet of Things [9]; others have evaluated exclusively Byzantine fault tolerant blockchains [13].

A comprehensive performance study [15] led to the identification of bottlenecks in Hyperledger Fabric v1.0 and provided guidelines to design applications and operate the network to attain a higher throughput. As a result, a few optimisations on the peer process have already been included in Fabric v1.4, and hence our evaluation includes these optimisations.

FastFabric [8] improves over Hyperledger Fabric in order to reach 20,000 transactions per second. It features optimisations that replace the state DB with a hash table, store blocks in a separate server, separate the committing and endorsing peers into different servers, validate the transaction headers in parallel, and cache the unmarshaled blocks.

XOX Fabric [7] builds upon FastFabric and goes a step further by introducing an additional execution step to enable correct performance even in the presence of contentious workloads. In a later paper [14], FastFabric was mentioned to feature “optimizations [that] are not practical for a production environment [...] there is no disk IO which is not suitable for a production setup.” XOX seems to be a complicated solution to a specific use case that may not be frequently encountered.

5 Conclusion

We presented DIABLO, an extensible benchmarking tool for blockchain systems. It includes built-in workloads, both realistic and synthetic, and allows DApp designers to add new workloads easily. We illustrated DIABLO by comparing the performance of six different blockchain implementations offering crash fault tolerance and Byzantine fault tolerance with and without assuming synchrony.

Future work includes adding more blockchains, like Algorand and Facebook’s Diem, and evaluating the scalability and security of blockchain systems.

References

[1] Elli Androulaki, Artem Barger, Vita Bortnikov, Christian Cachin, Konstantinos Christidis, Angelo De Caro, David Enyeart, Christopher Ferris, Gennady Laventman, Yacov Manevich, Srinivasan Muralidharan, Chet Murthy, Binh Nguyen, Manish Sethi, Gari Singh, Keith Smith, Alessandro Sorniotti, Chrysoula Stathakopoulou, Marko Vukolić, Sharon Weed Cocco, and Jason Yellick. Hyperledger fabric: A distributed operating system for permissioned blockchains. In Proceedings of the Thirteenth EuroSys Conference, EuroSys ’18, pages 30:1–30:15, 2018.

[2] Hyperledger caliper. https://hyperledger.github.io/caliper/, Accessed: 2021-05-06.

[3] Coinmarketcap. https://coinmarketcap.com/, Accessed: 2021-05-06.
[4] Tyler Crain, Christopher Natoli, and Vincent Gramoli. Red belly: a secure, fair and scalable open blockchain. In Proceedings of the 42nd IEEE Symposium on Security and Privacy (S&P’21), May 2021.

[5] Tien Tuan Anh Dinh, Ji Wang, Gang Chen, Rui Liu, Beng Chin Ooi, and Kian-Lee Tan. Blockbench: A framework for analyzing private blockchains. In Proceedings of the 2017 ACM International Conference on Management of Data, pages 1085–1100, 2017.

[6] Parinya Ekparinya, Vincent Gramoli, and Guillaume Jourjon. The Attack of the Clones against Proof-of-Authority. In Proceedings of the Network and Distributed Systems Security Symposium (NDSS’20), Feb 2020.

[7] Christian Gorenflo, Lukasz Golab, and Srinivasan Keshav. Xox fabric: A hybrid approach to blockchain transaction execution. In 2020 IEEE International Conference on Blockchain and Cryptocurrency (ICBC), pages 1–9, 2020. doi:10.1109/ICBC48266.2020.9169478.

[8] Christian Gorenflo, Stephen Lee, Lukasz Golab, and Srinivasan Keshav. Fastfabric: Scaling hyperledger fabric to 20,000 transactions per second. In 2019 IEEE International Conference on Blockchain and Cryptocurrency (ICBC), pages 455–463, 2019. doi:10.1109/BLOC.2019.8751452.

[9] Runchao Han, Gary Shapiro, Vincent Gramoli, and Xiwei Xu. On the performance of distributed ledgers for internet of things. Internet of Things, 10, Aug 2019.

[10] Honeywell aerospace creates online parts marketplace with hyperledger fabric, 2020. https://www.hyperledger.org/learn/publications/honeywell-case-study, Accessed: 2021-05-19.

[11] A. Krüger. Chainhammer: Ethereum benchmarking, 2017. https://github.com/drandreaskrueger/chainhammer, Accessed: 2021-05-06.

[12] Christopher Natoli, Jiangshan Yu, Vincent Gramoli, and Paulo Jorge Esteves Veríssimo. Deconstructing blockchains: A comprehensive survey on consensus, membership and structure. Technical Report 1908.08316, arXiv, 2019. URL: http://arxiv.org/abs/1908.08316.

[13] Gary Shapiro, Christopher Natoli, and Vincent Gramoli. The performance of byzantine fault tolerant blockchains. In Proceedings of the 19th IEEE International Symposium on Network Computing and Applications (NCA’20), pages 1–8, Nov 2020.

[14] Parth Thakkar and Senthilnathan Natarajan. Scaling hyperledger fabric using pipelined execution and sparse peers. Technical Report 2003.05113, arXiv, 2020.

[15] Parth Thakkar, Senthil Nathan, and Balaji Viswanathan. Performance benchmarking and optimizing hyperledger fabric blockchain platform. Technical Report 1805.11390, arXiv, 2018. URL: http://arxiv.org/abs/1805.11390.

[16] Gavin Wood. Ethereum: A secure decentralised generalised transaction ledger, 2015. Yellow paper.