Review On Byzantine Fault Tolerant System
On
Making Byzantine Fault Tolerant Systems Tolerate Byzantine Faults
Allen Clement, Edmund Wong, Lorenzo Alvisi, Mike Dahlin
The University of Texas at Austin
Mirco Marchetti
The University of Modena and Reggio Emilia
A reliable computer system must be able to cope with the failure of one or more of its
components. A failed component may behave arbitrarily while a distributed system is
executing an algorithm. Such arbitrary faults are called Byzantine faults. They encompass
both omission failures (e.g., crash failures, failing to receive a request or send a response)
and commission failures (e.g., processing a request incorrectly, corrupting local state,
and/or sending an incorrect or inconsistent response to a request). When a Byzantine
failure occurs, the system may respond in an unpredictable way unless it is designed to
have Byzantine fault tolerance.
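The distinction between omission and commission failures can be illustrated with a minimal sketch. It assumes, purely for illustration, that the response a correct replica would give is already known; a real BFT protocol has no such oracle and must infer correctness by comparing replies from several replicas. The names `Outcome` and `classify_reply` are hypothetical and do not come from the paper.

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    CORRECT = "correct"
    OMISSION = "omission failure"
    COMMISSION = "commission failure"


def classify_reply(expected: str, observed: Optional[str]) -> Outcome:
    """Classify one reply against the response a correct replica would give.

    `observed` is None when no reply arrived at all (crash, dropped request,
    or dropped response). Hypothetical helper for illustration only.
    """
    if observed is None:
        # Nothing was sent or received: the replica omitted an action.
        return Outcome.OMISSION
    if observed != expected:
        # Something was sent, but it is wrong: the replica committed an error.
        return Outcome.COMMISSION
    return Outcome.CORRECT


# Three replicas answer the same request; r1 is silent, r2 returns a bad value.
replies = {"r0": "balance=10", "r1": None, "r2": "balance=99"}
for replica, reply in replies.items():
    print(replica, classify_reply("balance=10", reply).value)
```

In this sketch a silent replica is an omission failure and a replica that returns a wrong value is a commission failure. Both fall under the Byzantine fault model described above.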
Since the effects of Byzantine faults are unpredictable, identifying the problem, analyzing
its root cause, and finding a solution may take a long time, with a significant impact on the
service and performance of the system. Many Byzantine fault tolerant (BFT) state machine
replication protocols, such as Q/U, HQ, and Zyzzyva, have recently been developed. The
authors observe that although these protocols have high performance and are cost effective,
they do not tolerate Byzantine faults very well. A single faulty client or server is capable of
rendering these protocols virtually unusable.
The paper therefore advocates a new approach, robust BFT (RBFT), to building BFT systems
that offer acceptable and predictable performance under the broadest possible set of
circumstances, including when faults occur, rather than constructing high-strung systems that
only maximize best-case performance. RBFT explicitly considers performance during both
gracious intervals (when the network is synchronous, replicas are timely and fault-free, and
clients are correct) and uncivil execution intervals in which network links and correct servers
are timely, but up to f = [