
Review

On
Making Byzantine Fault Tolerant Systems
Tolerate Byzantine Faults

Allen Clement, Edmund Wong, Lorenzo Alvisi, Mike Dahlin
The University of Texas at Austin
Mirco Marchetti
The University of Modena and Reggio Emilia


A reliable computer system must be able to cope with the failure of one or more of its
components. A failed component may produce arbitrary faults during the execution of an
algorithm by a distributed system; such arbitrary faults are called Byzantine faults. They
encompass both omission failures (e.g., crash failures, failing to receive a request
or send a response) and commission failures (e.g., processing a request incorrectly, corrupting
local state, or sending an incorrect or inconsistent response to a request). When a
Byzantine failure occurs, the system may respond in any unpredictable way, unless it is
designed for Byzantine fault tolerance.

Since the effects of Byzantine faults are unpredictable, identifying the problem, analyzing its
root cause, and finding a solution may take a long time, with significant impact on the
service and performance of the system. Many Byzantine fault tolerant (BFT) state machine
replication protocols, such as Q/U, HQ, and Zyzzyva, have recently been developed. The authors
observe that although these protocols achieve high performance and are cost effective,
they do not tolerate Byzantine faults very well: a single faulty client or server can
render these protocols virtually unusable.

This paper therefore advocates a new approach, robust BFT (RBFT), to building BFT systems that
offer acceptable and predictable performance under the broadest possible set of
circumstances, including when faults occur, rather than constructing high-strung systems that
merely maximize best-case performance. RBFT explicitly considers performance during both
gracious intervals, when the network is synchronous, replicas are timely and fault-free, and
clients are correct, and uncivil execution intervals, in which network links and correct servers
are timely, but up to f = ⌊(n-1)/3⌋ of the n servers, and any number of clients, are faulty.
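The fault threshold above is the standard one for BFT state machine replication: tolerating f Byzantine servers requires n = 3f + 1 replicas in total. A minimal sketch of that arithmetic (the helper name is ours, not the paper's):

```python
# Standard BFT replication tolerates f faulty servers out of n = 3f + 1 total.
# Illustrative helper, not code from the paper.

def max_faults(n: int) -> int:
    """Maximum number of Byzantine servers tolerable with n replicas."""
    return (n - 1) // 3

print(max_faults(4))   # 1  (minimum PBFT-style configuration)
print(max_faults(7))   # 2
print(max_faults(10))  # 3
```

With fewer than 3f + 1 replicas, correct servers cannot outvote the combination of faulty servers and unreachable correct ones, so the threshold is tight.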



The paper demonstrates that existing BFT protocols are dangerously fragile, defines a set
of principles for constructing BFT services that remain useful even when Byzantine faults
occur, and applies these principles to construct a new protocol, Aardvark, showing that robust
fault tolerant systems can be built. The authors show that, in their tests, Aardvark achieves peak
performance within 40% of that of the best existing protocol and provides a
significant fraction of that performance when up to f servers and any number of clients are
faulty.

Aardvark uses digital signatures for authentication, performs regular view changes, and
relies on point-to-point communication, even though signatures are costlier than MACs and
can become a performance bottleneck, regular view changes keep the system from doing
useful work while they run, and renouncing IP multicast deliberately gives up throughput.
These choices reflect a deliberate emphasis on the overall robustness of the system rather
than on high performance alone.

The Aardvark protocol consists of three stages: client request transmission, replica agreement,
and primary view change. This is the same basic structure as PBFT (Practical BFT) and its
direct descendants, but revisited with the goal of achieving an execution path that satisfies three
properties: acceptable performance, ease of implementation, and robustness against
Byzantine disruptions. To avoid the pitfalls of fragile optimizations, the paper examines, at each
stage of the protocol, how faulty nodes, by varying both the nature and the rate of their
actions and omissions, can limit the ability of correct nodes to perform in a timely fashion
what the protocol requires of them. This systematic methodology led to the three main
design differences between Aardvark and previous BFT systems: (1) signed client requests,
(2) resource isolation, and (3) regular view changes.

Using digital signatures to authenticate client requests provides non-repudiation and
ensures that all correct replicas make identical decisions about the validity of each client
request, unlike the weaker (though faster) message authentication code (MAC)
authenticators used by earlier protocols. Because digital signatures are expensive,
Aardvark uses them only for client requests, where the expensive act of generating the
signature can be pushed onto the client, leaving the servers with the less expensive
verification operation. Primary-to-replica, replica-to-replica, and replica-to-client
communication rely on MAC authenticators.
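The asymmetry can be sketched as follows. This is an illustrative toy, not the paper's code: `simple_sign`/`simple_verify` are HMAC-based stand-ins for real public-key signatures (a deployment would use e.g. RSA or Ed25519), and `mac` plays the role of the cheap pairwise authenticators used between replicas.

```python
import hashlib
import hmac

def simple_sign(private_key: bytes, message: bytes) -> bytes:
    # Stand-in for an expensive public-key signature (NOT a real signature scheme).
    return hmac.new(private_key, message, hashlib.sha256).digest()

def simple_verify(private_key: bytes, message: bytes, sig: bytes) -> bool:
    # Constant-time comparison to avoid timing side channels.
    return hmac.compare_digest(simple_sign(private_key, message), sig)

def mac(shared_key: bytes, message: bytes) -> bytes:
    # Cheap pairwise MAC used for replica-to-replica traffic.
    return hmac.new(shared_key, message, hashlib.sha256).digest()

# The client signs once (the expensive step happens at the client)...
client_key = b"client-secret"
request = b"op=transfer;amount=10"
signature = simple_sign(client_key, request)

# ...and every replica performs the cheaper verification, reaching an
# identical verdict on the request's validity.
assert all(simple_verify(client_key, request, signature) for _ in range(4))
```

The key point the sketch illustrates is that all replicas verify the same signature and therefore cannot disagree about a request's validity, which a per-link MAC authenticator cannot guarantee.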

Aardvark explicitly isolates network and computational resources. This prevents a faulty
server from interfering with the timely delivery of messages from good servers, and allows
a node to defend itself against brute-force denial-of-service attacks by disabling the offending
NIC. However, using physically separate NICs for communication between each pair of
servers incurs a performance hit, as Aardvark can no longer use hardware multicast to
optimize all-to-all communication.

To prevent a primary from achieving tenure and exerting absolute control on system
throughput, Aardvark invokes the view change operation on a regular basis. Replicas monitor
the performance of the current primary, slowly raising the level of minimal acceptable
throughput. If the current primary fails to provide the required throughput, replicas initiate a
view change.
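The rising throughput floor can be sketched as a small monitor loop. The class and parameter names here are ours for illustration, not the paper's; the point is only the ratchet: as long as the primary keeps up, the minimum acceptable throughput slowly rises, so a primary cannot settle into mediocre performance.

```python
# Hypothetical sketch of replicas monitoring primary throughput against a
# slowly rising floor and deciding when to initiate a view change.

class PrimaryMonitor:
    def __init__(self, initial_floor: float, growth: float):
        self.floor = initial_floor   # minimal acceptable throughput (req/s)
        self.growth = growth         # factor by which the floor rises each period

    def check(self, observed_throughput: float) -> bool:
        """Return True if a view change should be initiated this period."""
        if observed_throughput < self.floor:
            return True
        # The primary kept up: ratchet the requirement upward for next period.
        self.floor *= self.growth
        return False

monitor = PrimaryMonitor(initial_floor=100.0, growth=1.01)
assert monitor.check(500.0) is False   # healthy primary; floor rises to 101.0
assert monitor.check(50.0) is True     # throttling primary triggers a view change
```

A slow growth factor keeps view changes rare when the primary is honest, while still bounding how long a faulty primary can degrade throughput before being replaced.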

The quorum-driven nature of server-initiated communication ensures that a single faulty
replica cannot force the system into undesirable execution paths. To guard against
denial of service, Aardvark processes each client request by passing it through a sequence of
increasingly expensive steps: a blacklist check, a MAC check, a sequence check,
and a redundancy check. Each step serves as a filter, so that more expensive steps are
performed less often.
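The ordering principle behind the pipeline can be sketched as below. This is a simplified illustration in our own terms, not Aardvark's implementation: the check semantics (especially the redundancy check returning a cached reply for a retransmission) are our reading of the design, and all names are hypothetical.

```python
# Filter pipeline sketch: cheap checks run first so that expensive work
# (here, `execute`) is spent only on requests that survive earlier filters.

def process_request(request, blacklist, last_seq, reply_cache, verify_mac, execute):
    client, seq = request["client"], request["seq"]
    if client in blacklist:                    # 1. blacklist check (cheapest)
        return "dropped: blacklisted"
    if not verify_mac(request):                # 2. MAC check
        return "dropped: bad MAC"
    if seq < last_seq.get(client, 0):          # 3. sequence check: older than last
        return "dropped: stale sequence number"
    if seq == last_seq.get(client, -1):        # 4. redundancy check: retransmission
        return reply_cache[client]             #    gets the cached reply, no re-execution
    reply = execute(request)                   # expensive work, only for new requests
    last_seq[client] = seq
    reply_cache[client] = reply
    return reply

blacklist, last_seq, reply_cache = set(), {}, {}
ok = lambda _req: True
execute = lambda req: f"reply-{req['seq']}"
req = {"client": "c1", "seq": 0, "op": "read"}
assert process_request(req, blacklist, last_seq, reply_cache, ok, execute) == "reply-0"
# A retransmission is answered by the cheap redundancy check, not re-executed.
assert process_request(req, blacklist, last_seq, reply_cache, ok, execute) == "reply-0"
```

Because a faulty client's malformed messages are shed by the cheapest applicable filter, the cost a client can impose on a replica is bounded by how far its requests get through the pipeline.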

Aardvark uses separate work queues for client requests and replica-to-replica
communication to limit the fraction of replica resources that clients can consume,
ensuring that a flood of client requests cannot prevent replicas from making progress on
requests already received. Of course, as in a non-BFT service, malicious clients can still deny
service to other clients by flooding the network between clients and replicas; defending
against these attacks is an area of active independent research. The authors
implemented Aardvark on dual-core machines, with one core verifying client requests
and the other running the replica protocol. This explicit assignment provides resource
isolation and takes advantage of parallelism to partially mask the additional cost of signature
verification.
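The queue separation can be sketched with two queues and a scheduler that serves each per round. The round-robin policy below is our simplification for illustration, not the paper's scheduler.

```python
from queue import Empty, Queue

# Sketch: client requests and replica-to-replica messages live in distinct
# queues, so a flood of client traffic cannot starve protocol progress.

client_queue: Queue = Queue()
replica_queue: Queue = Queue()

def drain_one_round(processed: list) -> None:
    """Take at most one item from each queue per scheduling round."""
    for name, q in (("replica", replica_queue), ("client", client_queue)):
        try:
            processed.append((name, q.get_nowait()))
        except Empty:
            pass  # nothing pending in this queue

# Even with 100 pending client requests, the replica message is still served.
for i in range(100):
    client_queue.put(f"client-req-{i}")
replica_queue.put("prepare-msg")

processed: list = []
drain_one_round(processed)
assert ("replica", "prepare-msg") in processed
```

With a single shared queue, the 100 client requests would all be dequeued before the protocol message; separate queues bound the fraction of capacity any one traffic class can claim.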

The authors evaluate the behavior and performance of Aardvark and existing BFT
systems in various faulty environments by constructing a set of near-worst-case
workloads, including faulty clients, network flooding, and faulty primaries and
replicas. In all faulty environments, Aardvark proved robust, deviating little
from its normal behavior (e.g., throughput), while the performance of the other systems dropped
significantly, exposing the fragility of the existing systems.

High-assurance systems require BFT protocols that are more robust to failures than
existing systems and that provide adequate throughput during uncivil intervals, in which the
network is well behaved but an unknown number of clients and up to f servers are faulty.
Aardvark is one such BFT state machine replication protocol, designed and implemented to provide
good performance during uncivil executions at the cost of some throughput during gracious
executions.

Existing BFT protocols deliver high performance in normal conditions but poor results
under Byzantine faults, while RBFT (Aardvark) significantly improves performance
under Byzantine faults at the price of merely average performance in normal conditions. Within
these boundaries of fault tolerance and performance, the engineering of BFT protocols
should embrace Lampson's well-known recommendation: "Handle normal and worst case
separately as a rule, because the requirements for the two are quite different: the normal case
must be fast; the worst case must make some progress."

Aardvark is far from a final product, but it provides solid evidence that we
need to shift our single-minded focus from aggressively tuning BFT systems for the best case
of gracious execution toward developing more balanced fault tolerant protocols that combine
high performance with robustness. Specific challenges include
formally verifying the designs and implementations of BFT systems, developing a notion of
optimality for robust BFT systems that captures the fundamental tradeoffs between fault-free
and faulty performance, and extending BFT replication to deployable large-scale
applications.
