
Consistency, Fault Tolerance, and Availability

This page describes how the ClustrixDB architecture was designed for Consistency, Fault Tolerance, and Availability.

See also the System Administration Guide section on Fault Tolerance and High Availability.

Consistency
Fault Tolerance
    Permanent Loss of a Node
    Multi-Node Loss
    Temporary Unavailability of a Node
    Front End Network Failure
    Back End Network Failure
Availability
    Group Membership and Quorum
    Partial Availability

Consistency
Many distributed databases have embraced eventual consistency over strong consistency because it makes such systems easier to implement at scale. However, eventual consistency places the burden of dealing with any anomalies that arise on the application. In other words, eventual consistency comes at the cost of increased programming-model complexity.

ClustrixDB provides a consistency model that can scale using a combination of intelligent data distribution, multi-version concurrency control (MVCC), and Paxos. Our approach enables ClustrixDB to scale writes, scale reads in the presence of write workloads, and provide strong ACID semantics.

For an in-depth explanation of how ClustrixDB scales reads and writes, refer to our concurrency model architecture documentation.

In brief, ClustrixDB takes the following approach to consistency:

Synchronous replication within the cluster: all participating nodes must acknowledge a write before it can complete, and writes are performed in parallel.
Paxos protocols for distributed transaction resolution.
Support for the Read Committed and Repeatable Read (really Snapshot) isolation levels, with limited support for Serializable.
MVCC allows for lockless reads; writes will not block reads.
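To make the last point concrete, here is a minimal sketch of the lockless-read idea behind MVCC. The VersionedRow class, commit IDs, and snapshot IDs are illustrative stand-ins, not ClustrixDB internals: each write appends a new version, and a reader simply picks the newest version visible at its snapshot, so reads never wait on writers.

    # Illustrative MVCC sketch: writers append versions, readers never block.
    class VersionedRow:
        def __init__(self):
            self.versions = []          # ordered list of (commit_id, value)

        def write(self, commit_id, value):
            self.versions.append((commit_id, value))   # writers only append

        def read(self, snapshot_id):
            # return the newest value committed at or before the reader's snapshot
            visible = [value for cid, value in self.versions if cid <= snapshot_id]
            return visible[-1] if visible else None

    row = VersionedRow()
    row.write(commit_id=1, value="v1")
    row.write(commit_id=3, value="v2")
    print(row.read(snapshot_id=2))   # -> v1: the reader ignores the newer write instead of blocking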

Fault Tolerance
ClustrixDB provides fault tolerance by maintaining multiple copies of data across the cluster. The degree of fault tolerance depends
on the specified number of copies (with a minimum of two). In order to fully understand how ClustrixDB handles fault tolerance, you
must first familiarize yourself with our data distribution model.

The following sections cover the possible failure cases and explain how ClustrixDB handles each situation. For this series of
examples we will assume the following:

A five-node cluster with separate front end and back end networks.
A total of five slices of data with two replicas each. The primary replica is labeled with a letter (e.g. A), while the secondary
replica is labeled with an apostrophe (e.g. A'). 
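This hypothetical layout can be written down directly; the node numbers and slice placement below are assumptions made for the examples on this page, not output from ClustrixDB.

    # Hypothetical placement: five slices (A-E), two replicas each, across five nodes.
    # The unprimed letter is the primary replica, the primed letter the secondary.
    placement = {
        "A": {"primary": 1, "secondary": 2},
        "B": {"primary": 3, "secondary": 4},
        "C": {"primary": 2, "secondary": 5},
        "D": {"primary": 4, "secondary": 1},
        "E": {"primary": 5, "secondary": 3},
    }

    def replicas_on(node):
        """Slices that keep a copy on the given node."""
        return [s for s, nodes in placement.items() if node in nodes.values()]

    print(replicas_on(2))   # -> ['A', 'C'] in this made-up layout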

Permanent Loss of a Node
Consider what happens when the cluster loses node #2 to some permanent hardware failure. Data on the failed node can no longer
be accessed by the rest of the cluster. However, other nodes within the system have copies of that data. 

The diagram below demonstrates how ClustrixDB recovers from the node failure. Because our data distribution model is peer-to-peer rather than master-slave:

Multiple nodes participate in the recovery of a single failed node. In ClustrixDB, data reprotection is a many-to-many
operation, which means that clusters with more nodes can recover faster from a node failure.
Once the cluster makes new copies of slices A and C, we now have a fully protected system with two replicas of each slice. The
cluster can now handle another failure without having to replace the failed node.

Figure: Handling a complete node failure. The failed state (left) and recovered state (right).
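A rough sketch of the many-to-many reprotection idea follows, continuing the hypothetical placement above. The selection logic (lowest-numbered target node) is purely illustrative and is not the actual rebalancer algorithm.

    # Hypothetical placement (slice -> nodes holding a copy), as in the earlier sketch.
    placement = {"A": {1, 2}, "B": {3, 4}, "C": {2, 5}, "D": {4, 1}, "E": {5, 3}}

    failed_node = 2
    survivors = {1, 3, 4, 5}

    # Each under-protected slice is re-copied from a surviving replica to another
    # surviving node; different slices use different sources and targets, so the
    # work spreads across the cluster (many-to-many).
    for slice_name, holders in placement.items():
        if failed_node in holders:
            source = (holders & survivors).pop()    # any surviving replica can be the source
            target = min(survivors - holders)       # simplistic choice of a node lacking a copy
            print(f"re-copy slice {slice_name} from node {source} to node {target}")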

Multi-Node Loss
ClustrixDB can tolerate the simultaneous loss of k - 1 nodes, where k is the number of replicas. For example, in systems configured with 3
replicas, ClustrixDB can sustain the simultaneous loss of 2 nodes. However, the extra level of protection comes at the cost of extra
storage space in the cluster.
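As a quick worked example of that trade-off (the 100 GB figure is an assumption, not a recommendation):

    # k - 1 fault tolerance vs. storage cost for a hypothetical 100 GB logical data set.
    logical_data_gb = 100
    for replicas in (2, 3, 4):
        print(f"replicas={replicas}: survives {replicas - 1} simultaneous node losses, "
              f"stores {logical_data_gb * replicas} GB in total")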

Temporary Unavailability of a Node
In some circumstances, a node can exit quorum for a brief period of time. A kernel panic or a power event can cause
such a condition. In those cases, we expect the node to return to quorum much faster than it would take to reprotect the entire data
set. ClustrixDB provides special handling for this failure scenario by setting up a log of changes for the data on the unavailable node
that is replayed when the node rejoins the cluster.

In the example below, node #4 temporarily leaves the cluster. Nodes #2 and #5 set up logs (called Queues) to track changes to data
on node #4. Once node #4 returns to the cluster, it replays the logged changes. All of these operations are handled automatically by
the database.

Figure: Handling a transient node failure. The failed state (left) and recovered state (right).  
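The queue-and-replay behavior can be sketched roughly as follows. The Queue class and its methods are illustrative stand-ins, not ClustrixDB interfaces; the point is that peers log missed changes for the absent node and replay them in order when it rejoins, instead of re-copying whole slices.

    # Illustrative queue-and-replay for a temporarily unavailable node.
    class Queue:
        def __init__(self, target_node):
            self.target_node = target_node
            self.pending = []                       # ordered log of missed changes

        def log(self, change):
            self.pending.append(change)

        def replay(self, apply):
            for change in self.pending:             # apply in the original order
                apply(change)
            self.pending.clear()

    queue_for_node4 = Queue(target_node=4)
    queue_for_node4.log(("slice D", "UPDATE", "row 17"))
    queue_for_node4.log(("slice E", "INSERT", "row 42"))

    # when node 4 rejoins the cluster:
    queue_for_node4.replay(lambda change: print("apply on node 4:", change))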

Front End Network Failure
ClustrixDB supports configuring a redundant front end network, which greatly reduces the chance of a complete front end network
failure. In the event that such a failure does occur, the failed node will continue to participate in query resolution. However, it will
not be able to accept incoming connections from the application.

Back End Network Failure
ClustrixDB does not distinguish between a back end network failure and a node failure. If a node cannot communicate with its peers
on the back end network, it will fail its heartbeat check and the system will consider the node offline.

Availability
ClustrixDB can provide availability even in the face of failure. In order to provide full availability, ClustrixDB requires that:

A majority of nodes are able to form a cluster (i.e., the quorum requirement).
The available nodes hold at least one replica for each set of data.
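Both requirements amount to a simple check, sketched below; the node sets and slice placement are the hypothetical layout used throughout these examples.

    # Illustrative full-availability check: the reachable nodes must (a) be a strict
    # majority of the cluster and (b) hold at least one replica of every slice.
    all_nodes = {1, 2, 3, 4, 5}
    reachable = {1, 3, 4, 5}                                  # node 2 is down
    placement = {"A": {1, 2}, "B": {3, 4}, "C": {2, 5}, "D": {4, 1}, "E": {5, 3}}

    has_quorum = len(reachable) > len(all_nodes) / 2
    all_slices_covered = all(nodes & reachable for nodes in placement.values())

    print("fully available:", has_quorum and all_slices_covered)   # -> True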

In order to understand ClustrixDB's availability modes and failure cases, it is necessary to understand our group membership
protocol.

Group Membership and Quorum
ClustrixDB uses a distributed group membership protocol. The protocol maintains two fundamental sets: (a) the static set of all
nodes known to the cluster, and (b) the set of nodes which can currently communicate with each other.

The cluster cannot form unless more than half the nodes in the static membership are able to communicate with each other. In
other words, ClustrixDB will not tolerate a split-brain scenario.

For example, if a six-node cluster experiences a network partition resulting in two sets of three nodes, ClustrixDB will refuse to form
a cluster.

However, if more than half the nodes are able to communicate, ClustrixDB will form a cluster with the majority set.
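The strict-majority rule behind the six-node example can be expressed directly; the group sizes below are just the figures used above.

    # Strict majority over the static membership: 3 of 6 is not enough, 4 of 6 is.
    static_membership = 6

    def can_form_cluster(communicating):
        return communicating > static_membership / 2

    print(can_form_cluster(3))   # -> False: a 3/3 network partition yields no cluster
    print(can_form_cluster(4))   # -> True: the 4-node majority side forms the cluster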

Partial Availability
In the above example, ClustrixDB formed a cluster because it could form a quorum. However, it is possible that such a cluster could
offer only partial availability. In other words, the cluster may not have access to the complete data set.

In the following example, ClustrixDB maintains two replicas. However, neither of the nodes holding a replica of slice A is able to participate in
the cluster (due to some failure or other condition rendering them unavailable). When a transaction attempts to access data on slice
A, the database will generate an error that surfaces to the application.
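Continuing the hypothetical layout, partial availability might look like the sketch below. The error type and message are illustrative, not what ClustrixDB actually returns.

    # Illustrative partial availability: the cluster has quorum, but both replicas
    # of slice A live on unreachable nodes, so only A is inaccessible.
    placement = {"A": {1, 2}, "B": {3, 4}, "C": {2, 5}, "D": {4, 1}, "E": {5, 3}}
    reachable = {3, 4, 5}                      # nodes 1 and 2 are down; quorum still holds (3 of 5)

    def read_slice(slice_name):
        if not placement[slice_name] & reachable:
            raise RuntimeError(f"no available replica for slice {slice_name}")
        return f"slice {slice_name}: read OK"

    print(read_slice("B"))                     # slices with a reachable replica still work
    try:
        read_slice("A")                        # both replicas of A are unreachable
    except RuntimeError as err:
        print("error surfaced to the application:", err)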
