
Rubrik Hardware Failure Scenarios

Rubrik can withstand a number of failures at the Brik (appliance) and hardware
component level. A Brik represents a group of nodes that operate independently of each
other.

The table below details what happens at each level of hardware failure, listed in order of ease of recovery.

Power Supply Unit Failure
A Brik offers dual power supplies for redundancy. If one power supply fails, the system fails over to the remaining power supply. A failure of both power supply units is similar to a Brik failure (see Brik/Chassis Failure at the bottom of the table).

Transceiver Failure
If a transceiver fails, the system fails over to the other transceiver (assuming both transceivers are plugged in).

Hard Disk Failure
A cluster (minimum of three nodes) can withstand the concurrent failure of up to two hard disk drives. The system continues to handle data ingest and operational tasks while simultaneously re-creating copies of the data blocks that resided on the failed drives to maintain three-way replication. As more nodes are added to the cluster, we ensure that three copies of data are spread across the cluster to tolerate a node or Brik failure.

DIMM Failure
A DIMM failure causes a node to be unavailable to the cluster. Removing the offending DIMM can allow the node to continue operating; however, this is not recommended.

NIC Failure
The failure of an entire NIC within a Brik is similar to a node failure. Data ingest and operational tasks (backup, archival, replication, reporting, etc.) are redistributed to the remaining healthy nodes, and data and metadata are rebuilt in the background to maintain three-way replication. If only a component of the NIC fails, the system fails over to the other port (assuming both cables are plugged in).

SSD Failure
An SSD failure is handled similarly to a node failure. Data ingest and operational tasks (backup, archival, replication, reporting, etc.) are redistributed to the remaining healthy nodes, and data and metadata are rebuilt in the background to maintain three-way replication.

Node Failure
A cluster can tolerate a maximum concurrent failure of one node. During a node failure, data ingest and operational tasks (backup, archival, replication, reporting, etc.) are redistributed to the remaining healthy nodes, and data is rebuilt in the background. If a node is down for longer than 24 hours, a support action is required to re-add it to the cluster.

Brik/Chassis Failure
Within a cluster of at least 3 Briks, a maximum of one Brik failure is tolerated for Rubrik Converged Data Management version 2.1+. In a Brik failure, the issue most likely resides within shared components such as the power supply or drive backplane. We expect this to be a rare occurrence.

NOTE: When a cluster expands from 1 to 2+ Briks, there is a period during which data is re-balanced to ensure the cluster is fault tolerant. During this period, a Brik failure will not be tolerated, even in a cluster of 3+ Briks. Once the cluster is comprised of more than 1 Brik and re-balancing has reached steady state, one Brik failure is tolerated for all further Brik additions (see the sketch below).

