Data Integrity With Rubrik
CONTINUOUS VALIDATION
FINGERPRINTS
DATA INTEGRITY IN ACTION
    SCENARIO 1: ONE NODE LOSS
    SCENARIO 2: ONE AVAILABILITY DOMAIN LOSS
ARCHITECTURE DISTINCTIVES
    A SELF-HEALING SOFTWARE SYSTEM
    INDEPENDENCE FROM SPECIALIZED HARDWARE
    CLOUD ARCHIVE RECOVERABILITY
CONCLUSION
ABOUT THE AUTHORS
APPENDIX
    ADDITIONAL FAILURE SCENARIOS
As shown in Figures 3 and 4, (4,2) Erasure Coding gives Rubrik the same data resiliency as RAID6 or RF3 three-way mirroring, while being far more space efficient than either RF2’s 100% space overhead or RF3’s 200% space overhead. From a usable-space perspective, Erasure Coding provides ~66% usable space.
| Scheme        | Space Overhead    | Usable Space       | Rebuild Time  | Failure Tolerance                     |
|---------------|-------------------|--------------------|---------------|---------------------------------------|
| RF2 Mirroring | 100%              | 50% Disk Usable    | Hours to Days | 1 Disk or 1 Bad Block (URE)           |
| RAID6         | 2 Disks (~30-50%) | (N-2 Disks) Usable | Days to Weeks | 2 Disks or 1 Disk + 1 Bad Block (URE) |
| RF3 Mirroring | 200%              | 33% Disk Usable    | Hours         | 2 Disks or 1 Disk + 1 Bad Block (URE) |
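The usable-space figures above follow from simple ratios. The sketch below is illustrative arithmetic only, not Rubrik code:

```python
def usable_fraction(data_units: int, total_units: int) -> float:
    """Fraction of raw capacity available for user data."""
    return data_units / total_units

# RF2 mirroring: 1 data copy out of 2 total -> 50% usable, 100% overhead
assert usable_fraction(1, 2) == 0.5
# RF3 mirroring: 1 data copy out of 3 total -> ~33% usable, 200% overhead
assert round(usable_fraction(1, 3), 2) == 0.33
# (4,2) Erasure Coding: 4 data blocks out of 6 total -> ~66% usable
assert round(usable_fraction(4, 6), 2) == 0.67
```

In other words, (4,2) Erasure Coding tolerates two lost blocks, matching RF3, while keeping roughly two thirds of raw capacity usable instead of one third.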
METADATA PROTECTION
Metadata is protected in multiple ways. To enable Rubrik’s fast Google-like search, metadata is always held on the local SSD of each Rubrik node. For resiliency, metadata is distributed within the cluster via three-way replication as well as backed up to the local HDDs (hard disk drives) within each node. An SSD failure will not take the cluster metadata store offline, and as a last resort the HDD copy allows quick local metadata restoration upon SSD replacement.
Figure 5: Metadata Protection

To protect against corruption and provide further resiliency, multiple copies of versioned metadata are stored and replicated throughout the system.
1. Stripe checksum — a checksum created at a logical level which is then persisted along with the data.
Stripe checksums are used to protect against memory corruption software bugs. The checksum
is computed at the data entry point, and Atlas ensures it remains the same even after data moves
through various distributed layers.
2. Chunk checksum — a checksum created at a physical level and persisted alongside the data. Chunk
checksums are used to protect against bit rot — the slow deterioration of data stored on disk.
Checksums are validated at two points:
1. When data is read — as part of a read request, the checksum is validated as a check against corruption. If checksum violations are detected, the data is automatically repaired from other copies.
2. Background scan — during normal operations, a continuous background scan process looks for data corruption or inconsistency. In particular, this allows recovery from Unrecoverable Read Errors (UREs).
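As a rough illustration of read-time validation and repair, the sketch below uses CRC32 as a chunk checksum and an in-memory replica list. The names and structure are hypothetical, not Atlas’s actual interfaces:

```python
import zlib

def store_chunk(data: bytes) -> dict:
    """Persist a chunk together with its checksum, computed at write time."""
    return {"data": data, "crc": zlib.crc32(data)}

def read_chunk(chunk: dict, replicas: list) -> bytes:
    """Validate the checksum on read; repair from a healthy replica on mismatch."""
    if zlib.crc32(chunk["data"]) == chunk["crc"]:
        return chunk["data"]
    for rep in replicas:  # checksum violation: try the other copies
        if zlib.crc32(rep["data"]) == rep["crc"]:
            chunk["data"], chunk["crc"] = rep["data"], rep["crc"]  # self-heal
            return chunk["data"]
    raise IOError("all copies corrupt")

good = store_chunk(b"snapshot-block")
bad = dict(good, data=b"snapshXt-block")  # simulate bit rot on one copy
assert read_chunk(bad, [good]) == b"snapshot-block"
```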
If data rebuild is needed, the resiliency provided by Erasure Coding is automatically leveraged in the background.
FINGERPRINTS
Fingerprinting algorithms, in which a large data item is mapped to a shorter bit string, are employed as a more
rigorous end-to-end check and created at the data management level (Cerebro as shown in Figure 1).
1. Data Ingest — fingerprints are computed the first time the data enters the system. From this point
on, the system ensures that content does not change (i.e. fingerprints are validated) and always does
a fingerprint check before committing any data transformations. The fingerprints are compared and
validated during ingest to avoid duplicate data being written. This is also known as deduplication. If a fingerprint mismatch is detected, the data transformation operation is rolled back.
2. Data Replication — fingerprints are compared and validated as part of the replication process
between two Rubrik clusters to avoid duplicate data being sent over the network.
3. Data Archive — fingerprints are compared and validated as part of the cloud archive process to avoid
duplicate data being sent to and stored with a cloud provider.
4. Data Immutability — beyond the append-only nature of the Atlas filesystem, fingerprints are used to
guarantee data immutability.
5. Background Scan — similar to checksums, Rubrik runs periodic fingerprint checks to ensure data
integrity.
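The ingest-time use of fingerprints for deduplication can be sketched as follows, using SHA-256 as a stand-in fingerprint function (the actual fingerprinting algorithm is not specified here, and the class is hypothetical):

```python
import hashlib

class IngestStore:
    """Toy content store: one physical copy per unique fingerprint."""
    def __init__(self):
        self.blocks = {}  # fingerprint -> data

    def ingest(self, data: bytes) -> str:
        fp = hashlib.sha256(data).hexdigest()
        if fp not in self.blocks:  # fingerprint match means duplicate: skip write
            self.blocks[fp] = data
        return fp

store = IngestStore()
a = store.ingest(b"vm-disk-block")
b = store.ingest(b"vm-disk-block")  # duplicate: no second copy is stored
assert a == b and len(store.blocks) == 1
```

The same comparison applies on replication and archive: if the destination already holds the fingerprint, the data itself need not be sent.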
DATA INTEGRITY IN ACTION

The system is self-healing. Forge publishes the disk and node health status and Atlas reacts. Assuming (4,2) Erasure Coding, if a node or entire appliance fails, Atlas will create a new copy of the data on another node to make sure the requested failure tolerance is met. With (4,2) Reed-Solomon Erasure Coding, Rubrik stores 2 extra bytes of coding data for every 4 bytes of data. This allows Rubrik to survive a number of failure types with minimal storage overhead. Additionally, Atlas runs a background task to check the CRC of each data chunk to ensure that what has been written to Rubrik is available at time of recovery.
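A background CRC scrub can be sketched as a loop that recomputes each chunk’s checksum and flags mismatches so they can be rebuilt from erasure-coded redundancy. The structure below is hypothetical, not the actual Atlas task:

```python
import zlib

def scrub(chunks: dict) -> list:
    """Background scan: recompute each chunk's CRC and return the IDs of
    chunks whose stored checksum no longer matches (candidates for rebuild)."""
    return [cid for cid, (data, crc) in chunks.items()
            if zlib.crc32(data) != crc]

chunks = {
    "c1": (b"backup-data", zlib.crc32(b"backup-data")),
    "c2": (b"rotted-data", zlib.crc32(b"backup-data")),  # simulated bit rot
}
assert scrub(chunks) == ["c2"]
```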
The following scenarios provide specific examples of how Rubrik is designed to protect against failure.
For additional failure scenarios, please see the Appendix.
SCENARIO 1
In the first scenario, the customer has 3 appliances consisting of multiple nodes and is concerned about whether the cluster could withstand a node failure or the failure of two drives in the same node. Each concern will be addressed separately.
Any one node of the cluster (minimum 4 nodes per cluster with Erasure Coding) can fail and the system will continue to be fully operational.
• Metadata: Remains completely accessible, since at least 2 out of 3 replicas are guaranteed to be available, which is enough to attain a quorum. Writing new metadata will be possible as long as there are 2 functional nodes in the cluster.
• Snapshot Data: Remains 100% accessible because decoding can occur with any 4 out of 6 blocks
available.
The cluster can withstand the failure of any two HDDs across any nodes and the system will continue to be fully operational. Rubrik can also withstand the simultaneous failure of any number of HDDs on the same node.
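The tolerances in this scenario reduce to simple counting. The sketch below assumes (4,2) Erasure Coding (any 4 of 6 blocks decode a stripe) and three-way metadata replication (quorum of 2 of 3), as described above:

```python
def snapshot_data_available(surviving_blocks: int, data_blocks: int = 4) -> bool:
    """(4,2) Erasure Coding: any 4 of the 6 blocks can decode the stripe."""
    return surviving_blocks >= data_blocks

def metadata_quorum(surviving_replicas: int, replicas: int = 3) -> bool:
    """Three-way replication: a majority of replicas must be reachable."""
    return surviving_replicas > replicas // 2

# One node lost: at most 2 of the 6 EC blocks and 1 of 3 metadata replicas go with it.
assert snapshot_data_available(6 - 2)  # data still decodable
assert metadata_quorum(3 - 1)          # 2 of 3 replicas still reach quorum
```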
SCENARIO 2

All nodes within an availability domain can be simultaneously lost and the system will remain fully operational. Each availability domain must contain at least one node, with a minimum of 4 availability domains. Additionally, all availability domains should have an equal number of nodes to ensure data is fairly balanced. By default, each appliance is automatically created as its own availability domain.
• Metadata: Remains completely accessible, as 2 of 3 metadata replicas would be available (no two replicas are placed in the same availability domain), so LOCAL_QUORUM reads will work. Writing new metadata would also work with LOCAL_QUORUM writes, since the system is intelligent enough to write replicas only to nodes in operational and accessible availability domains.
• Snapshot Data: Remains 100% accessible because at most 2 of 6 encoding blocks would be lost.
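The placement rule that no two replicas share an availability domain can be sketched as follows (hypothetical names and logic, not Rubrik’s actual placement code):

```python
def place_replicas(nodes: dict, copies: int = 3) -> list:
    """Pick `copies` nodes, each from a distinct availability domain.
    `nodes` maps node name -> availability domain."""
    chosen, used_domains = [], set()
    for node, domain in nodes.items():
        if domain not in used_domains:
            chosen.append(node)
            used_domains.add(domain)
        if len(chosen) == copies:
            return chosen
    raise RuntimeError("not enough distinct availability domains")

cluster = {"n1": "ad1", "n2": "ad1", "n3": "ad2", "n4": "ad3", "n5": "ad4"}
placement = place_replicas(cluster)
assert len({cluster[n] for n in placement}) == 3  # all three domains distinct
```

Because replicas land in three distinct domains, losing any one domain removes at most one replica, which preserves the 2-of-3 quorum described above.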
ARCHITECTURE DISTINCTIVES
Although partially discussed in the sections above, several items merit a specific focus.
A SELF-HEALING SOFTWARE SYSTEM
The system employs an intelligent replication scheme to distribute multiple copies of data throughout the
cluster. For example, a cluster (minimum of 4 nodes for Erasure Coding) can withstand a concurrent failure
of up to two hard disk drives. The system continues to handle data ingest and operational tasks while
simultaneously creating copies of data stored on the missing blocks to maintain replicas. As more nodes are
added to the cluster, Rubrik spreads the data across the cluster to guard against node or appliance failure.
During a node failure, data ingest and operational tasks—backup, archival, replication, reporting, etc.—will be
redistributed to the remaining healthy nodes while data is reconstructed in the background. As an example, if a
cluster contains more than 3 appliances, the cluster can withstand a loss of an entire appliance with no loss of
availability or data. Specific failure scenarios are covered in Data Integrity in Action.
CLOUD ARCHIVE RECOVERABILITY
Even if the local cluster is erased or destroyed, Rubrik provides a second line of defense by allowing local
reconstruction from data stored in the cloud. Metadata is stored alongside data in all cloud archives. This allows
cloud archive data to be referenceable even if the primary datacenter is entirely destroyed — a specific design
goal to address scenarios we have seen in other architectures where cloud archive data became unusable in
the event of losing the “primary catalog”.
CONCLUSION
Any data management platform requires multiple lines of defense to ensure data integrity. From end-to-end
continuous checks for data and metadata to a self-healing software system that weathers node/disk failure
to an architecture that minimizes dependence on hardware, Rubrik utilizes a multifaceted system of checks to
validate the accuracy and consistency of data throughout its lifecycle.
As a next-generation solution designed to support data management across multiple locations, Rubrik ensures
data integrity even when data is stored in the public cloud and data recoverability even when local data is
destroyed.
If you are interested in going deeper into the Atlas File System, see the Tech Field Day 12 presentation, “Rubrik Atlas File System with Adam Gee & Rolland Miller”.

To learn more about Erasure Coding with Rubrik, see Rubrik Erasure Coding Architecture.
ABOUT THE AUTHORS
Rebecca Fitzhugh is a Technical Marketing Engineer at Rubrik. She is VCDX #243, a published author, and
blogger. You can find her on Twitter @RebeccaFitzhugh.
APPENDIX

ADDITIONAL FAILURE SCENARIOS
Power Supply Unit (PSU) failure: A physical appliance offers dual power supplies for redundancy. If one power supply fails, the system will fail over to the remaining power supply.

Transceiver failure: If a transceiver fails, the system will fail over to the other transceiver (assuming both transceivers are plugged in). This is also referred to as NIC or network card failure.

Hard disk failure: A cluster (minimum of four nodes with Erasure Coding) can withstand a concurrent failure of up to 2 hard disk drives. The system continues to handle data ingest and operational tasks while simultaneously creating copies of data stored on the missing blocks to maintain (4,2) Erasure Coding. As more nodes are added to the cluster, the system will ensure that multiple copies of data are spread across the cluster to tolerate a node or appliance failure.

DIMM failure: A DIMM failure will cause a node to be unavailable to the cluster. Removing the offending DIMM allows the node to continue operations; however, this is not recommended.

NIC failure: An entire NIC failure within an appliance is similar to a node failure. Data ingest and operational tasks (backup, archival, replication, reporting, etc.) will be redistributed to the remaining healthy nodes.

SSD failure: A cluster impacted by an SSD failure behaves similarly to a node failure. Data ingest and operational tasks (backup, archival, replication, reporting, etc.) will be redistributed to the remaining healthy nodes. Data and metadata are rebuilt in the background to maintain (4,2) Erasure Coding.

Node failure: A cluster can tolerate a maximum concurrent failure of one node. Data ingest and operational tasks (backup, archival, replication, reporting, etc.) will be redistributed to the remaining healthy nodes. Data and metadata are rebuilt in the background to maintain (4,2) Erasure Coding. If a node is down for longer than 24 hours, action by Rubrik Support is required to re-add it to the cluster.