TECHNICAL WHITE PAPER

Exploring the Depth of Simplicity:
Data Integrity with Rubrik
TABLE OF CONTENTS

ABSTRACT
MEET ATLAS — DISTRIBUTED FILE SYSTEM
DATA & METADATA PROTECTION
    DATA PROTECTION VIA ERASURE CODING
    ALWAYS FULL STRIPE WRITES
    METADATA PROTECTION
CONTINUOUS VALIDATION
FINGERPRINTS
DATA INTEGRITY IN ACTION
    SCENARIO 1
        ONE NODE LOSS
        TWO HDD LOSS
    SCENARIO 2
        ONE AVAILABILITY DOMAIN LOSS
ARCHITECTURE DISTINCTIVES
    A SELF-HEALING SOFTWARE SYSTEM
    INDEPENDENCE FROM SPECIALIZED HARDWARE
    CLOUD ARCHIVE RECOVERABILITY
CONCLUSION
ABOUT THE AUTHORS
APPENDIX
    ADDITIONAL FAILURE SCENARIOS

Released April 2018 v3

ABSTRACT
Data integrity is core to the Rubrik Cloud Data Management (CDM) architecture. Given the many facets of
CDM, and in particular Rubrik's role as the "storage of last resort" for backup and recovery, multiple
methods ensure data integrity — all of them transparent to our customers. The purpose of this paper is to
introduce the following concepts with respect to Rubrik CDM:

• Data and metadata protection
• Continuous validation
• Data fingerprints
• Failure scenarios
Please see the Rubrik website for an overview of Cloud Data Management.

MEET ATLAS — DISTRIBUTED FILE SYSTEM


Atlas is a distributed file system built from the ground up by the Rubrik engineering team. It marries the design
principles of modern, web-scale file systems such as Google’s file system with the ability to handle random
writes and intelligent version awareness. Conceptually, Atlas stores a set of versioned files — all protected
data is immutable. As new data enters the system, Atlas preserves older versions by writing to new blocks and
always utilizing full stripe writes. This detail is worth emphasizing, especially in light of increasing ransomware
attacks: once data is written to Atlas, it is immutable. Because Atlas is a scale-out file system, there is never a
need for forklift upgrades, nor are there deduplication silos. All communication between nodes is encrypted.

Figure 1: Rubrik Architecture

Atlas is also designed to handle the unfortunate reality that hardware is unreliable. To mitigate this, there
must be multiple layers of redundancy through software — we'll explore those layers in this paper. This same
design principle has enabled rapid development of Rubrik Edge — a virtual Rubrik appliance for Remote Office/
Branch Office (ROBO) use. In a virtual environment, there is even less visibility into the underlying hardware.

Atlas sits underneath all of the Rubrik CDM services and was specifically designed with a focus on data
integrity and resiliency.

DATA & METADATA PROTECTION
As a distributed system, Rubrik is designed to tolerate even the unlikeliest of failures. This section provides an
overview of this protection.

DATA PROTECTION VIA ERASURE CODING


Rubrik has always had resiliency against dual disk failure — similar in some ways to the disk failure resilience
provided by RAID6 dual parity. However, traditional RAID architectures have become nonviable due to the
increasing capacity of disks and their steady Unrecoverable Read Error (URE) rate. With larger drives and
RAID6, rebuild times can be measured in days — dramatically increasing the risk of a single disk failure +
Unrecoverable Read Error (i.e. data corruption) during rebuild leading to the loss of an entire RAID set. We
believe this is an unacceptable risk on “storage of last resort.”
Figure 2: Erasure Coding
Instead of continuing to use RAID or RF2/RF3
mirroring with their attendant space penalties,
Rubrik uses Erasure Coding (4,2) with a specific
implementation of Reed-Solomon algorithms to
improve performance, provide resiliency, and use
space efficiently. Erasure Coding also removes
static RAID group boundaries that often require
full disk shelves to be added during expansion or
additional planning.

In the event of a disk failure, Erasure Coding enables automatic data rebuild to return the cluster to full
protection within hours, whereas RAID6 rebuilds could take days to weeks. Of particular note, this is done
without reliance on specialized hardware such as an NVRAM card.

As shown in Figures 3 and 4, (4,2) Erasure Coding allows Rubrik to have the same data resiliency as RAID6 or
RF3 3-way mirrors. It also allows Rubrik to be far more space efficient than either RF2’s 100% space overhead
or RF3’s 200% space overhead. From a “usable space perspective”, Erasure Coding provides ~66% usable
space.

Figure 3: Comparison of Data Protection Methods

Protection Method    | Space Overhead    | Raw to Usable %         | Rebuild Times for Large Drives | Failures Tolerated before Data Loss
RF2 Mirroring        | 100%              | 50% Disk Usable         | Hours to Days                  | 1 Disk or 1 Bad Block (URE)
RAID6                | ~30-50% (2 Disks) | (N-2) of N Disks Usable | Days to Weeks                  | 2 Disks or 1 Disk + 1 Bad Block (URE)
RF3 Mirroring        | 200%              | 33% Disk Usable         | Hours                          | 2 Disks or 1 Disk + 1 Bad Block (URE)
Erasure Coding (4,2) | 50%               | 66% Disk Usable         | Hours                          | 2 Disks or 1 Disk + 1 Bad Block (URE)
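
As a rough illustration of the raw-to-usable arithmetic in Figure 3, the short Python sketch below (illustrative only, not Rubrik code) derives the usable fraction and overhead of each scheme from its data-to-redundancy ratio; the RAID6 row assumes a hypothetical 6-disk group.

```python
# Rough "raw to usable" arithmetic for the schemes in Figure 3.
# Illustrative only; the RAID6 entry assumes a hypothetical 6-disk group.

def usable_fraction(data_units: int, redundancy_units: int) -> float:
    """Fraction of raw capacity left for data once redundancy is stored."""
    return data_units / (data_units + redundancy_units)

schemes = {
    "RF2 mirroring":        (1, 1),  # 1 copy of data + 1 mirror
    "RF3 mirroring":        (1, 2),  # 1 copy of data + 2 mirrors
    "RAID6 (6-disk group)": (4, 2),  # 4 data disks + 2 parity disks
    "Erasure Coding (4,2)": (4, 2),  # 4 data chunks + 2 coding chunks
}

for name, (data, redundancy) in schemes.items():
    frac = usable_fraction(data, redundancy)
    overhead = redundancy / data
    print(f"{name:22s} usable ≈ {frac:.0%}, overhead ≈ {overhead:.0%}")
```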

Rubrik firmly believes that RF2 or “single disk loss resiliency” is an unacceptable risk for “storage of last resort.” If
during the lengthy rebuild of a single drive there is an unrecoverable read error, an entire data set would be lost.

Figure 4: Space Overhead

Note: previous versions of Rubrik used RF3 Mirroring to provide data integrity. RF3 Mirroring provided dual
disk failure protection, but at the cost of 200% space overhead — also known as 33% "raw to usable" space.
With Rubrik CDM version 3.0, Erasure Coding was provided to customers at no additional charge — a dramatic
increase in usable space with the same level of data protection as RF3 mirroring. With the exception of the
R334, all Rubrik clusters leverage Erasure Coding.

ALWAYS FULL STRIPE WRITES


Traditional storage arrays utilizing RAID can lose data if there is a power failure during a write operation. This
is because block writes leave a window of time in which a data stripe is inconsistent, so a RAID6 rebuild can
fail if even a single disk fails during that window. For this reason, Atlas never updates a single block in a stripe;
it always performs a full stripe write with verification after the write operation.
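
Below is a minimal sketch of the full-stripe-write-then-verify pattern described above. The chunk sizes, the CRC32 checksum, and the placeholder_parity helper are all illustrative assumptions rather than Atlas internals (the placeholder is not a real Reed-Solomon code); the point is simply that data and coding chunks are computed together, written as one complete stripe, and verified immediately afterward.

```python
import zlib

STRIPE_DATA_CHUNKS = 4   # (4,2): four data chunks ...
STRIPE_CODE_CHUNKS = 2   # ... plus two coding chunks per stripe
CHUNK_SIZE = 8           # tiny chunks keep the example readable

def placeholder_parity(chunks):
    """Stand-in for Reed-Solomon encoding -- NOT a real (4,2) code."""
    xor = bytes(a ^ b ^ c ^ d for a, b, c, d in zip(*chunks))
    return [xor, xor[::-1]]

def write_full_stripe(device, stripe_id, data):
    """Write data + coding chunks as one complete stripe, then verify it."""
    # Split into fixed-size data chunks, zero-padding the tail.
    padded = data.ljust(STRIPE_DATA_CHUNKS * CHUNK_SIZE, b"\0")
    chunks = [padded[i * CHUNK_SIZE:(i + 1) * CHUNK_SIZE]
              for i in range(STRIPE_DATA_CHUNKS)]
    stripe = chunks + placeholder_parity(chunks)

    # Persist every chunk of the stripe together with its checksum ...
    device[stripe_id] = [(zlib.crc32(c), c) for c in stripe]

    # ... and verify the whole stripe right after the write lands.
    for crc, chunk in device[stripe_id]:
        assert zlib.crc32(chunk) == crc, "stripe verification failed"

disk = {}
write_full_stripe(disk, stripe_id=0, data=b"hello, full stripe world")
print(len(disk[0]))   # 6 chunks: 4 data + 2 coding
```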

METADATA PROTECTION
Metadata is protected in multiple ways. To enable Rubrik's fast Google-like search, metadata is always held
on the local SSD of each Rubrik node. For resiliency, metadata is distributed within the cluster via three-way
replication and is also backed up to the local HDDs (hard disk drives) within each node. An SSD failure will not
take the cluster metadata store offline, and as a last resort the HDD copy allows quick local metadata
restoration upon SSD replacement.
Figure 5: Metadata Protection
To protect against corruption and provide
further resiliency, multiple copies of
versioned metadata are stored and
replicated throughout the system.
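
The sketch below illustrates the general idea behind three-way replication with quorum reads (a common distributed-store pattern; the replica layout and version numbers here are hypothetical, not Rubrik's metadata format): as long as two of the three replicas respond, the newest version of the metadata wins, so a single SSD or node loss never blocks access.

```python
# Toy quorum read over three metadata replicas. Hypothetical layout,
# not Rubrik's metadata format; it only illustrates why 2-of-3 survives
# the loss of any single replica.

REPLICATION_FACTOR = 3
QUORUM = REPLICATION_FACTOR // 2 + 1        # 2 of 3

def quorum_read(replica_responses):
    """Each response is a (version, value) tuple; None models a down replica."""
    alive = [r for r in replica_responses if r is not None]
    if len(alive) < QUORUM:
        raise RuntimeError("not enough replicas to reach a quorum")
    # The replica reporting the highest version holds the latest metadata.
    return max(alive)[1]

# One replica lost (e.g. a failed SSD) -- the read still succeeds.
print(quorum_read([(7, "snapshot-index-v7"), None, (6, "snapshot-index-v6")]))
```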

CONTINUOUS VALIDATION
As data enters the system, Atlas creates checksums to protect against physical disks that do not provide
absolute data integrity. A checksum is a compact signature of a larger piece of data that can be used to verify
its integrity; this is also known as a CRC (cyclic redundancy check).

There are two types of checksums:

1. Stripe checksum — a checksum created at a logical level and persisted along with the data.
Stripe checksums are used to protect against memory corruption and software bugs. The checksum
is computed at the data entry point, and Atlas ensures it remains the same even after data moves
through various distributed layers.
2. Chunk checksum — a checksum created at a physical level and persisted alongside the data. Chunk
checksums are used to protect against bit rot — the slow deterioration of data stored on disk.

Checksums are leveraged in multiple ways:

1. When data is read — as part of a read request, the checksum is validated as a check against
corruption. If checksum violations are detected, the data is automatically repaired from other copies.
2. Background scan — during normal operations, a continuous background scan process looks for
data corruption or inconsistency. In particular, this allows recovery from Unrecoverable Read Errors.

If data rebuild is needed, the resiliency provided by Erasure Coding is automatically leveraged in the background.
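
A minimal sketch of the chunk-checksum checks described in this section, using CRC32 as a generic stand-in (the actual CRC variant, chunk layout, and repair path are not specified here and are assumptions): checksums are persisted with each chunk, recomputed on read, and swept periodically by a background scan.

```python
import zlib

# Minimal sketch of chunk checksum verification. CRC32 stands in for
# whatever CRC variant the file system actually uses. Each chunk is
# persisted alongside its checksum; reads and background scans recompute
# the CRC and trigger a repair from redundant copies on a mismatch.

def store_chunk(store, chunk_id, payload):
    store[chunk_id] = {"crc": zlib.crc32(payload), "payload": payload}

def read_chunk(store, chunk_id):
    chunk = store[chunk_id]
    if zlib.crc32(chunk["payload"]) != chunk["crc"]:
        # A real system would repair from erasure-coded copies here.
        raise IOError(f"checksum mismatch on chunk {chunk_id}; repair needed")
    return chunk["payload"]

def background_scan(store):
    """Return the IDs of chunks whose bytes no longer match their CRC."""
    return [cid for cid, c in store.items()
            if zlib.crc32(c["payload"]) != c["crc"]]

store = {}
store_chunk(store, 1, b"backup block")
store[1]["payload"] = b"backup blocc"          # simulate bit rot
print(background_scan(store))                  # -> [1]
```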

FINGERPRINTS
Fingerprinting algorithms, in which a large data item is mapped to a shorter bit string, are employed as a more
rigorous end-to-end check and are created at the data management level (Cerebro, as shown in Figure 1).

Fingerprints are leveraged in multiple ways:

1. Data Ingest — fingerprints are computed the first time data enters the system. From this point
on, the system ensures that content does not change (i.e. fingerprints are validated) and always performs
a fingerprint check before committing any data transformations. Fingerprints are also compared and
validated during ingest to avoid writing duplicate data. This is also known as deduplication (a simple
sketch follows this list). If a fingerprint mismatch is detected, the data transformation operation is rolled back.
2. Data Replication — fingerprints are compared and validated as part of the replication process
between two Rubrik clusters to avoid duplicate data being sent over the network.
3. Data Archive — fingerprints are compared and validated as part of the cloud archive process to avoid
duplicate data being sent to and stored with a cloud provider.
4. Data Immutability — beyond the append-only nature of the Atlas filesystem, fingerprints are used to
guarantee data immutability.
5. Background Scan — similar to checksums, Rubrik runs periodic fingerprint checks to ensure data
integrity.
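
The sketch below makes the ingest-time fingerprint ideas concrete, using SHA-256 as a generic stand-in for the (unspecified) fingerprinting algorithm; the store layout and helper names are hypothetical. Content hashing to a known fingerprint is stored only once, and any stored content that no longer matches its fingerprint fails verification.

```python
import hashlib

# Toy fingerprint-based ingest and verification. SHA-256 stands in for the
# (unspecified) fingerprinting algorithm; the store layout is hypothetical.

def fingerprint(data):
    return hashlib.sha256(data).hexdigest()

def ingest(store, data):
    """Store content keyed by fingerprint; duplicates are written only once."""
    fp = fingerprint(data)
    if fp not in store:
        store[fp] = data
    return fp

def verify(store, fp):
    """Used before commits and by background scans: content must still match."""
    return fingerprint(store[fp]) == fp

store = {}
first = ingest(store, b"vm-disk-block")
second = ingest(store, b"vm-disk-block")      # duplicate -> deduplicated
print(first == second, len(store))            # True 1
print(verify(store, first))                   # True
```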

DATA INTEGRITY IN ACTION
Atlas uses all local storage resources (SSDs and HDDs) available across all nodes in the cluster and pools
them together into a global namespace. As nodes are added to the cluster, the global namespace grows
automatically, increasing capacity and performance across the cluster. The metadata of Atlas (and
the metadata of the data management application) is stored in the metadata store, Callisto, which is also
distributed across all nodes in the SSD layer. The nodes communicate internally using remote procedure calls,
presented to Atlas by the cluster management component (Forge) in a topology-aware manner, giving Atlas
the ability to provide efficient data locality. This is necessary to ensure that data is spread correctly
throughout the cluster for redundancy.

The system is self-healing. Forge publishes the disk and node health status and Atlas reacts. Assuming Erasure
Coding (4,2), if a node or an entire appliance fails, Atlas will create a new copy of the data on another node to
make sure the requested failure tolerance is met. With (4,2) Reed-Solomon Erasure Coding, Rubrik stores 2
extra bytes of coding data for every 4 bytes of data. This allows Rubrik to survive a number of failure types
with minimal storage overhead. Additionally, Atlas runs a background task that checks the CRC of each
data chunk to ensure that what has been written to Rubrik is available at time of recovery.

The following scenarios provide specific examples of how Rubrik is designed to protect against failure.
For additional failure scenarios, please see the Appendix.

SCENARIO 1
In the first scenario, the customer has 3 appliances consisting of multiple nodes and is concerned whether
or not they could withstand a node failure or the failure of two drives in the same node. Each concern will be
addressed separately.

ONE NODE LOSS

Any one node of the cluster (minimum of 4 nodes per cluster with Erasure Coding) can fail and the system will
continue to be fully operational.

• Metadata: Remains completely accessible since at least 2 out of 3 replicas are guaranteed to be
available, which is enough to attain a quorum. Writing new metadata will be possible as long as there are
2 functional nodes in the cluster.
• Snapshot Data: Remains 100% accessible because decoding can occur with any 4 out of 6 blocks
available.

TWO HDD LOSS

The cluster can withstand the failure of any two HDDs across any nodes and the system will continue to be
fully operational. Rubrik can also withstand the simultaneous failure of any number of HDDs on the same node.

• Metadata: Remains completely accessible as it is stored on SSD, not HDD.
• Snapshot Data: Remains 100% accessible because the remaining data and coding blocks are sufficient to
decode every stripe without loss of data or availability. Writing new snapshot data is possible as long as
there are a total of 6 functional HDDs.

SCENARIO 2
In this scenario, the customer has 4 appliances consisting of multiple nodes and is concerned whether or not
they could withstand a failure of an entire appliance (3 or 4 nodes). Rubrik leverages availability domains,
which are similar to failure domains. All nodes in an appliance belong to the same availability domain, because
it is more likely that the nodes within an appliance will become available or unavailable together.

ONE AVAILABILITY DOMAIN LOSS

All nodes within an availability domain can be simultaneously lost and the system will remain fully operational.
Each availability domain must contain at least one node, and there must be a minimum of 4 availability domains.
Additionally, all availability domains should have an equal number of nodes to ensure data is fairly balanced.
By default, each appliance is automatically created as its own availability domain.

• Metadata: Remains completely accessible as 2 of 3 metadata replicas would be available (no two
replicas will be on the same availability domain). LOCAL_QUORUM reads will work. Writing new
metadata would still work since we have LOCAL_QUORUM writes. The system is intelligent enough to
write the two replicas only to nodes in operational and accessible availability domains.
• Snapshot Data: Remains 100% accessible because at most 2 of 6 encoding blocks would be lost
(illustrated in the sketch below).
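
As a back-of-the-envelope check of the snapshot data claim above, the sketch below spreads the six chunks of a (4,2) stripe round-robin across availability domains and verifies that losing any single domain still leaves at least four chunks, which is enough to decode the stripe. The placement logic is purely illustrative, not Rubrik's actual placement algorithm.

```python
# Spread the 6 chunks of a (4,2) stripe round-robin across availability
# domains, then check that losing any one domain leaves >= 4 chunks.
# Illustrative placement only, not Rubrik's actual placement algorithm.

DATA_CHUNKS, CODE_CHUNKS = 4, 2
STRIPE_WIDTH = DATA_CHUNKS + CODE_CHUNKS          # 6 chunks per stripe

def place_stripe(num_domains):
    """Round-robin placement: availability domain -> list of chunk indices."""
    placement = {d: [] for d in range(num_domains)}
    for chunk in range(STRIPE_WIDTH):
        placement[chunk % num_domains].append(chunk)
    return placement

def survives_any_domain_loss(placement):
    """Decodable as long as any 4 of the 6 chunks remain after a domain loss."""
    return all(STRIPE_WIDTH - len(chunks) >= DATA_CHUNKS
               for chunks in placement.values())

print(survives_any_domain_loss(place_stripe(num_domains=4)))   # True
print(survives_any_domain_loss(place_stripe(num_domains=2)))   # False: only 3 chunks would remain
```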

ARCHITECTURE DISTINCTIVES
Although partially discussed in the sections above, several items merit a specific focus.

A SELF-HEALING SOFTWARE SYSTEM


Rubrik Cloud Data Management (CDM) is engineered to be a self-healing software system that is resilient to
multiple node and disk failures as well as chassis failure at scale. Data ingest and management tasks—backup,
replication, archival, reporting, etc.—are distributed throughout the cluster. Each node within the cluster acts
autonomously as it executes its task assignments.

The system employs an intelligent replication scheme to distribute multiple copies of data throughout the
cluster. For example, a cluster (minimum of 4 nodes for Erasure Coding) can withstand a concurrent failure
of up to two hard disk drives. The system continues to handle data ingest and operational tasks while
simultaneously creating copies of data stored on the missing blocks to maintain replicas. As more nodes are
added to the cluster, Rubrik spreads the data across the cluster to guard against node or appliance failure.
During a node failure, data ingest and operational tasks—backup, archival, replication, reporting, etc.—will be
redistributed to the remaining healthy nodes while data is reconstructed in the background. As an example, if a
cluster contains more than 3 appliances, the cluster can withstand a loss of an entire appliance with no loss of
availability or data. Specific failure scenarios are covered in Data Integrity in Action.

INDEPENDENCE FROM SPECIALIZED HARDWARE


While Rubrik is delivered as a turnkey appliance, its Cloud Data Management software fabric is designed to be
hardware-agnostic. There is no dependence on specialized or proprietary hardware components, thanks in part
to atomic compare-and-swap data operations. For example, Rubrik's architecture does not require an NVRAM
(non-volatile random-access memory) card to avoid single points of failure, prevent data corruption, or handle
heavy small-write traffic. In fact, Rubrik's appliance is based on industry-standard (or white label) hardware.
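
The compare-and-swap (CAS) reference above can be made concrete with a toy versioned key-value update; this is a generic illustration of the CAS pattern, not Rubrik's data path, and the class and key names are invented for the example. An update commits only if the key is still at the version the writer last read, so a stale or racing write is rejected instead of being silently persisted.

```python
# Toy compare-and-swap (CAS) update over a versioned key-value store.
# Generic illustration of the pattern, not Rubrik's actual data path.

class VersionedStore:
    def __init__(self):
        self._data = {}                    # key -> (version, value)

    def read(self, key):
        return self._data.get(key, (0, None))

    def compare_and_swap(self, key, expected_version, new_value):
        """Commit only if nobody has updated the key since it was read."""
        current_version, _ = self._data.get(key, (0, None))
        if current_version != expected_version:
            return False                   # stale writer: retry with a fresh read
        self._data[key] = (current_version + 1, new_value)
        return True

store = VersionedStore()
version, _ = store.read("snapshot-chain")
print(store.compare_and_swap("snapshot-chain", version, "snap-001"))   # True
print(store.compare_and_swap("snapshot-chain", version, "snap-001b"))  # False: version moved on
```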

CLOUD ARCHIVE RECOVERABILITY
Data loss from a disaster can be crippling for any business — even one that utilizes cloud services for recovery
purposes. Rubrik ensures that data sent to the cloud is easily accessed at any time and at a granular level,
storing metadata with the data. This allows for recovery from offsite data in the event of disaster.

Even if the local cluster is erased or destroyed, Rubrik provides a second line of defense by allowing local
reconstruction from data stored in the cloud. Metadata is stored alongside data in all cloud archives. This allows
cloud archive data to be referenceable even if the primary datacenter is entirely destroyed — a specific design
goal to address scenarios we have seen in other architectures where cloud archive data became unusable in
the event of losing the “primary catalog”.

CONCLUSION
Any data management platform requires multiple lines of defense to ensure data integrity. From end-to-end
continuous checks for data and metadata to a self-healing software system that weathers node/disk failure
to an architecture that minimizes dependence on hardware, Rubrik utilizes a multifaceted system of checks to
validate the accuracy and consistency of data throughout its lifecycle.

As a next-generation solution designed to support data management across multiple locations, Rubrik ensures
data integrity even when data is stored in the public cloud and data recoverability even when local data is
destroyed.

If interested in going deeper into the Atlas File System, see this Tech Field Day 12 presentation — “Rubrik Atlas
File System with Adam Gee & Rolland Miller”.

To learn more about Erasure Coding with Rubrik, see Rubrik Erasure Coding Architecture.

ABOUT THE AUTHORS


Andrew Miller is the Technical Marketing Manager at Rubrik. He has random certs, blogs sporadically, and has
lived on the customer and partner side for 15 years. You can find him on Twitter @andriven.

Rebecca Fitzhugh is a Technical Marketing Engineer at Rubrik. She is VCDX #243, a published author, and
blogger. You can find her on Twitter @RebeccaFitzhugh.

APPENDIX
Rubrik can withstand a number of failures at the appliance and hardware component level. An appliance
represents a group of nodes that operate independently of each other.

ADDITIONAL FAILURE SCENARIOS


The following table details what happens at each level of hardware failure.

Power Supply Unit (PSU) failure: A physical appliance offers dual power supplies for redundancy. If one power
supply fails, the system will fail over to the remaining power supply.

Transceiver failure: If a transceiver fails, the system will fail over to the other transceiver (assuming both
transceivers are plugged in). This is also referred to as NIC or network card failure.

Hard disk failure: A cluster (minimum of four nodes with Erasure Coding) can withstand a concurrent failure
of up to 2 hard disk drives. The system continues to handle data ingest and operational tasks while
simultaneously creating copies of data stored on the missing blocks to maintain (4,2) Erasure Coding. As more
nodes are added to the cluster, the system will ensure that multiple copies of data are spread across the
cluster to tolerate a node or appliance failure.

DIMM failure: A DIMM failure will cause a node to be unavailable to the cluster. Removing the offending DIMM
allows the node to continue operations; however, this is not recommended.

NIC failure: An entire NIC failure within an appliance is similar to a node failure. Data ingest and operational
tasks — backup, archival, replication, reporting, etc. — will be redistributed to the remaining healthy nodes.
Data and metadata are rebuilt in the background to maintain (4,2) Erasure Coding. If a single interface of the
NIC fails, the system will fail over to the other port (as long as both cables are plugged in).

SSD failure: An SSD failure is handled similarly to a node failure. Data ingest and operational tasks — backup,
archival, replication, reporting, etc. — will be redistributed to the remaining healthy nodes. Data and metadata
are rebuilt in the background to maintain (4,2) Erasure Coding.

Node failure: A cluster can tolerate a maximum concurrent failure of one node. Data ingest and operational
tasks — backup, archival, replication, reporting, etc. — will be redistributed to the remaining healthy nodes.
Data and metadata are rebuilt in the background to maintain (4,2) Erasure Coding. If a node is down longer
than 24 hours, Rubrik support action is required to re-add it to the cluster.

Appliance/chassis failure: Within a cluster of at least 4 appliances, a maximum of one appliance failure is
tolerated for Rubrik CDM 2.1 and later. In an appliance failure, the issue most likely resides within shared
components such as the power supply or drive backplane. This is expected to be a rare occurrence.
NOTE: when a cluster expands from 1 to 2+ appliances, there is a period in which data is rebalanced to ensure
the cluster is fault tolerant. During this period, an appliance failure cannot be tolerated.

1001 Page Mill Rd Building 2 Palo Alto, CA 94306 info@rubrik.com | www.rubrik.com
