Replication enables movement of data copies to a safe site using minimum WAN bandwidth. This ensures fast recovery in case of loss of the primary data, the primary site, or the secondary store.
This lesson provides an overview of Data Domain replication types and topologies, as well as configuring and seeding replication.
Data Domain systems are used to store backup data onsite for a short period such as 30, 60 or 90 days, depending on local practices and
capacity. Lost or corrupted files are recovered easily from the onsite Data Domain system since it is disk-based, and files are easy to locate
and read at any time.
In the case of a disaster that destroys the onsite data, the offsite replica is used to restore operations. Data on the replica is immediately
available for use by systems in the disaster recovery facility. When a Data Domain system at the main site is repaired or replaced, the data
can be recovered using a few simple recovery configuration and initiation commands.
You can quickly move data offsite (with no delays for copying and moving tapes). You don't have to complete replication for backups to occur. Replication occurs in real time.
Replication typically consists of a source Data Domain system (which receives data from a backup system), and one or more destination
Data Domain systems.
Replication duplicates backed-up data over a WAN after it has been deduplicated and compressed. Replication creates a logical copy of the
selected source data post-deduplication, and only sends any segments that do not already exist on the destination. Network demands are
reduced during replication because only unique data segments are sent over the network.
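The behavior described above can be sketched conceptually (this is an illustration, not Data Domain internals; all names and data are invented for the example): the source ships only those segments whose fingerprints the destination does not already hold.

```python
# Conceptual sketch: replicating only the deduplicated segments the
# destination does not already hold. Fingerprints stand in for the
# content hashes a real deduplicating system would compute.

def replicate(source_segments: dict, destination_store: dict) -> int:
    """Send each unique segment at most once; return bytes shipped."""
    sent = 0
    for fingerprint, data in source_segments.items():
        if fingerprint not in destination_store:   # segment is new to the destination
            destination_store[fingerprint] = data  # ship it over the WAN
            sent += len(data)
    return sent

src = {"fp1": b"aaaa", "fp2": b"bbbb", "fp3": b"cccc"}
dst = {"fp1": b"aaaa"}         # destination already holds fp1
print(replicate(src, dst))     # only fp2 and fp3 travel: 8 bytes
```

A second replication pass over the same data would ship zero bytes, which is the property that keeps WAN demands low.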
Replication provides a secondary copy, usually at an offsite location, for:
Disaster recovery
A defined replication source and destination is called a pair. A source or destination in the replication pair is referred to as a context. The context is defined on both the source and destination Data Domain systems paired for replication.
A replication context can also be termed a replication stream, and although the use case is quite
different, the stream resource utilization within a Data Domain system is roughly equivalent to a read
stream (for a source context) or a write stream (for a destination context).
The number of replication streams supported depends on the processing power of the Data Domain system on which they are created. Smaller systems can handle no more than 15 source and 20 destination streams, while the most powerful Data Domain systems can handle over 200 streams.
Data Domain supports various replication topologies in which data flows from a source to a destination
directory over a LAN or WAN.
One-to-one replication
The simplest type of replication is from a Data Domain source system to a Data Domain destination
system, otherwise known as a one-to-one replication pair. This replication topology can be configured
with directory, MTree, or collection replication types.
Bi-directional replication
In a bi-directional replication pair, data from a directory or MTree on System A is replicated to System
B, and from another directory or MTree on System B to System A.
One-to-many replication
In one-to-many replication, data flows from a source directory or MTree on System A to several
destination systems. You could use this type of replication to create more than two copies for
increased data protection, or to distribute data for multi-site usage.
Many-to-one replication
In many-to-one replication, whether with MTree or directory, replication data flows from several
source systems to a single destination system. This type of replication can be used to provide data
recovery protection for several branch offices at the corporate headquarters IT systems.
Cascaded replication
In a cascaded replication topology, a source directory or MTree is chained among three Data Domain
systems. The last hop in the chain can be configured as collection, MTree, or directory replication,
depending on whether the source is directory or MTree.
For example, the first DD system replicates one or more MTrees to a second DD system, which then
replicates those MTrees to a final DD system. The MTrees on the second DD system are both a
destination (from the first DD system) and a source (to the final DD system). Data recovery can be
performed from the non-degraded replication pair context.
Copyright 2013 EMC Corporation. All rights reserved
Data Domain Replicator software offers four replication types that leverage the different logical levels
of the system described in the previous slide for different effects.
Directory replication: A subdirectory under /backup/ and all files and directories below it on a
source system replicates to a destination directory on a different Data Domain system. This
transfers only the deduplicated changes of any file or subdirectory within the selected Data
Domain file system directory.
MTree replication: This is used to replicate MTrees between Data Domain systems. It uses
the same WAN deduplication mechanism as used by directory replication to avoid sending
redundant data across the network. The use of snapshots ensures that the data on the
destination is always a point-in-time copy of the source with file consistency, while reducing
replication churn, thus making WAN use more efficient. Replicating individual directories
under an MTree is not permitted with this type.
The third type, collection replication, is described on the following pages. A fourth type, managed replication, belongs to Data Domain Boost operations and is discussed later in this course.
Collection replication replicates the entire /data/col1 area from a source Data Domain system to a
destination Data Domain system. Collection replication uses the logging file system structure to track
replication. Transferring data in this way means simply comparing the heads of the source and destination logs and catching up, one container at a time, as shown in this diagram. If collection replication lags behind, it continues until it catches up.
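The log catch-up described above can be sketched as follows (a toy illustration, not DD OS internals; container names are invented): the destination's log head is compared with the source's, and missing containers are shipped in order until the logs match.

```python
# Conceptual sketch: collection replication as log catch-up. The source's
# container log is the authority; the destination appends the containers
# it is missing, one closed container at a time.

def catch_up(source_log: list, dest_log: list) -> list:
    """Append the containers the destination is missing, in order."""
    shipped = []
    for container in source_log[len(dest_log):]:  # everything past the dest head
        dest_log.append(container)                # ship one container at a time
        shipped.append(container)
    return shipped

src_log = ["c1", "c2", "c3", "c4"]
dst_log = ["c1", "c2"]
print(catch_up(src_log, dst_log))  # ['c3', 'c4']
```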
The Data Domain system to be used as the collection replication destination must be empty before
configuring replication. Once replication is configured, the destination system is dedicated to receive
data only from the source system.
With collection replication, all user accounts and passwords are replicated from the source to the
destination. If the Data Domain system is a source for collection replication, snapshots are also
replicated.
Collection replication is the fastest and lightest type of replication offered by the DD OS. There is no
on-going negotiation between the systems regarding what to send. Collection replication is mostly
unaware of the boundaries between files. Replication operates on segment locality containers that
are sent after they are closed.
Because there is only one collection per Data Domain system, this is specifically an approach to
system mirroring. Collection replication is the only form of replication used for true disaster recovery.
The destination system cannot be shared for other roles. It is read-only and shows data only from one
source. After the data is on the destination, it is immediately visible for recovery.
Collection replication replicates the entire /data/col1 area from a source Data Domain system to a destination
Data Domain system. This is useful when all the contents being written to the DD system need to be protected
at a secondary site.
The Data Domain system to be used as the collection replication destination must be empty before configuring
replication. The destination immediately offers all backed up data, as a read-only mirror, after it is replicated
from the source.
Snapshots cannot be created on the destination of a collection replication because the destination is read-only.
With collection replication, all user accounts and passwords are replicated from the source to the destination.
Data Domain Replicator software can be used with the optional Encryption of Data at Rest feature, enabling
encrypted data to be replicated using collection replication. Collection replication requires the source and
target to have the exact same encryption configuration because the target is expected to be an exact replica of
the source data. In particular, the encryption feature must be turned on or off at both source and target and if
the feature is turned on, then the encryption algorithm and the system passphrases must also match. The
parameters are checked during the replication association phase. During collection replication, the source
system transmits the encrypted user data along with the encrypted system encryption key. The data can be
recovered at the target, because the target machine has the same passphrase and the same system encryption
key.
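The association-phase parameter check described above can be sketched like this (the field names are assumptions for illustration, not DD OS data structures): both ends must agree on whether encryption is enabled, and if it is, on the algorithm and passphrase as well.

```python
# Sketch of the collection-replication association check: encryption must be
# configured identically on source and target. Field names are illustrative.

def encryption_compatible(src: dict, dst: dict) -> bool:
    if src["enabled"] != dst["enabled"]:
        return False                      # feature must be on or off at both ends
    if not src["enabled"]:
        return True                       # both off: nothing more to compare
    return (src["algorithm"] == dst["algorithm"]
            and src["passphrase_hash"] == dst["passphrase_hash"])

a = {"enabled": True, "algorithm": "AES-256-CBC", "passphrase_hash": "h1"}
b = {"enabled": True, "algorithm": "AES-256-CBC", "passphrase_hash": "h1"}
c = {"enabled": True, "algorithm": "AES-256-GCM", "passphrase_hash": "h1"}
print(encryption_compatible(a, b))  # True
print(encryption_compatible(a, c))  # False: algorithm mismatch
```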
Collection replication topologies can be configured in the following ways.
One-to-One Replication: This topology can be used with collection replication, where the entire collection (the /data/col1 area) on a source Data Domain system is mirrored to a destination Data Domain system. Other than receiving data from the source, the destination is a read-only system.
Cascaded Replication: In a cascaded replication topology, directory replication is chained among three
or more Data Domain systems. The last system in the chain can be configured as collection replication.
Data recovery can be performed from the non-degraded replication pair context.
During directory replication, a Data Domain system can perform normal backup and restore operations. A
destination Data Domain system must have available storage capacity that is at least the post-compressed size
of the expected maximum size of the source directory. In a directory replication pair, the destination is always
read-only. In order to write to the destination outside of replication, you must first break replication.
When replication is initialized, a destination directory is created automatically if it does not already exist. After
replication is initialized, ownership and permissions of the destination directory are always identical to those of
the source directory.
Directory replication can receive backups from both CIFS and NFS clients, but cannot mix CIFS and NFS data in the same directory.
Directory replication supports encryption and retention lock.
MTree replication enables the creation of disaster recovery copies of MTrees at a secondary location by the
/data/col1/mtree pathname. A Data Domain system can simultaneously be the source of some replication contexts and the
destination for other contexts. The Data Domain system can also receive data from backup and archive applications while it
is replicating data.
One fundamental difference between MTree replication and directory replication is the method used for determining what
needs to be replicated between the source and destination. MTree replication creates periodic snapshots at the source and
transmits the differences between two consecutive snapshots to the destination. At the destination Data Domain system,
the latest snapshot is not exposed until all of the data for that snapshot is received. This ensures the destination is always a
point-in-time image of the source Data Domain system. In addition, files do not show out of order at the destination. This
provides file-level consistency, simplifying recovery procedures. It also reduces recovery time objectives (RTOs). Users are
also able to create a snapshot at the source Data Domain system for application consistency (for example, after a
completion of a backup), which is replicated on the destination where the data can be used for disaster recovery.
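The snapshot-diff mechanism described above can be sketched conceptually (a toy model, not DD OS internals; paths and contents are invented): only the files that changed between two consecutive snapshots are transferred, and the new snapshot becomes visible at the destination only once the whole diff has arrived.

```python
# Conceptual sketch of MTree replication's snapshot-diff transfer: compute
# what changed between two consecutive source snapshots, apply it to a
# staging copy, and expose the result atomically once complete.

def snapshot_diff(prev: dict, curr: dict) -> dict:
    """Files added or changed since the previous snapshot."""
    return {path: data for path, data in curr.items() if prev.get(path) != data}

def replicate_snapshot(prev: dict, curr: dict, dest: dict) -> dict:
    staged = dict(dest)                 # work on a staging copy
    staged.update(snapshot_diff(prev, curr))
    for path in set(prev) - set(curr):  # handle deletions since the last snapshot
        staged.pop(path, None)
    return staged                       # exposed only when the diff is complete

s1 = {"/a": b"1", "/b": b"2"}
s2 = {"/a": b"1", "/b": b"9", "/c": b"3"}
dest = dict(s1)                          # destination holds the previous snapshot
print(replicate_snapshot(s1, s2, dest))  # {'/a': b'1', '/b': b'9', '/c': b'3'}
```

Because the destination flips from one complete snapshot to the next, it is always a point-in-time image of the source, which is the file-consistency property the text describes.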
MTree replication shares some common features with directory replication. It uses the same WAN deduplication
mechanism as used by directory replication to avoid sending redundant data across the network. It also supports the same
topologies that directory replication supports. Additionally, you can have directory and MTree contexts on the same pair of
systems.
The destination of the replication pair is read-only.
The destination must have sufficient available storage to avoid replication failures.
CIFS and NFS clients should not be used within the same MTree.
MTree replication duplicates data for an MTree specified by the /data/col1/mtree pathname; the destination MTree is specified in the same way.
Some replication command options with MTree replication may target a single replication pair (source and destination
directories) or may target all pairs that have a source or destination on the Data Domain system.
MTree replication is usable with encryption and Data Domain Retention Lock Compliance on an MTree-level at the source
that is replicated to the destination.
A destination Data Domain system must have available storage capacity that is at least the
post-compressed size of the expected maximum size of the source MTree.
A destination Data Domain system can receive backups from both CIFS clients and NFS clients, as long as they are kept separate: each protocol uses its own replication pair, and CIFS and NFS data are never mixed in the same MTree.
After replication is initialized, ownership and permissions of the destination MTree are
always identical to those of the source MTree.
At any time, due to differences in global compression, the source and destination MTree can
differ in size.
Replication is a major feature that takes advantage of the MTree structure on the Data Domain system. The flexibility of the MTree structure provides greater control over the data being replicated. Careful planning of your data layout allows the greatest flexibility when managing data under an MTree structure.
MTree replication works only at the MTree level. If you want to implement MTree replication, you
must move data from the existing directory structure within the /backup MTree to a new or
existing MTree, and create a replication pair using that MTree.
For example, suppose that a Data Domain system has shares mounted in locations under /backup/ as
shown in the directory-based layout.
If you want to use MTree replication for your production (prod) data, but are not interested in
replicating any of the development (dev) data, the data layout can be modified to create two MTrees:
/prod and /dev, with two directories within each of them. The old shares would then be deleted
and new shares created for each of the four new subdirectories under the two new MTrees. This
would look like the structure shown in the MTree-based layout.
The Data Domain system now has two new MTrees, and four shares as earlier. You can set up MTree
replication for the /prod/ MTree to replicate all of your production data and not set up replication for
the /dev MTree as you are not interested in replicating your development data.
If the source Data Domain system has a high volume of data prior to configuring replication, the initial
replication seeding can take some time over a slow link. To expedite the initial seeding, you can bring
the destination system to the same location as the source system to use a high-speed, low-latency
link.
After data is initially seeded using the high-speed network, you then move the system back to its
intended location.
After data is initially seeded, only new data is sent from that point onwards.
All replication topologies are supported for this process, which is typically performed using collection
replication.
This lesson shows how to configure replication using DD Enterprise Manager, including low-bandwidth optimization (LBO), encryption over the wire, using a non-default connection port, and setting the replication throttle.
Low bandwidth optimization (LBO) is an optional mode that enables remote sites with limited bandwidth to replicate and protect more of their data over existing networks. LBO can be applied on a per-context basis to all file replication jobs on a system.
Additional tuning might be required to improve LBO functionality on your system. Use the bandwidth and network-delay settings together to calculate the proper TCP buffer size, and set the replication bandwidth for greater compatibility with LBO.
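The buffer-size calculation mentioned above is the standard bandwidth-delay product; a minimal sketch, with illustrative link values rather than settings from any particular system:

```python
# Sketch: sizing a TCP buffer from the bandwidth-delay product, i.e. the
# number of bytes the link can hold in flight. The example link values
# (10 Mb/s, 100 ms) are assumptions for illustration.

def tcp_buffer_bytes(bandwidth_bps: int, round_trip_ms: float) -> int:
    """Bandwidth-delay product: bandwidth (bits/s) x round-trip time (s), in bytes."""
    return int(bandwidth_bps / 8 * (round_trip_ms / 1000))

buf = tcp_buffer_bytes(10_000_000, 100)  # 10 Mb/s WAN, 100 ms round trip
print(buf)                                # 125000 bytes, i.e. ~122 KiB
```

A buffer smaller than this product leaves the link idle between acknowledgements, which is why long, thin WAN links benefit from explicit tuning.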
LBO can be monitored and managed through the Data Domain Enterprise Manager Data
Management > DD Boost > Active File Replications view.
Delta compression is a global compression algorithm that is applied after identity filtering. The algorithm looks for previous similar segments using a sketch-like technique and sends only the difference between the previous and new segments. In this example, segment S16 is similar to S1. The source can ask the destination whether it already has S1. If it does, the source needs to transfer only the delta (or difference) between S1 and S16. If the destination doesn't have S1, the source can send the full segment data for S16 and the full missing segment data for S1.
Delta compression reduces the amount of data to be replicated over low-bandwidth WANs by eliminating the transfer of redundant data found within replicated, deduplicated data. This feature typically benefits remote sites with lower-performance Data Domain models.
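A toy sketch of the delta idea (not the actual algorithm, which uses sketches to locate similar segments; here the base segment is simply given and both segments are equal length):

```python
# Conceptual sketch of delta compression: if the destination already holds a
# similar base segment, send only the byte positions that differ; the
# destination reconstructs the new segment by patching its local base copy.

def delta(base: bytes, new: bytes) -> list:
    """(offset, byte) pairs where 'new' differs from 'base' (equal lengths)."""
    return [(i, new[i]) for i in range(len(new)) if new[i] != base[i]]

def apply_delta(base: bytes, patch: list) -> bytes:
    out = bytearray(base)
    for offset, value in patch:
        out[offset] = value
    return bytes(out)

s1 = b"ABCDEFGH"             # base segment already on the destination
s16 = b"ABCXEFGY"            # new, similar segment on the source
patch = delta(s1, s16)
print(patch)                             # [(3, 88), (7, 89)] -- 2 bytes, not 8
print(apply_delta(s1, patch) == s16)     # True
```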
Replication without deduplication can be expensive, requiring either physical transport of tapes or high-capacity WAN links. This often makes replication feasible for only a small percentage of data identified as critical and high value.
Reductions through deduplication make it possible to replicate everything across a small WAN link.
Only new, unique segments need to be sent. This reduces WAN traffic down to a small percentage of
what is needed for replication without deduplication. These large factor reductions make it possible to
replicate over a less-expensive, slower WAN link or to replicate more than just the most critical data.
As a result, the lag is as small as possible.
LBO is enabled on a per-context basis. LBO must be enabled on both the source and destination Data
Domain systems. If the source and destination have incompatible LBO settings, LBO will be inactive
for that context. This feature is configurable in the Create Replication Pair settings in the Advanced
Tab.
To enable LBO, click the checkbox, Use Low Bandwidth Optimization.
Encryption over wire or live encryption is supported as an advanced feature to provide further
security during replication. This feature is configurable in the Create Replication Pair settings in the
Advanced tab.
To enable encrypted file replication, click the checkbox, Enable Encryption Over Wire.
It is important to note, when configuring encrypted file replication, that it must be enabled on both
the source and destination Data Domain systems. Encrypted replication uses the ADH-AES256-SHA
cipher suite and can be monitored through the Data Domain Enterprise Manager.
Related CLI command:
# replication modify
Modifies the destination hostname and sets the state of encryption.
Note: This command must be entered on both Data Domain systems: the source and the destination (target) system. Only an administrator can set this option.
The source system transmits data to a destination system listen port. As a source system can have
replication configured for many destination systems (each of which can have a different listen port),
each context on the source can configure the connection port to the corresponding listen port of the
destination.
Data Domain Enterprise Manager allows you to generate reports to track space usage on a Data
Domain system for a period of up to two years back. In addition, you can generate reports to help
understand replication progress. You can view reports on file systems daily and cumulatively, over a
period of time.
Access the Reports view by selecting the Reports stack in the left-hand column of the Data Domain
Enterprise Manager beneath the listed Data Domain systems.
The Reports view is divided into two sections. The upper section allows you to create various space
usage and replication reports. The lower section allows you to view and manage saved reports.
The reports display historical data, not real-time data. After the report is generated, the charts
remain static and do not update.
The replication status report includes the status of the current replication jobs running on the system. This report provides a snapshot of what is happening across all replication contexts, to help you understand the overall replication status on a Data Domain system.
The replication summary report includes network-in and network-out usage for all replication, as well as per-context usage, on the system during the specified duration. This report is used to analyze network utilization during the replication process, to help understand the overall replication performance on a Data Domain system.
The replication status report generates a summary of all replication contexts on a given Data Domain system
with the following information:
ID: The context number or designation of a particular context. The context number is used for identification; 0 is reserved for collection replication, and directory replication numbering begins at 1.
Source > Destination: The path between both Data Domain systems in the context.
Type: The type of replication context: Directory, MTree, or Collection.
Status: Error or Normal.
Sync as of Time: Time and date stamp of the most recent sync.
Estimated Completion: The estimated time at which the current replication operation should be
complete.
Pre-Comp Remaining: The amount of storage remaining pre-compression (applies only to collection contexts).
Post-Comp Remaining: The amount of storage remaining post-compression (applies only to directory,
MTree, and collection contexts).
If an error exists in a reported context, a section called Replication Context Error Status is added to the
report. It includes the ID, source/destination, the type, the status, and a description of the error.
The last section of the report is the Replication Destination Space Availability, showing the destination system
name and the total amount of storage available in GiB.
Related CLI command:
# replication show performance
Displays current replication activity.
Onsite Data Domain systems are typically used to store backup data onsite for short periods such as
30, 60, or 90 days, depending on local practices and capacity. Lost or corrupted files are recovered
easily from the onsite Data Domain system since it is disk-based, and files are easy to locate and read
at any time.
In the case of a disaster destroying onsite data, the offsite replica is used to restore operations. Data
on the replica is immediately available for use by systems in the disaster recovery facility. When a
Data Domain system at the main site is repaired or replaced, the data can be recovered using a few
simple recovery configuration and initiation commands.
If something occurs that makes the source replication data inaccessible, the data can be recovered
from the offsite replica. Either collection or directory replicated data can be recovered to the source.
With collection replication, the destination context must be fully initialized for the recovery process to succeed. With directory replication, recover a selected data set if it becomes necessary to recover one or more directory replication pairs.
Note: If a recovery fails or must be terminated, the replication recovery can be aborted.
Note: A replication recover cannot be performed on a source context whose path is the source path
for other contexts; the other contexts first need to be broken and then resynchronized after the
recovery is complete.
If a recovery fails or must be terminated, the replication recovery can be aborted. Recovery on the source should then be restarted as soon as possible.
1. Click the More menu and select Abort Recover. The Abort Recover dialog box appears,
showing the contexts that are currently performing recovery.
2. Click the checkbox of one or more contexts to abort from the list.
3. Click OK.
Resynchronization is the process of recovering (or bringing back into sync) the data between a source and destination replication pair after a manual break in replication. The replication pair is resynchronized so that both endpoints contain the same data.
Resynchronization can be used:
To convert a collection replication to directory replication. This is useful when the system is
to be a source directory for cascaded replication. A conversion is started with a replication
resynchronization that filters all data from the source Data Domain system to the destination
Data Domain system. This implies that seeding can be accomplished by first performing a
collection replication, then breaking collection replication, then performing a directory
replication resynchronization.
1. Break existing replication by selecting the source Data Domain system, and choosing
Replication. Select the context to break, and select Delete Pair and click OK.
2. From either the source or the destination replication system, click the More menu and
select Start Resync. The Start Resync dialog box appears.
3. Select the source system hostname from the Source System menu.
4. Select the destination system hostname from the Destination System menu.
5. Enter the directory path in the Source text box.
6. Enter the directory path in the Destination text box.
7. Click OK.
This process will add the context back to both the source and destination DDRs and start the resync
process. The resync process can take between several hours and several days, depending on the size
of the system and current load factors.