Data Domain, Deduplication and More
DEDUPLICATION
Total Cost of Ownership
With any backup solution, the total cost of ownership needs to encompass all of the elements
required for the solution to function. These elements include, but are not limited to:
Snapshots provide a fast point-in-time copy of the data; however, it is recommended to roll over
selected snapshot-based copies to an external storage appliance, such as Data Domain. The risk of
relying solely on snapshot copies for recovery is directly linked to the integrity of the primary
snapshot. If the primary snapshot becomes corrupted, then all subsequent snapshots are likely to
be unavailable for data recovery operations.
When recovery of data is required, the department must be 100% confident that the backup
solution will recover the data. This is why EMC has placed great importance on the Data
Invulnerability Architecture (DIA) with Data Domain.
EMC Confidential 1
supports integration with Data Domain Boost. The effect of distributed deduplication is a
reduction in backup data being transferred over the IP network infrastructure compared to
traditional CIFS or NFS protocols.
Easy Integration – due to the extensive compatibility with backup software and archive
applications, Data Domain systems integrate easily into existing backup environments.
Disk-based backup systems offer similar performance characteristics with a significant
reliability advantage over traditional tape-based backups. Physical tape libraries have
a single point of failure at the robotic arm, and a significant amount of manual effort is
required to manage the tape operations.
Replication – is supported between sites in a peer relationship, a cascaded relationship,
a one-to-many relationship or a many-to-one relationship, such as when smaller regional
data centres replicate back to a larger central site. A large Data Domain appliance
can support a replication fan-in from up to 270 remote sites. Cross-site deduplication
minimises the required bandwidth between all sites, since only the first instance of data is
transferred across any of the WAN segments. The volume of data transferred is reduced
by up to 99 percent, making replication very efficient.
Scalability – the Data Domain appliances provide fast inline deduplication with up to
31TB/hour of throughput when using Data Domain Boost, with the largest appliance
providing up to 2PB of usable capacity with Data Domain Extended Retention. This allows
a single Data Domain appliance to store up to 100PB of logical data for long term backup
storage.
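The capacity and replication figures quoted above can be related with some simple arithmetic.
The following is a minimal sketch only: the function names are illustrative and not part of any
EMC tool, and the 310TB and 10TB inputs are hypothetical values chosen for the example. The
2PB, 100PB, 31TB/hour and 99 percent figures are the "up to" values stated in the text.

```python
def implied_dedup_ratio(logical_pb: float, usable_pb: float) -> float:
    """Ratio of logical (pre-deduplication) data to usable physical capacity."""
    return logical_pb / usable_pb


def hours_to_ingest(backup_tb: float, throughput_tb_per_hour: float) -> float:
    """Time to ingest a backup at a sustained throughput."""
    return backup_tb / throughput_tb_per_hour


def wan_transfer_tb(backup_tb: float, reduction_pct: float) -> float:
    """Data actually sent over the WAN after a given percentage reduction."""
    return backup_tb * (100 - reduction_pct) / 100


if __name__ == "__main__":
    # 100PB logical on 2PB usable (figures from the text)
    print(f"Implied ratio: {implied_dedup_ratio(100, 2):.0f}:1")
    # Hypothetical 310TB backup at the quoted peak of 31TB/hour
    print(f"Ingest time:   {hours_to_ingest(310, 31):.1f} hours")
    # Hypothetical 10TB replicated with the quoted 99 percent reduction
    print(f"WAN transfer:  {wan_transfer_tb(10, 99):.2f} TB")
```

Because the quoted figures are upper bounds, results for a real environment will depend on the
data type, change rate and retention period.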
Figure 1 provides an overview of the EMC Data Domain 5.5 family.
Typically, when application owners wish to control their own backup and recovery process, IT
departments end up creating silos of backup repository storage. To eliminate this, EMC has worked
with other vendors to improve the backup speed and reliability of business-critical applications
by leveraging the native backup interface and providing the application owner with full control.
Currently, with Data Domain Operating System version 5.5, the applications shown in Figure 3 are
supported for use with Data Domain. This eliminates silos of storage, as the enterprise backup
solution and supported applications store their backup data in a single globally deduplicated
Data Domain appliance.
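To illustrate why a single shared repository avoids the duplication that separate silos create,
the following toy sketch stores data from two hypothetical writers in one content-addressed
store. It is an assumption-laden illustration, not a description of how Data Domain is
implemented: the class name, the fixed-size chunking and the use of SHA-256 fingerprints are
all choices made for this example.

```python
import hashlib


class ToyDedupStore:
    """A toy global deduplication store using fixed-size chunks (illustrative only)."""

    def __init__(self, chunk_size: int = 4096):
        self.chunk_size = chunk_size
        self.chunks = {}          # fingerprint -> chunk bytes, shared by all writers
        self.logical_bytes = 0    # total bytes written before deduplication

    def write(self, data: bytes) -> list:
        """Store data and return its recipe (ordered list of chunk fingerprints)."""
        recipe = []
        for offset in range(0, len(data), self.chunk_size):
            chunk = data[offset:offset + self.chunk_size]
            fingerprint = hashlib.sha256(chunk).hexdigest()
            # Only the first instance of a chunk is stored
            self.chunks.setdefault(fingerprint, chunk)
            recipe.append(fingerprint)
        self.logical_bytes += len(data)
        return recipe

    def read(self, recipe: list) -> bytes:
        """Rebuild the original data from its recipe."""
        return b"".join(self.chunks[fp] for fp in recipe)

    def stored_bytes(self) -> int:
        """Physical bytes held after deduplication."""
        return sum(len(chunk) for chunk in self.chunks.values())


if __name__ == "__main__":
    store = ToyDedupStore(chunk_size=4)
    backup_app = store.write(b"AAAABBBB")    # hypothetical enterprise backup stream
    database = store.write(b"BBBBCCCC")      # hypothetical application-native backup
    print(store.logical_bytes, store.stored_bytes())  # 16 logical bytes, 12 stored
```

In this sketch the chunk shared by both writers is stored once. Had each writer used its own
store, that chunk would have been held twice.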
Not only is Data Domain built for storing backup data, it has also been designed to support a
large ecosystem of archiving applications. With the release of Data Domain Operating System
version 5.5, Data Domain supports up to 1 billion small archive files. The current list of supported
archive applications is shown in Figure 4.
With this range of backup, enterprise and archive applications, Data Domain is designed to integrate
easily into an environment and to be used by a variety of applications.
A summary of the storage required for a range of deduplication ratios is provided in Table 2.
When comparing deduplication ratios, a few percentage points of difference in commonality may not
appear to be of any great significance, but the difference in the required backend storage can be
substantial.
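The effect of a few percentage points of commonality can be sketched with a short calculation.
The values below are hypothetical and are not taken from Table 2: a 100TB logical backup set
is assumed, and the function names are illustrative.

```python
def dedup_ratio(commonality_pct: float) -> float:
    """Deduplication ratio (N:1) when commonality_pct of the data is redundant."""
    unique_pct = 100 - commonality_pct
    return 100 / unique_pct


def backend_storage_tb(logical_tb: float, commonality_pct: float) -> float:
    """Physical storage needed to hold only the unique portion of the data."""
    unique_pct = 100 - commonality_pct
    return logical_tb * unique_pct / 100


if __name__ == "__main__":
    logical_tb = 100  # hypothetical logical backup size
    for commonality in (95, 97, 99):
        ratio = dedup_ratio(commonality)
        storage = backend_storage_tb(logical_tb, commonality)
        print(f"{commonality}% commonality -> {ratio:.1f}:1 -> {storage:.1f} TB")
```

Under these assumptions, moving from 97 percent to 99 percent commonality reduces the backend
storage from 3TB to 1TB, a threefold difference from a change of two percentage points.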