Huawei OceanStor Backup Software Technical White Paper For Simpana 1
Trademark Notice
General Disclaimer
The information in this document may contain predictive statements including,
without limitation, statements regarding the future financial and operating
results, future product portfolio, new technology, etc. There are a number of
factors that could cause actual results and developments to differ materially
from those expressed or implied in the predictive statements. Therefore, such
information is provided for reference purpose only and constitutes neither an
offer nor an acceptance. Huawei may change the information at any time
without notice.
Contents
Copyright © Huawei Technologies Co., Ltd. 2014. All rights reserved.
Trademark Notice
General Disclaimer
Contents
Figures
Tables
1 Software Architecture
1.1 CommServe Server
1.2 MediaAgent Server
1.3 iDA
1.4 CTE
1.5 Distributed Indexing Mechanism: Soul of Data Management Software
2 iDA
2.1 File System iDA
2.1.1 Backup and Recovery Module of the File System
2.1.2 Data archiver of the File System
2.2 Oracle iDA
2.2.1 Backing up an Oracle Database
2.2.2 Recovering the Oracle Database
2.3 Microsoft SQL Server iDA
2.3.1 SQL Server Transaction Log and Automatic Recovery
2.3.2 Backup of the SQL Server Database
2.3.3 Recovery of the SQL Server Database
2.3.4 Characteristics of the SQL Server iDA
4 SnapProtect
5 Continuous Data Replicator
5.1 Recovery Management Layer
5.2 Role of the CDR on the Recovery Layer
5.3 CDR Process
5.4 Copy Modes
Figures
Figure 6-6 GridStor: sharing storage devices and data directory
Tables
1 Software Architecture
1.3 iDA
iDA is used for backing up and restoring file systems. It provides the
resumable transmission function. When a network or MediaAgent server
becomes faulty, another MediaAgent server will take over the services at the
breakpoint and send the data to the backup media.
If iDA and MediaAgent are installed on one server, LAN-free backup is
supported.
1.4 CTE
CommVault supports integrated data management. Its core is the underlying
software layer, the Common Technology Engine (CTE). As the backbone of the
Simpana architecture, the CTE enables otherwise independent applications to
interact and communicate with each other automatically. These applications
include the following modules: backup and recovery, data migration, legal
archiving, quick recovery, storage resource management, and SAN
management.
Unlike traditional software, each CommVault module is an independent
solution, but all modules are built on and share the CTE. An integrated
console is provided, on which centralized policies can be created for
carrying out data management operations through each module. Because the
process is integrated and automated, the administrator can quickly detect
problems and find solutions. For example, the data migration module is
requested to transfer data to a secondary storage medium as soon as the data
volume in the primary storage medium reaches the alarm threshold. The data
migration continues until the data volume in the primary storage medium
falls below the alarm threshold.
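The migration example above can be sketched in a few lines. This is only an
illustration of the idea, not CommVault's implementation: the Tier class,
the migrate_until_below function, the 80% alarm level used in the usage
note, and the choice to move the least recently accessed items first are all
assumptions made for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Tier:
    """Hypothetical storage tier holding (name, size, last_access) items."""
    capacity: int
    items: list = field(default_factory=list)

    def used(self) -> int:
        return sum(size for _, size, _ in self.items)

    def usage(self) -> float:
        return self.used() / self.capacity


def migrate_until_below(primary: Tier, secondary: Tier, threshold: float) -> list:
    """Move items to the secondary tier until primary usage is below threshold."""
    moved = []
    # Assumed policy: the least recently accessed items are moved first.
    primary.items.sort(key=lambda item: item[2])
    while primary.items and primary.usage() >= threshold:
        item = primary.items.pop(0)
        secondary.items.append(item)
        moved.append(item[0])
    return moved
```

For instance, with a primary tier of capacity 100 that holds items of sizes
40, 30, and 20, calling migrate_until_below(primary, secondary, 0.8) moves
only the least recently accessed item if that alone brings usage under 80%.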
2 iDA
Figure 2-1
Figure 2-2
Figure 2-3
After the database is backed up in NoCatalog mode, run the control file
backup script. Then, the database can be recovered even if the control file
is damaged. If the script backs up the control file only to the local
computer, the resulting file must also be backed up in file mode to a
storage device. Therefore, a file backup set needs to be added.
After the backup instance is created, start the backup manually or
let it run as scheduled.
Figure 2-4
Figure 2-5
Figure 2-6
Figure 2-7
Figure 2-8
Figure 2-9
Figure 2-10
Figure 2-11
Figure 2-12
Figure 2-13
Figure 2-14
Figure 2-15
Figure 2-16
Recover the database file. Before the recovery, switch the database to
MOUNT state.
Figure 2-17
Figure 2-18
Figure 2-19
Figure 2-20
Figure 2-21
Figure 2-22
High performance
The MSDN API is used for performing online backup and recovery. This
allows databases to be protected at any time.
The VDI provides faster data transmission during backup and recovery.
It is seamlessly integrated with the slice function of the SQL. It can write
multiple data flows into multiple tape drives, speeding up the backup.
Multiple database backup and recovery processes can be performed
simultaneously. This reduces the requirements for system resources and
backup windows.
LAN-free backup by using the built-in sharing storage is supported.
Users can select databases and perform inter-computer recovery and
point-in-time recovery.
High reliability
The automatic hot disaster recovery policy can reduce the downtime
caused by data corruption and hardware faults.
The resumable backup function can reduce the backup time and ensure
the completion of data protection operations.
Various recovery modes (such as point-in-time recovery and gradual
database recovery) are supported to quickly locate the scope of data loss.
New SQL databases can be automatically discovered during backup. This
ensures that new data can be protected.
Ease of use
An integrated GUI and wizard ensure quick deployment and convenient
management.
Access control and security policies can be configured. Node table (NT),
Exchange, and SQL user groups can be created. Operation rights can be
authorized. Operation rights of unauthorized user groups can be
automatically restricted.
Unified event monitoring is provided. Events can be
filtered.
Figure 2-23
Table 2-1
Figure 2-24
Figure 2-25
Figure 2-26
along with the frequency of access. CommVault adopts the storage tiering
mechanism. To ensure data availability, key email data is stored by using
expensive storage technologies and modes, such as RAID disk arrays, copy,
scheduled copy, and hierarchical backup. When the email data no longer
brings benefits to the enterprise, it is moved to a less expensive storage
medium. When the email data is no longer accessed, it is deleted or moved.
If laws or governmental regulations require the email data to be retained
for many years, it is moved to a nearline or offline storage medium and
archived, which is safe and cost-effective.
The tiered storage mechanism meets customers' requirement for minimized
storage cost. The specific advantages are as follows:
Minimized total storage cost: Data that is seldom accessed is stored
on a less expensive storage medium. In this way, the performance
advantage of disks and the cost advantage of tapes are combined.
Optimal performance: Tiered storage of email data maximizes the
advantages of different types of storage devices.
Improved data availability: In tiered storage mode, historical data
that is seldom used is moved to an auxiliary storage device or archived to
an offline storage pool. In this way, the data does not need to be saved
repeatedly, and the save time is reduced. This improves the availability of
the online data. The available disk space can be kept above the
level required by the system.
Data migration transparent to applications: When an email is moved to
another storage device, its header remains on the source storage device.
Users can therefore access the email without changing the access mode,
and the migration is transparent to applications.
Emails can be archived by preset rules, for example, by the quota of each
mailbox.
Figure 2-27
Figure 2-28
After emails are moved, users can view the following information in the
personal Outlook window.
Figure 2-29
The icon of each email has been changed into the CommVault icon and
marked with ARCHIVED, indicating that the body and attachments of the
email have been moved to a secondary storage medium. To access this data,
users can double-click the email. CommVault will automatically recover the
email.
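The archiving behavior described in this section (a quota rule, a header left
behind as a stub, and recall when the user opens the email) can be sketched
as follows. This is a simplified model, not the CommVault archiver: the
Email class, the archive_over_quota and open_email functions, the
oldest-first ordering, and the use of a dictionary as secondary storage are
all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Email:
    subject: str
    received: int            # assumed: a larger value means a newer message
    body: Optional[bytes]    # None once the body has been archived
    archived: bool = False

    def size(self) -> int:
        return len(self.body) if self.body is not None else 0


def archive_over_quota(mailbox: list, quota: int, archive: dict) -> int:
    """Archive the oldest bodies until the mailbox fits its quota.

    The header (subject, received) stays in the mailbox as a stub; only the
    body moves to `archive`, a stand-in for secondary storage.
    """
    total = sum(mail.size() for mail in mailbox)
    count = 0
    for mail in sorted(mailbox, key=lambda m: m.received):
        if total <= quota:
            break
        if mail.archived:
            continue
        archive[(mail.subject, mail.received)] = mail.body
        total -= mail.size()
        mail.body, mail.archived = None, True
        count += 1
    return count


def open_email(mail: Email, archive: dict) -> bytes:
    """Recall the body transparently when a stubbed email is opened."""
    if mail.archived:
        mail.body = archive[(mail.subject, mail.received)]
        mail.archived = False
    return mail.body
```

Because the stub keeps the header in place, the caller of open_email does not
need to know whether the body was local or archived, which mirrors the
transparent access described above.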
In the Outlook window, an individual user can delete the header information
of archived emails that are seldom used. If the user later finds that such
an email is still useful, the email can no longer be recovered from the
mailbox because its header information has been deleted. In this case, the
user can use the ADD-IN module provided by CommVault for Outlook to search
for and recover the data. The GUI for this function is the same as that for
searching backup data of mailboxes.
Figure 2-30
Figure 2-31
Figure 2-32
Figure 2-33
Archive contents
Figure 2-34
Figure 2-35
4 SnapProtect
Simpana is a practical tool. Users can browse their backup tasks, including the
tasks of virtual machines, physical servers, volumes, files, objects, and entire
applications, directly on the Simpana administration console.
Maximized ROI of storage systems and networks
The SnapProtect technology provides 7 x 24 protection for tier-1, tier-2,
and tier-3 key tasks and service data by building on the existing investment
in high-speed Ethernet and optical networks and on the high performance of
storage disk arrays.
Free of constraints due to solutions
Many enterprises use disk array-based snapshot technology to create
point-in-time data copies to protect online data access. The snapshot
technology has certain advantages. To make use of these advantages, however,
users face new challenges: they must ensure that data copies can be managed
across different storage and tape layers to meet storage, cost, and recovery
requirements.
Solution-related constraints usually involve the creation and maintenance of
complicated scripts. Multiple products and technologies are managed
independently, including various application-based snapshot products, data
migration technologies, and archiving and backup products. Together they
ensure that the data can be moved between different storage layers and
locations and finally onto tape. If the storage systems and tapes are not
uniformly cataloged or managed, recovering the data manually is a heavy and
time-consuming task. Usually, a system administrator with a range of
professional skills has to restore the data from various storage layers to
the production system and manually ensure the consistency and timeliness of
the application data.
Bottom line: business operation advantages and low cost
By reducing the storage and management time, the SnapProtect technology
can reduce the business operation cost and investment related to the
management and protection of key application data. This can save input and
manpower, which can be used in other projects. The data is reliably protected
and can be recovered easily when it is required.
5 Continuous Data Replicator
Enterprises and governments may have offices and data centers in different
regions. They are very concerned about how to protect large-scale
distributed data with Simpana. Traditionally, local backup devices are used.
Local backup is very expensive and does not support disaster recovery
backup. Because operators often lack the necessary skills, remote data
protection and recovery becomes a difficult problem. CommVault offers a
solution to this issue. Simpana has many options that ensure cost-effective,
high-quality remote data protection and availability and provide a unified
enterprise-level data management environment for decentralized enterprises.
As a CommVault module, the Continuous Data Replicator (CDR) can protect
data in cooperation with other modules. Both CDR and SnapProtect are on the
recovery management layer of the data center. The CDR adopts a special
method to meet the recovery time objective (RTO) and recovery point
objective (RPO): changes are copied continuously. In most cases, the data on
the destination lags the original data by only seconds. Combined with the
recovery points created during copying, this minimizes the recovery time.
Data can be recovered from a volume backup or from a valid snapshot taken at
an appropriate time.
The CDR and the other CommVault data management modules use the same GUI and
policies to protect copy data during the entire life cycle of the data.
One-to-one copy
Figure 5-1
Multiple-to-one copy
Figure 5-2
One-to-multiple copy
Figure 5-3
Figure 5-4
After the data of a remote office is copied and the corresponding recovery
point is created, the data of the remote office becomes a part of the data
protection policy. Recovery points are created by copy-on-write (COW). A COW
snapshot is a space-saving snapshot volume that contains the copied data.
This type of snapshot can recover data in various modes.
Snapshots can be loaded as read-only volumes for users to browse, read,
and copy the file data.
Administrators can take any recovery point on the COW snapshot and turn
it into a complete volume. This volume holds the data as of the specified
time and can be used by any server in the network environment.
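The space-saving property of a COW recovery point comes from storing only the
blocks that change after the snapshot is taken. The toy model below shows the
general copy-on-write technique, not the CDR's on-disk format; the CowVolume
class and its method names are hypothetical.

```python
class CowVolume:
    """Toy block volume with copy-on-write recovery points."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.snapshots = []  # each snapshot: {block_index: original_content}

    def create_recovery_point(self) -> int:
        """Start a new, initially empty snapshot and return its id."""
        self.snapshots.append({})
        return len(self.snapshots) - 1

    def write(self, index: int, data) -> None:
        """Preserve the old block in every snapshot that lacks it, then write."""
        for snap in self.snapshots:
            snap.setdefault(index, self.blocks[index])
        self.blocks[index] = data

    def read_snapshot(self, snap_id: int) -> list:
        """Rebuild the full volume as it was when the snapshot was taken."""
        snap = self.snapshots[snap_id]
        return [snap.get(i, block) for i, block in enumerate(self.blocks)]
```

Here read_snapshot plays the role of turning a recovery point into a complete
volume: unchanged blocks are read from the live volume, and changed blocks
are read from the preserved copies.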
the log and makes changes on the destination computer according to the log.
After the network connection is recovered, the copied data is sent back to the
system. Neither normal copying nor the running of applications is interrupted.
Large-scale network interruption: A large-scale network interruption refers to
an interruption during which the change data captured on the source computer
exceeds the available log space. In this case, the changes captured during
the network interruption are lost, and the copied data on the destination
computer is incomplete. Users can re-create the copy on the destination
computer by sending all data on the source computer to the destination
computer. This method, however, is impractical for large-scale databases.
Some solutions re-initialize the copy by comparing files one by one between
the source and destination computers. With this method, users do not need to
copy all files, but for a large, complex copy set it consumes too many
resources and takes too long.
The CDR provides an intelligent synchronization mechanism for
automatically re-creating the complete copy data after the network connection
is recovered. In this way, users do not need to copy all files or endure the
process of comparing each file. Instead of comparing files one by one, the
CDR reads the change log file of the Windows file system and quickly locates
the files that were changed during the network interruption. This is very
important for an enterprise-level database.
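The difference between comparing every file and reading a change log can be
illustrated with a short sketch. The journal format here, a list of
(sequence_number, path, operation) tuples, is a hypothetical stand-in for a
file system change log and is not the actual Windows structure or the CDR's
internal format; the function name is also an assumption.

```python
def files_to_resync(journal, last_replicated_seq):
    """Return (paths to resend, paths to delete on the destination).

    Only records newer than the last replicated sequence number are read,
    so the cost depends on the number of changes, not on the number of files.
    """
    changed, deleted = set(), set()
    for seq, path, op in sorted(journal):
        if seq <= last_replicated_seq:
            continue  # already applied on the destination before the outage
        if op == "delete":
            changed.discard(path)
            deleted.add(path)
        else:  # "create" or "modify"
            deleted.discard(path)
            changed.add(path)
    return sorted(changed), sorted(deleted)
```

A file that was created and then deleted during the outage ends up only in
the delete list, so its contents are never sent.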
Boundless synchronization: A data set may be very large compared with its
daily changes. The network bandwidth may be sufficient for copying daily
changes, yet creating the initial copy over the network would take several
days. In this case, users can use the boundless synchronization mode to
quickly copy the initial data to the destination computer.
The CDR provides a tool for processing large-scale data sets. This tool is
integrated with the copy process. After the initial copy data is obtained,
administrators can load the initial data onto the destination computer by
using a backup set or disk cloning. While the data is being transferred, the
CDR captures and records data changes on the source computer. After the
initialization is complete, the CDR transmits the changed data, and the
normal copy process starts.
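The sequence of steps in boundless synchronization (capture an initial copy,
log changes while it is in transit, load it on the destination, then replay
the log) can be simulated as follows. This is a conceptual sketch only: the
seed_and_catch_up function and the representation of files as a dictionary
are assumptions, and no real transport or CDR interface is involved.

```python
def seed_and_catch_up(source: dict, shipping_changes: list) -> dict:
    """Simulate seeding a destination offline, then replaying logged changes.

    `source` maps path -> content. `shipping_changes` is the ordered list of
    (path, content_or_None) updates made while the seed was in transit, where
    None means the file was deleted.
    """
    # 1. Capture the initial copy (for example, a backup set or disk clone).
    seed = dict(source)
    # 2. Changes made during transport are applied to the source and logged.
    change_log = []
    for path, content in shipping_changes:
        if content is None:
            source.pop(path, None)
        else:
            source[path] = content
        change_log.append((path, content))
    # 3. The seed arrives and is loaded on the destination.
    destination = dict(seed)
    # 4. Only the logged changes need to cross the network.
    for path, content in change_log:
        if content is None:
            destination.pop(path, None)
        else:
            destination[path] = content
    return destination
```

After the replay in step 4, the destination matches the source, and ordinary
continuous copying can take over.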
When a remote disaster occurs, this is also an effective method for recovering
the remote system. When the primary system becomes faulty, the secondary
system becomes the production system. The administrator can use the
boundless synchronization mode to restore the data quickly to the primary
system from the secondary system and copy the changed data to the primary
system. In this way, the primary system can be quickly recovered.
5.10 Conclusions
The CDR extends the functions of Simpana modules on the recovery layer. It
is integrated with the data protection and capacity management functions
through a single unified GUI. The settings, operations, and recovery points of
data copying can be managed and monitored. The CDR provides various
enterprise-level functions, such as data compression, data encryption,
advanced connection recovery, and boundless synchronization. It is the
first-choice solution for remote office data management and disaster
recovery. The advantages of the CDR solution are as follows:
Low impact on the performance of the source system: Owing to its special
technical advantages, the CDR uses only a few resources of the source
system. In the production system, the impact of the CDR on the source
system depends on the volume of data to be copied. In a system whose
copy data volume is normal, the CPU usage of the CDR is less than 8%,
and the memory usage of the CDR is only dozens of MBs.
Low network bandwidth: The CDR copies data at the byte level. It can
ensure timely copying even when bandwidth is limited. It provides the
data compression function to further reduce the bandwidth requirement.
Powerful error tolerance: The CDR provides powerful error tolerance
during network failures. When the copying process is interrupted due to a
disaster or transmission exception, the services of the primary system are
not affected. After the network connection is recovered, the cached data
is automatically copied to the disaster recovery center.
Convenient management: The CDR provides a comprehensive GUI. All
operations can be performed and managed in one window. It provides rich
reports, which can be used to manage all storage resources and clients.
It is completely localized, which facilitates maintenance and management
by users.
Figure 6-1
Figure 6-2
Figure 6-4
Once a storage policy is defined, it can be assigned to data in one-click
mode. This changes how the data is managed, and the change is completed
without reconfiguring hardware, connections, or networks. Back-end operations
for assigning and defining storage policies are completed separately within
Simpana. This reduces maintenance expenses and complexity for IT personnel.
For example, Simpana storage policies simplify complicated activities, such
as configuration, maintenance, management, and reporting, and greatly reduce
the expenses of managing data storage environments.
6.6 Restart/Checkpoint
All data transfer operations of Simpana, such as data backup, recovery,
auxiliary copy, and synthetic full backup, have checkpoints to ensure that
operations can be resumed after an interruption. This function is important
for backup and recovery over the WAN and improves the backup and recovery
success rate. Competing products provide this feature only for some
functions and therefore have a lower success rate.
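The restart behavior can be illustrated with a small checkpointed transfer.
This is a generic sketch of the technique, not Simpana's mechanism: the
transfer_with_checkpoints function, the chunk size, the state dictionary
used to hold the checkpoint, and the fail_after parameter that simulates an
interruption are all assumptions.

```python
def transfer_with_checkpoints(data: bytes, sink: bytearray, state: dict,
                              chunk_size: int = 4, fail_after=None) -> bool:
    """Copy `data` into `sink` chunk by chunk, recording a checkpoint.

    `state["offset"]` is the checkpoint: the number of bytes safely written.
    `fail_after` simulates an interruption after that many chunks.
    Returns True when the transfer is complete.
    """
    offset = state.get("offset", 0)
    del sink[offset:]  # discard anything written past the last checkpoint
    chunks_sent = 0
    while offset < len(data):
        if fail_after is not None and chunks_sent >= fail_after:
            return False  # interrupted; the checkpoint survives in `state`
        chunk = data[offset:offset + chunk_size]
        sink.extend(chunk)
        offset += len(chunk)
        state["offset"] = offset  # commit the checkpoint
        chunks_sent += 1
    return True
```

Calling the function again with the same state resumes from the recorded
offset instead of starting over, which matters most on slow WAN links where
resending the whole job is costly.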
Figure 6-5
6.7 GridStor
When the SAN is used, dynamic drives can be shared by multiple
MediaAgents, which improves resource usage and reduces redundant
resources. However, the SAN structure cannot eliminate errors caused by
hardware (both network and storage devices), which can result in data loss.
The QiNetix GridStor technology enables Simpana to provide error failover,
load balancing, storage pooling, and prevention capabilities, which
simplifies management and improves data access. In addition, both near-line
storage and off-line storage can be used, reducing system TCO. Unlike other
solutions on the market, GridStor can be used across different operating
systems and storage types. For example, a backup activity on a Windows
operating system can be switched from a Windows MediaAgent to a MediaAgent
on a Solaris operating system. If data recovery is required, the user does
not need to know where the data is saved; the system finds it automatically.
This means that the directory remains valid after such a change and still
allows access to the specified storage even when the MediaAgent that wrote
the data is unavailable. Both original online storage resources and
near-line or offline storage resources may be used for backups. Priorities
of storage resources are set by users. Once priorities are set, storage
resources are used automatically and transparently, which optimizes resource
utilization. Importantly, the user can manage resources in advance to
balance load, redirect activities to unused storage resources, and correct
errors in backup and recovery to improve data access. In short, GridStor is
a distinguishing advantage of Simpana.
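The failover and priority behavior described for GridStor can be sketched as
follows. This is an illustrative model only, not the GridStor implementation:
the MediaAgent class, the backup_with_failover function, the convention that
a lower number means a higher priority, and the dictionary used as a shared
index are all assumptions.

```python
class MediaAgent:
    """Hypothetical MediaAgent that can be online or offline."""

    def __init__(self, name: str, priority: int, online: bool = True):
        self.name, self.priority, self.online = name, priority, online
        self.written = []

    def write(self, chunk: str) -> None:
        if not self.online:
            raise ConnectionError(f"{self.name} is unavailable")
        self.written.append(chunk)


def backup_with_failover(chunks, agents, index: dict) -> None:
    """Write each chunk through the highest-priority agent that is reachable.

    `index` records which agent holds each chunk, so a restore does not
    need to know in advance where the data was written.
    """
    ordered = sorted(agents, key=lambda a: a.priority)
    for chunk in chunks:
        for agent in ordered:
            try:
                agent.write(chunk)
            except ConnectionError:
                continue  # fail over to the next agent in priority order
            index[chunk] = agent.name
            break
        else:
            raise RuntimeError(f"no MediaAgent available for {chunk!r}")
```

If the preferred Windows agent goes offline between two jobs, the second job
lands on the Solaris agent without any change by the caller, and the index
still shows where each chunk can be found.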
6.12 Deduplication
Exponential data growth brings multiple challenges for IT departments.
The challenges include quickly backing up and recovering increasingly