Professional Documents
Culture Documents
Disaster Recovery Solution Using Virtualization Technology
Disaster Recovery Solution Using Virtualization Technology
4
vendor) and Zacatii Consulting (a com- logy, NetApp’s storage technology and in a hypothetical environment, the fol-
pany with broad experience in DR Intel’s server platform. lowing test environment was designed
solutions) as new participants. Phase II for Phase II (See Figure 1), based on
of the project builds on the success of Key elements Zacatii Consulting’s experience and
Phase I, by separating the operation of the test environment taking actual business purposes into
and DR sites. During Phase I the DR consideration;
solutions for SAP ERP were located Continuing from Phase I, which verified
hypothetically in the same laboratory, that transactions at the operation site • At the operation site, two IA (Intel
using VMware’s virtualization techno- were correctly reflected in the DR site Architecture based) servers running
FC 4G FC 4G
10GE 10GE
FCoE FCoE
FC 4G FC 4G
5
VMware ESX are installed, to comply • Packets were compressed using the Replication
with the assumption that more than WAN acceleration solution WAAS Data was replicated using NetApp’s
one system is running. At the DR (Wide Area Application Services), by storage management software
site, virtual machines are integrated Cisco Systems’ WAE-7371, to “NetApp SnapManager® for Microsoft®
into one IA server to minimize cost improve data transfer performance. SQL Server™” and data replication
and achieve an environment assum- software “SnapMirror®” every 60 min-
ing a ‘many-to-one’ server • The operation site was located in utes, to synchronize with the DR site
configuration. Tokyo and the DR site in Osaka, while processing 5,000 to 10,000
500km away. The sites were con- transactions/hour with two virtual
• At the operation site, an Intel® Xeon® nected by a VPN, which provides machines for ERP (host servers using
5500 series processor-based server, cost benefits. Intel® Xeon® 5500 series processors
designed for improved virtualization and Intel® Xeon® 5300 series proces-
performance, and a server based on Verification scenario sors) at the operation site.
older generation Intel® Xeon® 5300
series processors, were used. At the During Phase II of the project, verifica- Business restart after a disaster
DR site, an Intel® Xeon® 5500 series tion of the technologies supplied by (Failover)
processor-based server was installed each partner and confirmation of the VMware’s VMware vCenter Site
to verify that the data could be syn- results of the entire project were con- Recovery Manager (SRM), which man-
chronized and that business could be ducted based on the design and verifi- ages and automates DR, is activated if
continued in the environment with dif- cation scenario of Zacatii Consulting. the system at the operation site is
ferent mixed platforms. stopped during the transaction. The
RPO1 and RTO 2 tests confirmed normal operations of
• Synchronization was carried out RPO (Recovery Point Objective) was SAP ERP on the virtual machine at the
simultaneously at the operation and verified for 2 to 3 hours, and RTO (Re- DR site, and the correct reflection of
DR sites, using NetApp’s storage covery Time Objective) was verified for transaction data of SAP ERP until the
technology. It was confirmed that the 2 to 4 hours. RPO and RTO are both end of the synchronization.
aligned data was transferred without important indicators of the degree of
interrupting the system. business continuity. Targeted compa-
nies are middle-sized or larger compa-
• A simple network environment was nies, including retail businesses and EC
established by integrating FC (Fibre businesses of small-sized companies
Channel) and Ethernet networks that require a shorter RPO and RTO.
between the server and the storage
system using the latest “FCoE (Fibre
Channel over Ethernet)” network
protocol.
1. RPO: An indicator that represents the timing or frequency of capturing backup data. Used in backing up and restoring operation verifications during the
disaster recovery process.
2. RTO: An indicator that represents the time required for recovery of an information system, from the moment a disruption is caused by a system failure or
disaster, until restoration to an agreed-upon service level. Used in BCP or DR operation verification.
6
Establishing the Server Environment
Latest server platform supports
a strong virtual environment
Server + VM (Virtual Machine) Operation site The other server has an older genera-
environment Three physical servers were installed at tion Intel® Xeon® 5300 series processor
the operation site. Two of them have to permit continuous use from the
The server configurations at the opera- VMware ESX installed, and the other existing environment. On that server,
tion site (in Tokyo) and the DR site (in has VMware vCenter Server (vCenter the virtual machine for ERP (ERP#2)
Osaka) are as described below (See Server) and VMware vCenter Site Re- was installed.
Figure 2). It was assumed that the two covery Manager (SRM) installed to
physical servers at the operation site manage the virtual environment in an DR site
and one physical server at the DR site integrated manner. Two physical servers were installed at
were all running before the new envi- the DR site. One of them is for the VC
ronment was established. The scenario The virtual machines for ERP (ERP#1) + SRM server, the other is a high-spec
called for the addition of a server with and for the other applications (AD#1) server with an Intel® Xeon® 5500 series
Intel’s latest processor platform at both were installed on a high-spec server processor and VMware ESX. The serv-
the operation site and DR site, thereby with the latest Intel® Xeon® 5500 series ers integrate the three virtual machines
establishing a new DR site using the processor. (ERP#1, ERP#2 and AD#1) managed
virtualization technology. It was verified by VMware ESX at the operation site
that a flexible DR site could be estab- into one host machine at the DR site.
lished with different server
environments.
VC + SRM VC + SRM
Windows Windows
AD AD
#1
AD replication #2
VM
Windows Windows
7
Synchronization method Intel® Xeon® 5500 series processor design technology of the processor,
AD#1 in a virtual machine automatically and features next-generation micro-
transfers the changes to others by Ac- The processor for the server released architecture code-named “Nehalem.”
tive Directory Replication Technologies. by Intel in March 2009, the Intel® Xeon® The Intel® Xeon™ 5500 series processor
The virtual machines for ERP (ERP#1 5500 series processor, was used at has four cores and supports the dual
and ERP#2) synchronize data using the operation and DR sites for the socket platform. The former FSB (front-
NetApp’s storage function (SnapMirror demonstration tests. It was verified that side bus) was replaced with the faster
and SnapManager for Microsoft SQL they achieved more effective disaster QPI (QuickPath Interconnect) to con-
Server). recovery in the VMware virtual nect the processor with the chipsets. It
environment. accesses memory directly and more
quickly by using the memory controller
The Intel® Xeon® 5500 series processor embedded in the processor.
is a completely revamped microarchi-
tectural version developed from the In addition, the processor has Intel®
former Intel® Core™ Micro Architecture. Intelligent Power Technology to reduce
The new version represents the core the power consumption of the core
25
20
15
10
0
Intel® Xeon® 5300 Intel® Xeon® 5500 Intel® Xeon® 5400 Intel® Xeon® 5500
series processor series processor series processor series processor
• Dell PowerEdge 2950 III • Dell PowerEdge R710 (Operating frequency: (Operating frequency:
• VMware ESX v3.5.0 • VMware ESX v4.0 3.16 GHz) 2.93 GHz)
VMmark v1.0 VMmark v1.1
• 8.47@6 tiles • 23.55@16 tiles
• 2 sockets • 2 sockets
• 8 cores • 8 cores
• 8 threads • 16 threads
Figure 3: Results of VMmark benchmark Figure 4: Virtualization performance test for SAP ERP
For details on the test environment, please visit www.vmware.com/ For details, please visit http://download.intel.com/business/software/
products/vmmark/results.html. testimonials/downloads/xeon5500/sap_virtualization.pdf
8
when idling to almost zero, using auto- The Sun Fire X4270 server (2 proces-
matic controls and Intel® Turbo Boost sors/8 cores/16 threads), with the
Technology to raise operating frequen- Intel® Xeon® 5500 series processor,
cy over the rated power for a heavier scores 3700 for SD benchmark by
workload. This ensures further energy SAP ERP 6.0, as of March 30, 2009.
savings and improved performance. For more information on the SAP SD
The various improvements of virtualiza- benchmark, please visit www.sap.com/
tion technology have brought to Intel® solutions/benchmark/index.epx).
Xeon® Processor 5500 series. In virtual performance tests for SAP
ERP conducted by Intel, it was verified
Dramatic evolution improves that the Intel® Xeon® 5400 series pro-
virtual environment performance cessor (3.16 GHz) and Intel® Xeon®
5500 series processor (2.93 GHz)
The Intel® Xeon® 5500 series processor achieved a performance 1.7 times
is evolving dramatically in virtualization. higher (See Figure 4). Given these
Intel® Virtualization Technology sup- facts, SAP ERP has also been proved
ports virtualization not only for the pro- to achieve full performance in a virtual
cessor but also for the entire platform environment .
including chipsets and network interfac-
es, EPT (Extended Page Table). Intel®
Virtualization Technology improves the
entire performance and I/O and net-
work performance as well.
9
Establishing the Storage Environment
Transferring the virtual ERP machine data
to the DR site every 60 minutes
Replication using NetApp data to the DR site. At the DR site, SQL servers are stopped to replicate
SnapManager for Microsoft SQL the differential backup system of the data for the DR site only when any
Server storage saves the backup data for the changes are made to the system.
previous seven sessions. The method
The LUN (Logical Unit Number) in the and scheduling for synchronizing with UserData area
storage has the structure described the DR site depends on the feature of The logs and databases in the
below. When NetApp’s storage the LUN as described below (See Fig- UserData area are replicated to the DR
management software (NetApp Snap ure 5). site online using SnapManager for
Manager for Microsoft SQL Server) is Microsoft SQL Server. A command is
implemented, the local backup function OS area and system area performed every 60 minutes to transfer
of the storage, Snapshot™, creates a The data in the OS area and system the backup data to the DR site.
backup image of the virtual machine area are not changed except when the
data using very little data, and the repli- system is updated or a security patch
cation tool SnapMirror transfers the is applied. Therefore, SAP ERP and
SnapMirror SnapMirror
10
Setting the transfer interval,
taking verification
into consideration management Database Suite
software Database collaborative management software
General
11
Advanced architecture of NetApp In this test, backup images for the pre-
vious seven sessions were saved, but
Snapshot the data used was not seven times the
Snapshot, one of the basic Data original – the images were saved using
ONTAP functions, acquires online much less data. In addition, SnapMirror
backup images during system opera- can make rapid transfers by LAN or
tion. The most advantageous features WAN, since it uses less network traffic
include its excellent space efficiency, by transferring only the differential
no deteriorated writing performance, data. For the data transfer channel,
and flexible operation per volume unit, users can use IP networks and also FC
which make it possible for the user to networks.
obtain backups very often, and to sig-
nificantly reduce recovery time. FlexClone
Snapshot obtains only metadata blocks FlexClone instantly creates a clone of
at a certain point in time, rather than the disc volume within the storage. It
making a physical copy of the entire applies Snapshot technology to make a
volume for backup. Therefore, it has backup only for differential data blocks.
none of the deteriorated performance It is able to instantly create a clone by
that is seen in the Snapshot that copying only logical data, without mak-
adopts the so-called copy-out method. ing a physical copy (thus, similar to
Implementation can be conducted per Snapshot’s approach). In fact, it can
volume and the obtained Snapshot im- create a clone using only 10% of the
age is saved separately in the hidden original amount of data when creating a
directory in each volume. Acquisition clone, which means that the same
and recovery of Snapshot does not amount as that of the original is not re-
affect the other volumes, and operation quired for creating clones. Data copied
rules for each volume can be deter- by FlexClone is readily available for
mined flexibly. reading and writing, while copied data
created by Snapshot was for read only.
SnapMirror This allows the user to conduct differ-
SnapMirror is a backup function apply- ent kinds of tests using clones.
ing Snapshot. It transfers only changed
data blocks to the destination after the
first 0 level transfer (whole data trans-
fer of the targeted volume). One spe-
cial feature of SnapMirror is that it
transfers data as if a full copy image for
each transfer is being saved, even
though only the differential data is
transferred after the first transfer.
12
Establishing the Network Environment
Optimal I/O integration for DR
for the SAP system, and optimized WAN
Cost reduction by FCoE protocol The reason for using FCoE in the dem- Connections to the server and
onstration test was to handle the I/O storage
One of the special features of the dem- traffic, which increases with virtualiza-
onstration tests was the participation of tion. When a server is virtualized, Each physical server is connected with
Cisco Systems, which made it possible multiple virtual machines run on one a Cisco Nexus 5020 data center class
to establish a network platform using physical server, so the traffic amount switch, using FCoE. The Cisco Nexus
the latest network technology, FCoE for each physical server is increased. 5020 is connected with NetApp’s stor-
(Fibre Channel over Ethernet). FCoE is As the number of virtual servers in- age using FC via a Cisco MDS 9222i
a protocol to connect interfaces with creases, LAN, SAN and server traffic FC switch. The connection between
FC and Ethernet connections, for which increases, leading to more complicated the switch and the storage was tested
standardization is promoted. It does not wiring and overloaded operations. The for multiplexing by virtualizing the SAN,
require separate infrastructures for demonstration tests therefore exam- using the VSAN (Virtual SAN) function
LAN and SAN, and integrates I/O. The ined ways to reduce cables, switches of the Cisco MDS 9222i while main-
10Gbps connection makes a broad- and interfaces, by establishing a net- taining its independency. A Cisco Cata-
band environment available. work environment using the FCoE pro- lyst 3750E was used for the Ethernet
tocol. This would lead to less network switch, and connected with a Cisco
management and lower costs, including Nexus 5020 using 10GB Ethernet.
less power consumption (and therefore Between the operation and DR sites,
greener IT operations). the cost-effective option of a VPN was
used.
Eth2/1
13
WAAS, the WAN acceleration duced to achieve effective use of band- Storage replication is also accelerated.
solution width, using technologies such as look- Before adopting WAAS, the through-
ahead, alternative protocol and puts to send SAP ERP data from the
In the demonstration tests, Cisco caching. operation site to the DR site were
Systems’ WAN acceleration solution 60.04 (KB/sec) and 61.82 (KB/sec) for
called WAAS was used to enhance the Using WAAS to achieve effective the data area, and 101.84 (KB/sec) for
efficiency of transferring data between use of bandwidth the log data. After adopting WAAS,
the sites. WAAS technology is used to they were 228.20 (KB/sec), 219.26
achieve the same performance as a According to the Network Traffic Sum- (KB/sec) and 248.45 (KB/sec) – 2 to
LAN, but on a WAN. The amount of mary Report for WAAS, performance 3.5 times faster (See Figure 9).
data to be sent to the WAN is reduced measured in the test environment re-
by combining DRE (Data Redundancy sulted in 4.92 times greater effective
Elimination), which optimizes bandwidth capacity in Osaka, and 5.64 times
on WANs based on data redundancy greater in Tokyo. The bandwidth usage
and data compression. In addition, rate was over 80% at both sites, prov-
WAN throughput is improved by TFO ing that bandwidth was used more
(Transport Flow Optimization), which effectively than before adopting WAAS
optimizes TCP traffic. Delays are re- (See Figure 8).
14
Key points for establishing a In addition, it was proven that even
network environment inexpensive WAN connections can
achieve satisfactory performance
The demonstration tests show that through the use of WAAS. In fact,
environments where switches and a very cost-effective environment can
adapters were previously needed for now be established for DR strategies,
both LAN and SAN can be integrated which were previously ignored because
by establishing a connection from the of formerly high running costs and the
server and the storage to FCoE, with cost-boosting distance between the
NIC and HBA being consolidated with sites.
CNA (Converged Network Adapter).
Management efficiency can also be
improved by using VLAN and VSAN in
virtualization. Furthermore, the 10
Gbps environment permits the develop-
ment of the virtual environment and
improves the performance of VMware
VMotion and SRM.
150
101.84
100
60.04 61.82
50
0
Data area 1 Data area 2 Log
Storage area
15
Disaster Recovery
in a Virtual Environment
VMware vCenter Site Recovery Manager (SRM)
automates disaster recovery
Failover in a virtual environment proper restoration was verified. SRM Mock disaster response, to test
works with NetApp’s storage – when recovery plans as often as desired
If the system must be shut down due to the SRM’s DR plan is implemented,
any disaster at the operation site, the SnapMirror is stopped automatically SRM includes a function to conduct
system should be failovered at the and the SRM starts the virtual machine drills to test the entire recovery pro-
recovery site to restart business opera- automatically by calling up the data cess. Using the Snapshot function of
tions immediately. However, this is very from the recovery volume, according to the storage, users can conduct drills as
difficult to do in conventional DR envi- the recovery plan prepared in advance. frequently as they want by connecting
ronments, because the server must be the virtual machine to an isolated test
connected with an external storage The results show that a fast and effec- network without interrupting the actual
device or the recovery process must tive failover can be successfully environment. Conventional DR systems
be carried out manually by an adminis- carried out under various environments involve a complicated process to carry
trator. There are many cases where the with a different number of servers at out recovery tests, and sometimes im-
recovery process remains untested, the operation site and recovery site. pact negatively on actual business op-
since much time and work are required RTO was about 1 hour, much less than erations. With such a conventional
to prepare an updated DR plan to meet the targeted time of 2 hours. This test system, many companies do not have
the daily demands of a changing sys- was implemented in an environment enough time to carry out recovery
tem environment. with a minimum number of virtual tests, and avoid them because of the
machines (two physical servers at the associated business risk. In some
To resolve these issues, VMware operation site and one at the recovery cases, additions or changes are made
vCenter Site Recovery Manager (here- site). The greater the number of serv- to the system after recovery test com-
inafter SRM), which automates disaster ers installed, the greater the benefits plications, making managers reluctant
recovery in a virtual environment, was from automation. Therefore, SRM may to conduct another recovery test. SRM
used in the demonstration tests to veri- be one of the most optimal solutions in lets users conduct automatic tests
fy both the seamless connection with environments where rapid disaster using the recovery plan at any time
the storage and the automation of the recovery is required. without interrupting on-site operations,
recovery process. SRM is a new
solution released in June 2008. The
demonstration tests were an excellent Operation site Recovery site
opportunity to test SRM’s performance Site Recovery Site Recovery
in a remote environment. vCenter Server vCenter Server
Manager Manager
16
while maintaining a highly reliable re- • Automatically replicating data store • Customizing recovery plan implemen-
covery environment. for recovery tation, in accordance with the test
• Implementing or stopping a user- scenario
In the demonstration tests, a clone defined script during the recovery • Cleaning up the test environment
volume of the recovery data saved in process automatically after test completion
the DR storage system was created • Reconfiguring the IP address of the
instantly, using NetApp’s FlexClone. virtual machine to match the network Disaster recovery management
The recovery plan and the test scenario configuration of the failover site • Detecting and displaying the virtual
were implemented in separate test • Managing and monitoring the machine to be protected using the
environments. After completion of the implementation of the recovery plan replication function of the storage
test, the test environment was com- using the vCenter Server device
pletely restored to original pre-test • Creating and managing the recovery
conditions. The test results were saved Achieving test with no downtime plan directly from the vCenter Server
in the vCenter Server, allowing for easy • Conducting recovery tests using the • Saving, displaying and exporting test
verification. SnapShot function of the storage results and failover implementation
device, without affecting data to be results from the vCenter Server
Major functions of the Site replicated
Recovery Manager • Connecting the virtual machine with a
separate network for testing
Automated failover purposes
• Running the recovery plan with a • Automatically conducting tests of the
single touch of a button using the recovery plan
VCenter Server
Figure 11: Button to run the recovery plan Figure 12: Screen to run the recovery plan
17
Disaster Recovery Potential Using
Virtualization Technology
Targeting effective use of the DR site
DR measures using virtualization Effective use of the DR site: tion from the operation site, or to use
technology that benefits from Improving the value of IT assets as a test instance employing the actual
an advanced technological business data by creating a replication
environment Future challenges for our research by FlexClone only of the data volume of
include finding ways to effectively use SQL Server out of the synchronized
Many results were obtained from this the DR site as part of daily business volume and attaching it to another SAP
project, proving that virtualization tech- operations. After all, it is especially instance. SAP Co-Innovation Lab Tokyo
nology is effective for DR measures difficult for small and medium-sized will continue to conduct demonstration
applied to an SAP ERP system. DR companies to obtain a return on invest- tests to examine issues springing from
measures using virtualization provide ment when installing IT assets that the needs of a variety of companies.
great benefits during operation, includ- would only be used for disaster recov-
ing quick recovery in case of a disaster, ery measures. Our recent demonstration tests cov-
and the ability to conduct mock disas- ered the process until failover at the
ter drills while operations at the actual In this respect, SAP Co-Innovation Lab DR site, and did not test failback to the
site continue running. These various Tokyo is studying a detailed scenario in operation site. In the future, the Lab will
advantages contribute to the reduction which a DR site could also be used test environments that are more realis-
of TCO. effectively during daily business opera- tic from a business perspective, taking
tions. For example, it may be possible into consideration failback procedures
These positive results are certainly to establish a development or test envi- that also include overseas sites.
options for SAP users who need to ronment using data replicated to the
ensure both business continuity and di- DR site, without stopping synchroniza-
saster recovery, and who have
depended on the reliability of a less ef-
ficient physical environment to Benefits Proven by Phase II of the Disaster Recovery Project
implement DR measures, or who have of SAP Co-Innovation Lab Tokyo
postponed the adoption of the DR
solutions because of the cost. Virtual- 1. A low-cost disaster recovery 4. I/O is integrated through FCoE
ization can also be applied to ERP in an platform can be established by 5. Speeding up replication for the DR
SaaS environment promoted by SAP, virtualization site, using Cisco’s WAN optimiza-
and in establishing a private cloud 2. Smooth failover using VMware’s tion solution.
environment achieved by advances in DR solution and NetApp’s storage 6. A DR site can be established even
virtualization technology. Virtualization technology was achieved with an inexpensive WAN, using
technology is also developing at the 3. An Intel® Xeon® 5500 series the WAAS WAN optimization
operational level, taking actual business processor improves performance solution
needs into consideration. in a virtual environment
18
502009
© xxx xxx
by (YY/MM)
SAP AG.
©20YY
All rightsbyreserved.
SAP AG.SAP, R/3, SAP NetWeaver, Duet, PartnerEdge,
All rights reserved.
ByDesign, SAP, R/3,
SAP Business SAP NetWeaver,
ByDesign, and other SAP Duet, PartnerEdge,
products and services
ByDesign, SAP
mentioned hereinBusiness ByDesign,
as well as and other
their respective SAPare
logos products and services
trademarks or regis-
mentioned
tered hereinof
trademarks asSAP
wellAG
as their respective
in Germany logoscountries.
and other are trademarks or
registered trademarks of SAP AG in Germany and other countries.
Business Objects and the Business Objects logo, BusinessObjects, Crystal
Business Crystal
Reports, ObjectsDecisions,
and the Business Objects logo,
Web Intelligence, BusinessObjects,
Xcelsius, and other Business
Crystal Reports,
Objects productsCrystal Decisions,
and services Web Intelligence,
mentioned herein as Xcelsius, andrespective
well as their other
Business
logos Objects products
are trademarks and services
or registered mentioned
trademarks hereinObjects
of Business as well S.A.
asthe
in their respective
United Stateslogos
and inare
othertrademarks
countries.orBusiness
registered trademarks
Objects of
is an SAP
Business Objects S.A. in the United States and in other countries.
company.
Business Objects is an SAP company.
All other product and service names mentioned are the trademarks of their
All other product
respective and service
companies. names mentioned
Data contained are the trademarks
in this document of their
serves informational
respectiveonly.
purposes companies.
NationalData contained
product in this document
specifications may vary.serves informational
purposes only. National product specifications may vary.
These materials are subject to change without notice. These materials
These
are materials
provided are subject
by SAP AG andtoitschange without
affiliated notice.(“SAP
companies TheseGroup”)
materialsfor
are provided by
informational SAP AGonly,
purposes andwithout
its affiliated companiesor(“SAP
representation Group”)
warranty of anyforkind,
informational
and SAP Group purposes only,
shall not without
be liable forrepresentation
errors or omissionsor warranty of any kind,
with respect to
and materials.
the SAP Group shall
The onlynot be liable for errors
warranties or omissions
SAP Group productswith
andr espect
services toare
the materials.
those that are The only warranties
set forth in the expressfor SAP Group
warranty products and
statements services are
accompanying
thoseproducts
such that are set
andforth in theifexpress
services, warranty
any. Nothing s tatements
herein should beaccompanying
construed as
such products
constituting an and services,
additional if any. Nothing herein should be construed as
warranty.
constituting an additional warranty.
www.sap.com /contactsap