Oracle VM 3: Planning Storage For Oracle VM Centric DR Using Site Guard
Introduction
Best Practices for Storage Arrays
Access to storage APIs is required
Understanding requirements for storage replication
Replicated storage must always remain unavailable
Reverse replication relationship during site transition
Delete unused storage repositories
Configuring ZFS storage appliances for storage replication
Best Practices for Storage Protocols
Both SAN and NAS are supported
File level protocol is the most flexible for DR
Block level protocols can be challenging
Best Practices for Pool File Systems
Do not replicate pool file systems
Back up pool file systems
Keep pool file systems in separate volumes/projects
Non-clustered server pools
Best Practices for Storage Repositories
Organizing repositories
Use a descriptive naming scheme
Organize repositories by business system
Limitations of simple names with Enterprise Manager
Use a modular approach for storage repositories
Site Guard operation plans operate on storage repositories
Group related storage by volume (ZFS project)
Excluding Oracle VM guests from disaster recovery
Don’t deploy single-use repositories
Keep Oracle VM guest objects in the same repository
Virtual disks vs. physical disks for booting guests
This paper explains the concepts, best practices, and requirements needed to plan and implement storage for your DR environment, specifically for Oracle VM Centric DR using Oracle Site Guard to orchestrate failovers and switchovers. Detailed steps for installing or configuring storage are beyond the scope of this guide.
We chose to begin with the Oracle ZFS Storage Appliance because it gracefully handles reversing remote replication relationships between sites, and its block level protocols are not at all challenging. This is the only storage platform we have found so far that replicates the page83 ID between sites, which eliminates the need to automate complex logic, maintain flat files to track the relationships of device special file names between sites, and modify the vm.cfg file during every site transition.
This solution path automates the entire failover or switchover process, which decreases the likelihood of human error and provides a quicker time to recovery. If operational governance within your organization prohibits Site Guard from accessing storage, then you should instead follow our Solution Path 2: Oracle VM Centric DR using custom automation.
You will need to configure Enterprise Manager with the credentials for root on all ZFS Storage Appliances at all sites. Alternatively, you can create a non-root account on each of the ZFS appliances, as long as the user account has root-equivalent authorization to perform all of the storage replication and share management required by the Site Guard agent.
Please refer to white paper SN21004: Implementing Oracle VM Centric DR using Site Guard for the specific steps needed to add ZFS Storage Appliance credentials to Enterprise Manager.
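Before adding the credentials to Enterprise Manager, it can be useful to confirm that the chosen account can actually log in to each appliance. The following is a minimal sketch, not part of the Site Guard tooling; it assumes the ZFS Storage Appliance RESTful API login endpoint (/api/access/v1 on port 215) is enabled on your appliances, and the host names and account name are purely illustrative.

# Sketch: verify a candidate Site Guard storage account can log in to each
# ZFS Storage Appliance via the RESTful API before registering it in
# Enterprise Manager. Host names, account, and the API endpoint are assumptions.
import requests

APPLIANCES = ["zfssa-sitea.example.com", "zfssa-siteb.example.com"]  # illustrative names
USER, PASSWORD = "siteguard_dr", "changeme"                          # illustrative account

for host in APPLIANCES:
    # Assumed RESTful API login endpoint; appliances typically use a self-signed
    # certificate, hence verify=False in this sketch.
    url = f"https://{host}:215/api/access/v1"
    resp = requests.post(url, auth=(USER, PASSWORD), verify=False)
    if resp.status_code == 201:
        print(f"{host}: login OK, session {resp.headers.get('X-Auth-Session')}")
    else:
        print(f"{host}: login failed with HTTP {resp.status_code}")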
Site Guard will automatically ensure all of the following requirements are met at the appropriate time during switchovers and failovers when using Oracle ZFS storage appliances. However, your storage administrator will need to understand the following concepts when configuring storage replication to ensure site transitions are successful using Site Guard. These requirements are not optional.
The diagram shown in Figure 1 below illustrates the state storage replication must be in for the successful day-to-day operation of Oracle VM. The left-hand side of the illustration shows storage in active use at the primary SiteA. The SAN and NFS storage is presented to the SiteA Oracle VM servers for use as storage repositories or as physical disks passed directly to the Oracle VM guests. Any NFS storage mounted directly on the Oracle VM guest operating systems is also exported to the Oracle VM guests running on the Oracle VM servers in SiteA Pool1.
[Figure 1 content: for Pool1 and Pool2, the LUNs and NFS shares for storage repositories and for the VMs are available to the Oracle VM servers and guests at SiteA, while the replicated copies remain unavailable to the Oracle VM servers and guests at SiteB.]
The two boxes in the middle represent the ZFS appliances at SiteA and SiteB. Notice the ZFS projects for SiteA Pool1 and Pool2 are replicated to the ZFS appliance residing at SiteB. The data refreshes from SiteA to SiteB can be either scheduled or continuous, depending on your requirements. The key concept shown on the far right of the illustration is that the replicated storage is not presented (neither mapped nor exported) to the Oracle VM servers or guests at SiteB – not even as read-only.
This is not an issue when using Oracle ZFS appliances for your storage platform, since remote replication does not allow NFS exports or LUN mapping/presentation to servers while a ZFS project is an active replication target. However, if you are using another storage platform, then you need to ensure the replication model follows the example shown in Figure 1 above.
To illustrate this point, Figure 2 below shows the same model as Figure 1 above. However, notice the replication relationship has been reversed; this is indicated by the red Key Concept text and the two large arrows between the ZFS appliances in the middle of the illustration. The other key concept, shown on the left of the diagram, is that storage is now replicated from SiteB to SiteA but is not presented or exported to the SiteA Oracle VM servers or guests. If you compare Figure 1 and Figure 2 you will see that the roles of the sites have been completely reversed.
[Figure 2 content: after the reversal, for Pool1 and Pool2 the LUNs and NFS shares for storage repositories and for the VMs are available to the Oracle VM servers and guests at SiteB and are unavailable to the Oracle VM servers and guests at SiteA.]
There is a slight difference in the timing depending on the type of site transition. During a switchover, Site Guard automatically reverses the replication relationship as one of the last tasks when vacating the primary SiteA, after the unused storage objects have been deleted from the SiteA Oracle VM Manager.
Failovers are different because Site Guard has to assume the primary SiteA is completely inaccessible due to
catastrophic failure. Therefore, Site Guard will ignore errors during its attempts to delete unused storage, storage
repositories and Oracle VM guests from the SiteA Oracle VM Manager during the failover.
The process for a failback also differs slightly from a switchback, since Site Guard assumes the storage at SiteA was not vacated correctly during the failover, so it attempts to clean up SiteA before vacating SiteB. This means Site Guard attempts to delete the old, empty SiteA ZFS projects that are no longer relevant, as well as the artifacts of the old storage repositories and Oracle VM guests, and then begins the normal failback process; from this point, the process is essentially a switchover from SiteB back to SiteA.
[Figure 3 content: the SiteA Pool1 project on the SiteA ZFS appliance is replicated to the SiteB ZFS appliance; the LUNs and NFS shares for repositories 1 through 4 are presented to the SiteA pool and remain unavailable to the SiteB pool.]
Figure 3: Ensure all storage repositories are deleted from the primary site during a switchover
Notice in Figure 3 above that the storage repositories are not yet available to the SiteB server pool. As noted in the
previous subsection, the replicated repositories are not made available to the SiteB Oracle VM servers until the
replication relationship has been reversed.
[Figure 4 content: the replication relationship is reversed and the SiteA Pool1 project is now replicated from the SiteB ZFS appliance back to SiteA; the LUNs and NFS shares for repositories 1 through 4 are presented to the SiteB pool and are unavailable to the SiteA pool.]
Figure 4: Ensure all storage repositories are deleted from the primary site during a switchover
Deleting the storage repositories is an important step in the switchover process, since it helps avoid performance problems and potential mistakes from human error. Significant performance problems arise when the artifacts of storage repositories remain after a switchover and are still presented to servers in a pool. The agent on the Oracle VM servers will attempt to mount the storage repositories during periodic storage refreshes throughout the day. The storage refresh locks the Oracle VM Manager until the process times out waiting to mount the repositories, which in turn prevents systems administrators from doing any work in the Oracle VM Manager.
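As a quick way to spot leftover repositories that are still presented to a server after a transition, an administrator can list what is currently mounted under the standard repository mount point. The following is a minimal sketch and not part of Site Guard; it assumes the default /OVS/Repositories mount location used by Oracle VM 3 servers.

# Sketch: list storage repository mounts on an Oracle VM 3 server so leftover
# repositories that should have been removed during a switchover stand out.
# Assumes the default /OVS/Repositories/<repository-uuid> mount location.
def repository_mounts():
    mounts = []
    with open("/proc/mounts") as f:
        for line in f:
            device, mountpoint = line.split()[:2]
            if mountpoint.startswith("/OVS/Repositories/"):
                mounts.append((mountpoint, device))
    return mounts

if __name__ == "__main__":
    for mountpoint, device in repository_mounts():
        print(f"{mountpoint} -> {device}")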
The illustration in Figure 5 below shows that the iSCSI initiator and target group names are exactly the same at all sites. The group names shown in the illustrations below are just examples; the names are completely up to your unique requirements, as long as the names are the same at all DR sites.
Even though the group names are the same at all sites, the actual iSCSI IQNs for the iSCSI initiators on the Oracle VM servers and ZFS appliances are different at each site; this means the replicated LUNs will be presented to the correct servers at each site automatically when Site Guard reverses the replication relationship during a failover or switchover. This is the magic.
Figure 5: iSCSI target and initiator group names must be identical at all sites if you are using iSCSI SAN instead of NFS
Figure 6: Fibre Channel target and initiator group names must be identical at all sites if you are using FCP SAN instead of NFS
The target and initiator group names are the most important concept to understand – this is a requirement. Ensuring the group names are the same at all sites will be challenging if you are using ZFS appliances that are already in use.
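To sanity-check this requirement, you could compare the group names configured on the appliances at each site. The sketch below is illustrative only; it assumes the ZFS Storage Appliance RESTful API exposes iSCSI initiator and target groups under /api/san/v1/iscsi/initiator-groups and /api/san/v1/iscsi/target-groups, so verify the exact paths and response keys against the RESTful API documentation for your appliance firmware, and adjust for Fibre Channel groups if you use FCP.

# Sketch: compare iSCSI initiator and target group NAMES (not member IQNs)
# between the SiteA and SiteB appliances. Host names, account, API paths, and
# response keys are assumptions to verify against your appliance's API docs.
import requests

SITES = {"SiteA": "zfssa-sitea.example.com", "SiteB": "zfssa-siteb.example.com"}
AUTH = ("siteguard_dr", "changeme")

def group_names(host, kind):
    # kind is "initiator-groups" or "target-groups"
    url = f"https://{host}:215/api/san/v1/iscsi/{kind}"
    data = requests.get(url, auth=AUTH, verify=False).json()
    return {g["name"] for g in data.get("groups", [])}

for kind in ("initiator-groups", "target-groups"):
    names = {site: group_names(host, kind) for site, host in SITES.items()}
    if names["SiteA"] == names["SiteB"]:
        print(f"{kind}: names match at both sites")
    else:
        print(f"{kind}: mismatch -> only SiteA: {names['SiteA'] - names['SiteB']}, "
              f"only SiteB: {names['SiteB'] - names['SiteA']}")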
There are a few more things you need to configure on the ZFS appliances other than the group names. Please refer to white paper SN21004: Implementing Oracle VM Centric DR using Site Guard for the specific steps needed to prepare the ZFS Storage Appliances for your DR environment.
You can use FCP, iSCSI, and NFS, or a combination of all three storage protocols, for any Oracle VM centric disaster recovery solution. However, you will find that NFS is the easiest to work with in terms of transitioning storage between sites. It is easier to forcefully take ownership of NFS storage repositories during a failover, and it is less challenging for the Oracle VM guests to reestablish access to NFS file systems at the partner site.
The real challenge with block level protocols comes with SAN LUNs passed directly to Oracle VM guests: you have to change the device special file worldwide ID (WWID, also known as the page83 ID) in each virtual machine configuration file after a transition from one site to another.
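On storage platforms that do not replicate the page83 ID, that change amounts to rewriting the physical disk entries in each guest's vm.cfg. The following is a minimal sketch of what such a rewrite could look like; it assumes multipath devices are addressed as /dev/mapper/<WWID> and that you already maintain a mapping of SiteA WWIDs to SiteB WWIDs. The file path and WWID values are placeholders.

# Sketch: rewrite WWID-based physical disk paths in an Oracle VM guest's vm.cfg
# after a site transition. The WWID mapping and paths are placeholders; on a
# ZFS Storage Appliance this step is unnecessary because the page83 ID is
# replicated between sites.
import sys

WWID_MAP = {
    "360014405a0b1c2d3e4f5061728394a5b": "360014405f9e8d7c6b5a4031122334455",  # placeholder values
}

def rewrite_vm_cfg(path):
    with open(path) as f:
        cfg = f.read()
    for old, new in WWID_MAP.items():
        cfg = cfg.replace(f"/dev/mapper/{old}", f"/dev/mapper/{new}")
    with open(path, "w") as f:
        f.write(cfg)

if __name__ == "__main__":
    # Usage: python rewrite_vm_cfg.py /OVS/Repositories/<repo-uuid>/VirtualMachines/<vm-uuid>/vm.cfg
    rewrite_vm_cfg(sys.argv[1])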
Pool file systems are never replicated to alternate sites and are never part of any disaster recovery solution using Oracle VM. To illustrate the problem using ZFS as an example, Site Guard reverses storage replication of entire ZFS projects during a switchover or failover. This action makes an entire ZFS project (volume) and everything contained in the project completely unavailable to the Oracle VM Manager and servers at the DR site being vacated during a switchover or failover.
Therefore, if your pool file system(s) reside in any ZFS projects containing storage repositories that are transitioned to other sites, all servers in a server pool at the DR site being vacated will reboot and then be unable to run any virtual machines after rebooting.
Pool file systems should reside in ZFS projects that are not replicated and do not contain any storage that needs to be transitioned to other DR sites. Notice in Figure 7 below that there are two server pools, represented by the boxes on the left-hand side of the diagram, and a ZFS appliance on the right. Each of the server pools has a pool file system that resides in a dedicated ZFS project with a single share for the pool file system.
You can use any supported storage protocol for pool file systems; in this case SiteA Pool1 uses an NFS share contained in ZFS project SiteA_Pool1_infra and SiteA Pool2 uses a SAN share contained in ZFS project SiteA_Pool2_infra. Also notice that neither of the ZFS projects containing pool file systems is replicated to other DR sites.
The ZFS project can contain other shares that are not part of a DR plan, but it is recommended that you do not include shares for storage repositories in any ZFS projects containing pool file systems, for the following two reasons:
» Keeping pool file systems separate from other shares ensures server pools are not brought down simply because a runaway process on an Oracle VM server or guest inadvertently fills 100% of the space allocated to a ZFS project.
» Keeping non-DR storage repositories in their own projects allows you to easily include those projects in a Site Guard operation plan at any time in the future without disrupting anything or requiring outages to make the change.
Non-clustered server pools reduce the single point of failure represented by the pool file system, reduce the amount of storage allocated to overhead, and eliminate the need to back up the pool file system. However, non-clustered server pools do not support OCFS2 storage repositories or any of the high availability features of Oracle VM.
You may deploy both clustered and non-clustered server pools in the same DR environment. Site Guard doesn’t care if you are transitioning storage repositories from one server pool type to another, as long as an OCFS2 repository is not being transitioned to a non-clustered server pool.
This is another reason we recommend NFS over SAN for storage repositories (see the section on storage protocols above). The choice to deploy non-clustered server pools is up to you and the unique requirements of each server pool.
Organizing repositories
It is fine to have many storage repositories, but you should organize your business systems into repositories dedicated to particular business systems. For example, if you have three different business systems such as BAM, CRM, and SCM, then create repositories meant to contain only the virtual machines for BAM, repositories meant only for CRM, and repositories meant only for SCM.
However, we highly recommend that you always incorporate the primary site name into each repository name. This is because the simple name will automatically appear when a repository is discovered at an alternate site, which in turn makes it immediately obvious to systems administrators that there are foreign Oracle VM guests running at the alternate site, as well as which site they came from; this dramatically reduces the chance of confusion and human error.
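As a small illustration of such a naming scheme, the helper below builds and checks repository names of the form "<Site> <Environment> <Business system>". The convention itself is just an example drawn from the figures later in this section, not something Site Guard or Oracle VM requires.

# Sketch: build and validate repository names that embed the primary site name,
# e.g. "SiteA Prod Sales Support". The convention and site list are illustrative.
KNOWN_SITES = {"SiteA", "SiteB", "SiteC", "SiteD"}

def repository_name(site, environment, business_system):
    return f"{site} {environment} {business_system}"

def primary_site(repo_name):
    # The first token identifies where the repository (and its guests) came from.
    site = repo_name.split()[0]
    return site if site in KNOWN_SITES else None

print(repository_name("SiteA", "Prod", "Sales Support"))  # SiteA Prod Sales Support
print(primary_site("SiteA Prod Sales Support"))           # SiteA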
For example, perhaps you have a customer sales portal that is comprised of three Oracle VM guests running data stores, two virtual machines running middleware, three virtual machines acting as application servers, and four VMs acting as web portals with load balancing. These would all be part of a single business system. That single business system should be contained in as many repositories as needed, but only Oracle VM guests related to the business system should reside in those repositories.
As shown in Figure 9 below, you might further divide these into development, user acceptance testing and
production systems. There are a few reasons this kind of division into discrete business systems makes a lot of
sense:
» You don’t have to transition an entire server pool to another site as a single unit. Notice in Figure 9 below that the repositories in each of the two server pools are independent of each other since they are organized by business system; this means each repository can be freely transitioned to a different server pool at a different site.
» You have the complete freedom to transition a single business system while leaving the others running at the primary site. For example, transition the production sales support system running on SiteA Pool2 to another pool at SiteD.
» You can promote entire business systems from development into production by using Site Guard to transition the system to another server pool at the same site or to another server pool at a different site.
» Backups are easier because all of the VMs in the repositories can be quiesced at the same time without capturing data from other, unrelated VMs that can’t be put into a transaction-consistent state at the same time.
» Restores are easier because it is highly likely that all VMs will need to be restored at the same time; you can restore everything in a repository without worrying about overwriting unrelated VMs that don’t need to be restored.
[Figure 9 content:
SiteA Pool1: SiteA Dev Corp Procurement (alternate DR site is SiteB); SiteA Dev Customer Support (alternate DR site is SiteB); SiteA Dev Data Center Ops (alternate DR site is SiteC); SiteA Dev Sales Support (alternate DR site is SiteC)
SiteA Pool2: SiteA Prod Corp Procurement (alternate DR site is SiteD); SiteA Prod Customer Support (alternate DR site is SiteD); SiteA Prod Data Center Ops (alternate DR site is SiteC); SiteA Prod Sales Support (alternate DR site is SiteD)]
Perhaps your company provides private and public cloud services. In this case you might organize your Oracle VM guests into storage repositories by customer, and then by business system for each customer, as shown in Figure 10 below.
[Figure 10 content:
SiteA Dev Customer1 CRM1 (alternate DR site is SiteB); SiteA Dev Customer1 CRM2 (alternate DR site is SiteB); SiteA Dev Customer2 CRM1 (alternate DR site is SiteC); SiteA Dev Customer2 CRM2 (alternate DR site is SiteC); SiteA Dev Customer2 ERP (alternate DR site is SiteC)
SiteA Prod Customer1 CRM1 (alternate DR site is SiteD); SiteA Prod Customer1 CRM2 (alternate DR site is SiteD); SiteA Prod Customer2 CRM1 (alternate DR site is SiteD); SiteA Prod Customer2 CRM2 (alternate DR site is SiteE); SiteA Prod Customer2 ERP (alternate DR site is SiteE)]
Figure 10: Organize Oracle VM guests into repositories by customer, then business system
The above two examples are simply starting points to help illustrate the necessity of designing your storage in a robust, modular fashion for maximum flexibility and easy identification. In this way, you can easily change the alternate DR site of any single business system, or group of Oracle VM guests, by simply changing a single parameter in a Site Guard operation plan.
Using Enterprise Manager to manage Oracle VM across the various DR sites is recommended, but optional.
Figure 11: Design each DR operation plan around storage repositories that will be transitioned as a unit
The example illustrated above is modular in nature, allowing a lot of latitude in the way you organize and execute Site Guard operation plans. For example, the storage repositories above can be organized into different Site Guard operation plans in a few different ways:
» You could create a single Site Guard operation plan that transitions all of the repositories to the different sites shown in Figure 11 above concurrently.
» You could create three individual Site Guard operation plans that include:
» Operation plan 1: SiteA Pool1 Repository1, Repository2 and Repository3 transitioned to two different
server pools at SiteB
» Operation plan 2: SiteA Pool1 Repository4 and SiteA Pool2 Repository1 transitioned to SiteC
» Operation plan 3: SiteA Pool2 Repository2, Repository3 and Repository4 transitioned to two different
server pools at SiteD
» Or, you could create five individual Site Guard operation plans that include:
» Operation plan 1: SiteA Pool1 Repository1 and Repository2 transitioned to Pool1 at SiteB
» Operation plan 2: SiteA Pool1 Repository3 transitioned to Pool2 at SiteB
» Operation plan 3: SiteA Pool1 Repository4 and SiteA Pool2 Repository1 transitioned to Pool1 at SiteC
» Operation plan 4: SiteA Pool2 Repository2 transitioned to Pool1 at SiteD
» Operation plan 5: SiteA Pool2 Repository3 and Repository4 transitioned to Pool2 at SiteD
These fictional Site Guard operation plans can be executed independently of each other at different times on different days, or all at the same time. Individual operation plans can even be combined in any order to transition a subset of storage repositories. Using Figure 11 above as an example, Site Guard would allow you to transition just the two SAN repositories in SiteA Pool1 and the two NFS repositories in SiteA Pool2 while leaving all the others running at SiteA.
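One lightweight way to keep this modular organization visible to the whole team is to record each plan's repositories and target pools as data, for example alongside the scripts that drive your DR runbooks. The sketch below simply captures the fictional five-plan example above as a dictionary; the structure is illustrative and is not a configuration format consumed by Site Guard.

# Sketch: record which repositories each (fictional) Site Guard operation plan
# transitions and where they land. Documentation-as-data only, not a Site Guard format.
OPERATION_PLANS = {
    "Operation plan 1": {"repositories": ["SiteA Pool1 Repository1", "SiteA Pool1 Repository2"],
                         "target": "SiteB Pool1"},
    "Operation plan 2": {"repositories": ["SiteA Pool1 Repository3"],
                         "target": "SiteB Pool2"},
    "Operation plan 3": {"repositories": ["SiteA Pool1 Repository4", "SiteA Pool2 Repository1"],
                         "target": "SiteC Pool1"},
    "Operation plan 4": {"repositories": ["SiteA Pool2 Repository2"],
                         "target": "SiteD Pool1"},
    "Operation plan 5": {"repositories": ["SiteA Pool2 Repository3", "SiteA Pool2 Repository4"],
                         "target": "SiteD Pool2"},
}

for plan, details in OPERATION_PLANS.items():
    print(f"{plan}: {', '.join(details['repositories'])} -> {details['target']}")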
You simply need to do two things when deploying the solution to exclude Oracle VM guests in a server pool from being part of a Site Guard operation plan:
» Ensure all storage associated with the Oracle VM guests contained in either of the storage repositories resides in ZFS projects or volumes separate from other projects that are part of a DR plan. Notice in Figure 13 above that
Such a solution adds a significant amount of unnecessary overhead during discovery at the alternate DR site, seriously degrading the performance of Oracle VM and significantly increasing the overall time to recovery.
Figure 14 below shows an example of a very poor deployment of the files associated with Oracle VM guests. Notice that each of the two Oracle VM guests, named myguest1 and myguest2, has configuration files and virtual disks spread across all four storage repositories.
[Figure 14 content: SiteA Pool1 has four repositories (Repository1 NFS, Repository2 NFS, Repository3 SAN, Repository4 SAN); the vm.cfg and vdisk1 for myguest1 are in Repository1, while vdisk2, vdisk3, and vdisk4 for myguest1 and the virtual disks for myguest2 are spread across Repository2, Repository3, and Repository4.]
Figure 14: Very poor deployment of virtual disks and configuration files for Oracle VM guests
Figure 15 below illustrates how the files for Oracle VM guests should be deployed. Notice that the configuration files and virtual disks all reside in the same repository. As we noted above, you can have many storage repositories for a single business system comprised of many virtual machines; just ensure that all of the files associated with each virtual machine reside in the same repository.
Figure 15: Good deployment of virtual disks and configuration files for Oracle VM guests
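A simple way to audit an existing environment against this recommendation is to walk each repository and flag guests whose vm.cfg references virtual disks stored in a different repository. The sketch below assumes the standard Oracle VM 3 repository layout (/OVS/Repositories/<repository-uuid>/VirtualMachines/<vm-uuid>/vm.cfg, with virtual disk images referenced by file: entries); it is illustrative only and should be verified against your own repository paths.

# Sketch: flag Oracle VM guests whose vm.cfg references virtual disk images
# outside the repository that holds the vm.cfg. Assumes the standard
# /OVS/Repositories/<repo-uuid>/... layout used by Oracle VM 3 servers.
import glob
import re

REPO_ROOT = "/OVS/Repositories"

for cfg_path in glob.glob(f"{REPO_ROOT}/*/VirtualMachines/*/vm.cfg"):
    repo_uuid = cfg_path.split("/")[3]
    with open(cfg_path) as f:
        cfg = f.read()
    # 'file:' disk entries point at virtual disk images inside some repository.
    for disk_path in re.findall(r"file:(/OVS/Repositories/[^,'\"]+)", cfg):
        if not disk_path.startswith(f"{REPO_ROOT}/{repo_uuid}/"):
            print(f"{cfg_path}: disk {disk_path} lives in a different repository")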
However, you need to be consistent in your choice for ease of maintenance and reliability of your solution. Pick one disk type over the other and then use that disk type throughout your entire DR environment; consistency is the key to success.
CONNECT WITH US
Blogs.oracle.com/virtualization
Facebook.com/OracleVirtualization
Twitter.com/ORCL_Virtualize
oracle.com

Copyright © 2015, Oracle and/or its affiliates. All rights reserved. This document is provided for information purposes only, and the contents hereof are subject to change without notice. This document is not warranted to be error-free, nor subject to any other warranties or conditions, whether expressed orally or implied in law, including implied warranties and conditions of merchantability or fitness for a particular purpose. We specifically disclaim any liability with respect to this document, and no contractual obligations are formed either directly or indirectly by this document. This document may not be reproduced or transmitted in any form or by any means, electronic or mechanical, for any purpose, without our prior written permission.
Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners.
Intel and Intel Xeon are trademarks or registered trademarks of Intel Corporation. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. AMD, Opteron, the AMD logo, and the AMD Opteron logo are trademarks or registered trademarks of Advanced Micro Devices. UNIX is a registered trademark of The Open Group. 0915