RMM v7 Replication Process and Best Practices v2.7
TECHNICAL OVERVIEW
&
BEST PRACTICES
RACKWARE INC
VERSION 2.7
1 Overview
2 Basic Replication
2.1 Discover/Examine
2.2 Capture for Store-and-Forward Approach
2.3 Assign
2.4 Direct Assign
2.5 Sync
3 Replication Planning
3.1 Location and Number of RMMs
3.1.1 Location of the RMM Server
3.1.2 Number of RMM Servers
3.2 Key Decisions Regarding RMM Configuration
3.2.1 Storage
3.2.2 Networking
3.3 Key Decisions Regarding Server and Infrastructure Configuration
3.3.1 IP Addresses and Hostnames
3.3.2 Isolate or Integrate the Network
3.3.3 Licensing - Application and OS
3.3.4 Anti-Virus Considerations
3.4 Special Considerations for Databases
3.4.1 Oracle Databases
3.4.1.1 Oracle Databases with LVM Volumes
3.4.1.2 Oracle Databases with ASM Disks
3.4.1.3 Oracle Data Guard (and Golden Gate)
3.4.2 Microsoft SQL Server
3.5 Clusters
3.6 Active Directory
3.7 Common Issues
3.7.1 Windows Software Update
3.7.2 Windows Reactivation
1 Overview
All use cases rely on a common underpinning technology, a true any-to-any Image replication and sync
mechanism. Of course, a great deal of additional software surrounds the Image replication technology
to enable these use cases.
Virtually any server-to-server combination is supported. The RMM supports physical servers on both
the Origin side and Target side. Support for disparate hypervisors in the replication process is also a key
capability. For example, one can replicate from a Dell server to an HP server, or from a VMware VM to
a Xen or KVM VM, even in a Cloud. This is a crucial feature, as many clouds run on non-VMware
hypervisors, while many datacenters use VMware hypervisors as the primary virtualization vehicle.
RackWare technology also supports Virtual Machine (VM) back to physical, which is critical for Disaster
Recovery during a fallback operation, perhaps from cloud (Virtual) to a physical server in the datacenter.
The RMM is a Linux application and runs on Red Hat, CentOS, or Oracle Linux. The RMM does not
require any interface to a hypervisor or storage array. The RMM connects to the Origin server over the
network at the Operating System level and replicates the server Image. The Image can be replicated to
a storage location at the Target site or to a provisioned server (virtual or physical) at the Target site. The
RMM is also capable of auto-provisioning appropriately sized servers in the Target environment.
A Configuration Management Database (CMDB) is maintained by the RMM on the RMM Server. The
CMDB keeps track of resources the RMM is managing, optionally captured Images, as well as
operational state.
2 Basic Replication
2.1 Discover/Examine
The Discover/Examine phase is relatively simple and straightforward. The user inputs the IP address or
DNS hostname of the server to be replicated and the RMM finds it and connects to it. The server is
called a Host in RackWare nomenclature (Host in this context does not refer to a hypervisor server).
Commands can be driven via the CLI, GUI, or API. The CLI command is called host discover.
The RMM connects to the Host via the supplied IP address or DNS hostname and gathers information
about the Host. The RMM uses SSH to connect to the server, so a prerequisite is that the RMM
server's SSH key is installed on the Origin server. SSH is used for both Windows and Linux, alleviating
the need for any password. If SSH is not already installed on Windows, a tiny MSI is used to set up the
SSH software.
If the SSH keys and/or username/passwords are not coordinated correctly, the Discover process will fail,
and the administrator will be notified.
Once the Host is found, a series of inquiries are made to the Host using standard Operating System
queries. Information about the Host is logged in the RMM's CMDB. The discovered information,
metadata, will be used later in the replication process to AutoProvision a server (physical or virtual) at
the Target site.
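The kind of metadata a discovery step gathers can be illustrated with standard-library OS queries. This is a minimal sketch: the field names are illustrative, not the RMM's actual CMDB schema.

```python
import json
import platform
import shutil
import socket

def discover_host_metadata():
    """Gather illustrative host metadata (hostname, OS, architecture,
    disk usage) using only standard OS queries, as a discovery step might."""
    total, used, _free = shutil.disk_usage("/")
    return {
        "hostname": socket.gethostname(),
        "os": platform.system(),
        "os_release": platform.release(),
        "arch": platform.machine(),
        "root_disk_total_bytes": total,
        "root_disk_used_bytes": used,
    }

metadata = discover_host_metadata()
print(json.dumps(metadata, indent=2))
```

In a real deployment these queries would run over the SSH connection described above; here they run locally for illustration.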
2.2 Capture for Store-and-Forward Approach
The Capture process can be initiated on Hosts that have previously been discovered, or on Hosts that
the RMM has no awareness of. The capture process essentially creates a clone or snapshot of the Image
at the Target site. (If desired the Capture process can be skipped and Images can be replicated directly
to target machines.) As part of Capture, the RMM determines the necessary storage requirements,
which can be influenced by capture parameters, and allocates storage for the capture.
The RMM ensures data integrity in both the capture and, eventually, sync processes. The RMM takes a
snapshot of the Logical Volumes on the Origin server and copies the Image bits from the snapshot to the
storage location associated with the RMM Server at the Target site. (See the diagram in the next
section.) The snapshot is a standard OS mechanism whereby the OS informs applications to flush
their IOs to disk. When the IOs are flushed to disk, data for the filesystem is in a static and consistent
state. Next the OS places a bookmark in the filesystem, and IOs then continue above the bookmark, so
the process is non-disruptive to the Origin server. The RMM performs the copy (or delta sync) from the
static snapshot while applications continue to run.
This is a highly reliable process for any application using logical volumes and is OS compliant. It works
exceptionally well for databases including Microsoft SQL Server and Oracle configured with LVM
volumes. In fact, these databases are particularly reliable and well behaved resulting in clean Images on
the Target environment.
In addition to the Image bits, additional metadata about the Image is stored in the RMM's CMDB. The
metadata is vital to configuring Images in the Target environment during the final assign operation.
When an Image is assigned in the Target environment, the metadata drives configuration of crucial
elements necessary for the Image to operate correctly on the new hardware and in the new
environment.
Another important point about the capture process is that it is file based, not block based. This
means that the RMM is smart enough to copy only the used data rather than simply copying sectors from one
disk to another. This is a far more efficient process and has the added advantage that the Target storage
can be right-sized if desired; it also permits other important features such as support for network storage.
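Right-sizing from used data rather than origin disk size can be sketched as follows. The headroom fraction and minimum size are illustrative assumptions, not RMM defaults.

```python
def right_size_target(used_bytes, headroom_fraction=0.25, min_bytes=1 << 30):
    """Size a target volume from the *used* data on the origin filesystem:
    used bytes plus a headroom fraction, with a 1 GiB floor for tiny
    filesystems. (Illustrative policy, not the RMM's actual sizing rule.)"""
    sized = int(used_bytes * (1 + headroom_fraction))
    return max(sized, min_bytes)

# A 500 GB origin disk holding only 80 GiB of data needs ~100 GiB at the target.
used = 80 * (1 << 30)
print(right_size_target(used) / (1 << 30))  # → 100.0
```

The point of the sketch is that target storage cost tracks data actually stored, not the origin's provisioned capacity.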
2.3 Assign
To complete the replication process, an Assign operation is performed. In the Assign process the RMM
will first AutoProvision an appropriately sized server, although, optionally, the user can provision the VM
if desired. The RMM uses the metadata from the Discovery process in an algorithm that selects a
server with the correct hardware profile at the Target site, so the workload runs with equal or very
similar performance to the Origin server.
If the target server is an existing physical server, or the user elected to provision the VM themselves, the
RMM can be pointed at that server's IP address to complete the replication process.
The RMM connects to the Target server. For Linux the connection is always SSH. For Windows the
connection method can be SSH or winexe depending on the capabilities of the Cloud provisioning
mechanism. The Target server is examined to ensure it is capable of running the Origin Image. The
RMM also understands the underlying hardware, so it knows how to configure the Origin Image for the
new hardware (e.g., device drivers). Next the Target server is prepared to accept the Image of the
Origin server.
[Diagram: store-and-forward replication. Discover uses standard OS queries over SSH (Linux and
Windows) and records metadata in the CMDB. Capture takes a static snapshot on the Origin
(applications keep running; live IOs continue above the filesystem bookmark) and transfers the flushed
data, encrypted, to captured Image storage on the RMM. Assign provisions a VM, formats the disk and
creates partitions, populates the data, injects and configures drivers, configures networking, and boots.
The process is non-disruptive and yields an exact replica except for drivers and, optionally, networking.]
The RMM boots the Target into a RackWare microkernel, and the disk is reformatted, recreating the
Logical Volume structure exactly as the Origin is configured. Importantly, the number of disks can be
different; the RMM employs a best-fit algorithm. Additionally, since the RMM replicates at the
file/filesystem level, the storage on the Target does not have to be the exact size of the Origin's; it
simply needs to be large enough to hold the used data. Optionally the user can specify the partition
map if desired.
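One common best-fit strategy is sketched below: place volumes largest-first, each onto the disk with the least remaining space that still fits it. This is an illustrative strategy only; the RMM's actual placement algorithm is not published here.

```python
def best_fit(volumes, disks):
    """Place origin volume sizes onto target disks, best-fit decreasing.
    volumes: {name: size}; disks: list of capacities.
    Returns {volume_name: disk_index}; raises if a volume cannot fit.
    (Illustrative sketch, not the RMM's published algorithm.)"""
    remaining = list(disks)
    placement = {}
    for name, size in sorted(volumes.items(), key=lambda kv: -kv[1]):
        candidates = [i for i, free in enumerate(remaining) if free >= size]
        if not candidates:
            raise ValueError(f"no disk can hold volume {name!r} ({size})")
        # Best fit: the candidate disk with the least free space remaining.
        best = min(candidates, key=lambda i: remaining[i])
        remaining[best] -= size
        placement[name] = best
    return placement

# Three origin volumes (GB) mapped onto two target disks of different sizes.
print(best_fit({"root": 40, "data": 120, "logs": 30}, [100, 200]))
```

Note the number of target disks can differ from the origin; the mapping only has to fit, not mirror the origin layout.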
Once the disk is formatted with a partition structure identical to the Origin's, the RMM transfers the Image
bits to the Target. If necessary, drivers are injected to ensure that the Image can operate on the new
hardware, and optionally the network identity is modified to be consistent with the Target environment.
Upon a reboot, an exact replica of the Origin server is running on the Target, with the exception of
device drivers and, optionally, network configuration. The OS is the exact version as the Origin; all
applications, application data, users, packages, settings, and other data are identical.
Note that for Windows, once the Origin Image has been applied to the Target server, the RMM will only
use SSH to connect to it. It will not use winexe, as it may have for the newly provisioned server.
The above process demonstrates that the origin was replicated to the target hardware with:
• Configuration of applicable system and storage drivers
• User and password configuration maintained from the origin
• Successful booting of the server
• Configuration and operation of standard network connectivity on the running server
2.4 Direct Assign
There are many advantages to performing a store-and-forward replication. However, in circumstances
where those advantages are not relevant, the RMM can perform a replication directly from the Origin
server to the Target server. This capability is called Flex Sync or Host Sync; the terms are used
interchangeably.
With Host Sync, the RMM performs all the same functions on the origin server from a discovery
perspective. After Discover/Examine, the RMM provisions a target server if necessary, or an existing
server can be used (e.g., an existing physical server). Once the Target server is prepared, instead of
capturing the Image on the RMM Server, the Image is replicated directly to the Target server. The
network connection can still pass through the RMM, avoiding a direct network connection between the
two servers. This can be thought of as a Sync with provisioning, or a combined Capture/Assign.
[Diagram: direct (Host Sync) replication. As above, Discover uses standard OS queries over SSH and
records metadata in the CMDB, and a static snapshot is taken on the Origin while applications keep
running. Instead of capturing to the RMM, the flushed data is transferred, encrypted, directly from the
static snapshot to the provisioned Target, where the disk is formatted, Logical Volumes are created,
data is populated, drivers are injected and configured, networking is configured, and the server is
booted. The process is non-disruptive and yields an exact replica except for drivers and, optionally,
networking.]
2.5 Sync
Another key capability of the RMM is the ability to perform a delta sync from Origin to Target. There are
numerous options for sync including:
• Stage I
• Stage II
• RMM passthrough
• Direct
• Selective Sync
• Drive/directory mapping
Information about these options can be found in the Migration and DR Guide documents.
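The core idea behind a delta sync — transfer only what changed since the last sync — can be sketched with content digests. This is an illustrative model, not the RMM's actual sync mechanism; real engines compare filesystem state, not in-memory dictionaries.

```python
import hashlib

def file_digest(data: bytes) -> str:
    """Content fingerprint used to detect changed files."""
    return hashlib.sha256(data).hexdigest()

def delta_plan(origin, target):
    """Decide which files need transfer: only files missing from the target
    or whose content digest differs are copied; files that no longer exist
    on the origin are deleted. origin/target map path -> content bytes
    (stand-ins for real filesystems in this sketch)."""
    to_copy = [p for p, data in origin.items()
               if p not in target or file_digest(target[p]) != file_digest(data)]
    to_delete = [p for p in target if p not in origin]
    return to_copy, to_delete

origin = {"/etc/hosts": b"10.0.0.1 db", "/var/app.log": b"new entries"}
target = {"/etc/hosts": b"10.0.0.1 db", "/var/app.log": b"old", "/tmp/x": b""}
print(delta_plan(origin, target))  # → (['/var/app.log'], ['/tmp/x'])
```

This is why repeated syncs are far cheaper than the initial replication: the unchanged bulk of the Image is skipped entirely.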
3 Replication Planning
3.1 Location and Number of RMMs
3.1.1 Location of the RMM Server
Technically the RMM Server can be located in any geography or location as long as it has network
connectivity to both the Origin and Target servers with the appropriate ports opened and protocols
permitted (see the Prerequisites and Operational Requirements document).
Under most circumstances the RMM Server is best situated in the Target environment; this is considered
a best practice for several reasons.
First, the Target environment is often a cloud, and it is easy and convenient to spin up a VM to run the
RMM Server.
Second, locality of data is important. Since a WAN link normally sits between the origin and target, with
a lower speed and/or greater distance (increasing latency), capturing the Image and having the data
closer to the Target server is an advantage.
Third, the Assign process is more complicated than Capture, and it is more efficient to have the Image data
closer to the Target server, where the two are connected by a LAN.
Of course, each situation is different and other factors may warrant installing the RMM on the Origin
side. For example, if an Origin datacenter will be replicated to multiple Target locations, it may be
quicker and more efficient to install a single RMM Server on the Origin site.
There are also cases where installing the RMM Server at both locations is the better choice. For
example, where the Origin holds very large amounts of data, and the network connection is slower
speed or unstable, the RMM can be installed at both locations. The RMM installed at the Origin would
capture the Image locally, over the LAN, to some kind of portable storage. Once completed, the disks
can be physically transported to the Target and imported by the RMM Server at that location to
complete the replication process. The RMM Server at the Origin can be decommissioned, and the RMM
Server at the Target can still be used to perform sync operations over the network.
3.1.2 Number of RMM Servers
An advantage of RackWare is that licensing is on a per-server basis, and as many RMM Servers can be spun
up and used concurrently as desired.
It may be convenient to deploy an RMM in each geography, on individual physical LAN segments, per
migration team, or per some logical organization structure. RMMs can be spun up to meet the required
parallelism for aggressive projects.
After the initial replication, some of the RMMs can be decommissioned. For the remaining functions,
such as cutover Syncs and DR policies, the remaining RMMs can be used to perform operations on
replications even if those replications were not originally performed by that RMM.
3.2 Key Decisions Regarding RMM Configuration
3.2.1 Storage
The RMM uses storage on the RMM Server to perform store-and-forward replication operations. In
migration use cases, the storage required there is temporary and can be removed after replication is
completed. DR use cases usually use RMM-based storage.
Adequate storage performance is necessary in order for the replication operations to complete in a
timely manner. Guidelines can be found in the Prerequisites and Operational Requirements document.
3.2.2 Networking
When configuring the RMM it is best practice to configure 1GbE (or higher) interfaces to local resources.
It is best practice to configure a minimum of 100Mbit WAN connections between the Origin and Target
environments. While 100Mbit is a best practice there are many cases where slower speed links are
acceptable. In all cases, a stable and consistent network is actually more important than bandwidth.
3.3 Key Decisions Regarding Server and Infrastructure Configuration
3.3.1 IP Addresses and Hostnames
Correct application operation requires that IP addresses and/or hostnames be correct in the
environment in which they are running. Therefore, decisions about IP addresses and hostnames impact the effort
required to test and verify applications in a target environment. Some applications are more sensitive
to IP addresses, and some applications are more sensitive to hostnames. If applications are more
sensitive to hostnames, then there is more freedom in making choices about IP addresses in the Target
environment. By default the RMM retains the hostname, and changes the IP addresses on the
replicated server in the Target; though both can be configured either way.
As a general rule, it's typically easier for applications to work properly in the Target environment if IP
addresses are kept the same as the Origin environment. However, this complicates the network
topology and configuration. Additionally, there are circumstances where servers in the Target
environment require access to servers in the Origin environment.
If Origin IP addresses are retained as part of the replication process, it's imperative that the Target
network be isolated from the Origin network, lest duplicate IP addresses be introduced. See the section
on "Isolate or Integrate the Network". Duplicate IP addresses will confuse many infrastructure
components such as routers, load balancers, and Domain Controllers. Careful planning and systematic
verification are required to avoid introducing duplicates.
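A pre-flight duplicate-address check can be sketched with the standard library. The function and server lists are hypothetical; in practice the address inventories would come from the CMDB and the target network's IPAM.

```python
import ipaddress

def duplicate_ips(origin_ips, target_ips):
    """Report addresses that would collide if Origin IPs are retained and the
    Target network is not isolated from the Origin network. (Hypothetical
    helper for planning; not an RMM command.)"""
    o = {ipaddress.ip_address(a) for a in origin_ips}
    t = {ipaddress.ip_address(a) for a in target_ips}
    return sorted(str(a) for a in o & t)

origin = ["10.1.0.5", "10.1.0.6", "10.1.0.7"]
target = ["10.1.0.6", "10.1.0.99"]
print(duplicate_ips(origin, target))  # → ['10.1.0.6']
```

Running a check like this before replication is far cheaper than untangling a confused router or Domain Controller afterwards.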
A common configuration is to retain Origin IPs on one interface and changed IPs on a second interface.
This can happen when one interface is using a private IP and dedicated for application use, and a second
interface is used exclusively for management functions. When moving to a cloud environment, it's useful
(or required) to change the management IP as the management network is different from that of the
Origin. At the same time, keeping the private IP on the interface used by applications eases or
eliminates any application issues as part of the replication process.
It's also possible to configure a specific network configuration for a captured Image, and that
configuration is used when completing the replication process.
The RMM has the ability to retain the hostname (the default) or the administrator can configure a
new/different hostname as part of the replication process.
3.3.2 Isolate or Integrate the Network
Many factors affect whether or not the networks should be isolated or integrated. It is a best practice to
isolate the DR site from the Origin site though it's acceptable to have some special servers connect to
both environments.
If IP addresses are required to be the same, then the networks must be isolated.
There are multiple ways to configure isolated networks and allow the same IPs. The below diagram
shows two methods, one provided by RackWare, and one provided by the user or service provider.
[Diagram: two network-isolation methods for client VMs — one RackWare Managed, and one provided
by the user or service provider's cloud infrastructure.]
If the IP addresses are different it's still advised to isolate the networks. With different IP addresses the
NAT function can still be useful but is not necessary.
In the event of a failover, removing the isolation works in conjunction with other infrastructure
that must be modified to point to the Target servers. This is often as simple as updating URLs to point to
a different IP address or a redirect in a DNS server.
One of the major issues with not isolating the DR site is that servers replicated in the Target
environment can connect back to servers in the Origin environment. A significant risk is when a Target
server connects back to an Active Directory server in the Origin environment. This will confuse the AD
function as to which server is the production server and likely cause an outage. Hence, if the networks
are not isolated, a multi-pronged approach should be employed to protect the Target servers. The
following mechanisms have successfully been used to address integrated networks and assure Target
servers do not connect to the Origin AD server.
The RMM supports modifying the hostname as part of a replication. However, it is highly advised that
all 3 mechanisms be employed. Modifying the hostname requires human intervention, and in a large
project it's only a matter of time before someone forgets to specify that option or does not update the
hostname correctly. If that happens, the additional layers will prevent the Target server from connecting
to the Origin.
It is considered best practice to isolate the networks, although it's noted that many circumstances
require exceptions. Whether the decision is made to isolate or integrate the network, taking care
to plan the network configuration up front avoids problems later.
3.3.3 Licensing - Application and OS
By design, RackWare does not manipulate any licenses, whether they be OS or application. A licensing
plan should be part of any replication activities.
The RMM performs a full Image replication and allows Images to be replicated to disparate hardware
and across disparate hypervisors. This means that the Image is then configured on different hardware
than was in the Origin environment.
Some licensing is sensitive to CPU configurations. While every effort is made in the RMM to provide
commensurate hardware (virtual or physical), it's possible that differences may arise requiring
adjustments.
Likewise, some licensing is sensitive to hardware GUIDs or MAC addresses. In such cases these elements
may be different, especially MAC addresses.
3.3.4 Anti-Virus Considerations
RackWare works in conjunction with all anti-virus programs. However, the RackWare processes may
trigger protective mechanisms in some anti-virus software. The RackWare processes usually complete,
but may take much longer. See the RackWare Prerequisites and Operational Requirements.
3.4 Special Considerations for Databases
In the majority of cases the default RackWare process will be sufficient for database replication,
migration and DR. However, databases often require special consideration given their potential size,
update rate, and possible distributed configuration.
3.4.1 Oracle Databases
3.4.1.1 Oracle Databases with LVM Volumes
The RMM supports Oracle databases with LVM volumes. Oracle databases with LVM volumes can use
the RMM's default replication and sync mechanisms. If a database spans multiple logical volumes,
additional configuration may be necessary to allow the RMM to make Oracle API calls to flush IOs prior
to taking volume snapshots.
3.4.1.2 Oracle Databases with ASM Disks
The RMM does not replicate ASM disks directly. It can, however, facilitate replicating an Oracle DB with
ASM disks by replicating the non-ASM portions of the server to the target environment; the Selective
Sync feature in the RMM allows this. The Target server boots with the exact same OS and Oracle
configuration, which can save days of work setting up the target server. Once the server is replicated,
the ASM disks can be added and the data seeded with an Oracle DB specific mechanism such as RMAN
or Data Guard.
3.4.1.3 Oracle Data Guard (and Golden Gate)
The RMM does not directly configure or monitor Data Guard (for ASM or non-ASM configurations).
However, the RMM is very effective at facilitating Data Guard setup. The RMM can replicate the origin
server to the target environment. The Target server boots with the exact same OS and Oracle
configuration. This can save days of work setting up the target server prior to the DG setup. Once the
server is replicated DG can commence. The customer is responsible for configuring DG as the RMM
does not support this. The RMM also does not integrate DG alerts or status in its dashboard or policy
alerts. However, often DG failover is initiated from a pre/post script that is executed as part of the
policy, but this is a customer preference.
3.4.2 Microsoft SQL Server
The RMM supports Microsoft SQL Server with its default replication and sync mechanism. In fact, SQL
Server is very well behaved in conjunction with VSS which the RMM uses.
The RMM also supports replicating and configuring MS SQL servers configured with Always ON. The
context here is that the Always ON configuration is contained in the Origin environment and that same
Always On configuration needs to be replicated to the Target environment. The RMM will not configure
Always ON across a WAN with the redundant server in the DR site and the production server in the
origin site.
The default option is for the RMM to replicate both the primary and redundant MS SQL servers to the
target environment as is. All the servers in the Always ON configuration are defined in the same Wave
and defined to be sync'ed at the same time, so the primary and redundant nodes are sync'ed as closely
as possible in the target environment. HA configurations, including Always ON, usually require a
postscript to adjust the configuration in the Target for proper operation of the servers as part of a
failover operation. For example, if IP addresses change in the Target, the Always ON configuration needs to
have the IP addresses change in the configuration. The configuration changes are usually small and
easily done in a postscript that can be executed automatically as part of the failover operation.
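Such a postscript can be sketched as a simple rewrite of origin IPs to their target equivalents in a configuration file. The mapping, file format, and function name here are hypothetical; a real Always ON adjustment would typically use SQL/PowerShell commands rather than text substitution.

```python
import re

def rewrite_ips(config_text, ip_map):
    """Rewrite origin IPv4 addresses to their target equivalents in a
    configuration file, as a failover postscript might. Addresses not in
    the map are left untouched. (Hypothetical illustration.)"""
    pattern = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
    return pattern.sub(lambda m: ip_map.get(m.group(0), m.group(0)), config_text)

config = "Listener=10.1.0.5:1433\nReplica=10.1.0.6:1433\n"
print(rewrite_ips(config, {"10.1.0.5": "172.16.0.5", "10.1.0.6": "172.16.0.6"}))
```

Because the change set is small and mechanical, it is well suited to automatic execution as part of the failover operation.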
One or both servers can be Dynamically Provisioned. If only one server is pre-provisioned and one
server is dynamically provisioned, the primary server is the one that should be pre-provisioned.
The standard, default, sync engine is called RWSync. The primary advantages of RWSync are that it's
agentless, not sensitive to network outages, can handle massive updates at the same time, and includes
a final checksum of the data. The disadvantage is that it is slower than the TNG sync engine.
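The idea of a final checksum of the data can be sketched as a chunked digest computed on both sides and compared. SHA-256 here is an assumption for illustration; the RMM's actual checksum algorithm is not specified in this document.

```python
import hashlib
import io

def stream_digest(stream, chunk_size=1 << 20):
    """Compute a SHA-256 digest over a data stream in fixed-size chunks,
    the way a post-transfer checksum might verify origin and target match
    without holding the whole image in memory."""
    h = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        h.update(chunk)
    return h.hexdigest()

# Identical "image bits" on both sides produce identical digests.
origin = io.BytesIO(b"image bits" * 1000)
target = io.BytesIO(b"image bits" * 1000)
print(stream_digest(origin) == stream_digest(target))  # → True
```

A mismatch at this stage flags a corrupted or incomplete transfer before the target is ever relied upon.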
The TNG sync engine is designed for very large, high-update-rate servers or servers with a very aggressive
RPO. The improvement with TNG requires the installation of a delta file tracker that must be whitelisted
in Antivirus software. While the TNG sync engine installation is automated, the standard delta
sync engine does not require any installation. The removal of the delta file tracker is also
automated, but must be explicitly specified via RMM command. For the standard delta sync engine
there is no such concern. The TNG sync engine does not support remote mount NFS/CIFS (but does
support NFS/CIFS servers that are Linux or Windows based).
For migrations, to minimize cutover times for larger servers, TNG is recommended unless there is an
aversion to installation of the file tracker. For DR, to minimize RPO times, TNG is recommended unless
the server has a low update rate or is small.
A note on network outage sensitivity for TNG: the file tracker is extremely efficient, but eventually,
normally after days or weeks, TNG will cease tracking if it detects undue storage usage. Rest assured that
the syncs will still work, as the RMM automatically falls back to the standard Sync Engine for the first
sync after the network connection is restored. So in a network outage situation no intervention is
required for the server to be synced, but it should be noted that the first sync after the network is
restored may take much longer than a typical TNG sync. Eventually TNG will kick in and the syncs return
to the typical RPO.
Note that even if Dynamic Provisioning is elected for a set of servers, it is still highly recommended that
during initial deployment the servers are completely replicated to Target Servers and thoroughly tested,
and that bring-up notes are documented for the runbook. After that testing, the Target Servers can be deleted.
Both configurations can be used in the same deployment. Naturally, it is advantageous to configure as
many servers with Dynamic Provisioning as possible as that is the least costly option. But this must be
balanced with RTO considerations.
3.5 Clusters
The RMM supports clusters. Clusters are generally difficult, as the decision tree is much larger and
more complicated than for non-cluster servers. For simple clusters (still far more complex than a single
server), the servers are replicated as is to the Target environment and require only adjustment of IP
addresses. Adjustment of the IP addresses can be automated as needed and is the responsibility of
the customer. But clusters can often be complex and thus require additional attention.
3.6 Active Directory
Planning for Active Directory is an important aspect of any project and must be considered carefully.
The customer Active Directory team should be consulted as there are often nuances to configuration
and requirements that may need to be addressed. In general, there are two cases and the below
methods have proven effective in failing over Active Directory.
The first case is when the protected Active Directory server(s) are in a single datacenter and are not
protecting any servers beyond those in the DR scope.
The second case is when Active Directory is part of a Forest or when the DR Active Directory Controller is
required to service non-DR servers after a failover.
In the first case, the AD server(s) can be replicated and sync'ed to the DR site. Upon a failover the Domain
Controller is booted.
For DR sites the following has proven to be an effective Active Directory plan when there are multiple
sites besides the Origin protected environment, and a Forest is multi-site wide.
The AD server in the isolated DR bubble is for testing purposes only. The RMM can be used to replicate
an AD server(s) from the Origin environment for the testing functions.
The following diagram illustrates the process for a failover. In the failover operation, the Domain
Controller network and the DR bubble are merged. The Target servers log into the AD server in the DR
site. As that AD server is connected to the Forest (excluding the down site that is being recovered) AD
operations are restored and all nodes see the DR site as primary.
[Diagram: failover process. The production site (Domain Controller, Origin servers) replicates via
RMMs 1..N to the DR site, which holds prod clones replicated only for testing, plus replicated Windows
and Linux servers. Steps: 5. Invoke RMM failover (optionally power down Origin servers). 6. Target
servers log in to the Domain Controller in the DR site (the DC IP address may need updating via an
automated postscript; Windows servers may use NAT'ed IPs of Origin servers if IPs are duplicated).
7. The DR site is now production, with access to all remote datacenters.]
The following 2 diagrams illustrate the process for a fallback operation. Essentially the process is
reversed.
3.7 Common Issues
3.7.1 Windows Software Update
It is a best practice to complete any Windows (or other application) software updates prior to
replication activities. Software updates can dramatically lengthen the time it takes to complete a
replication or sync operation.
3.7.2 Windows Reactivation
It's typical after a replication/sync that Windows will require reactivation. By default, the RMM does
not address licensing; however, this can be automated. Contact RackWare for more information.