
RACKWARE MANAGEMENT MODULE

TECHNICAL OVERVIEW
&
BEST PRACTICES

RACKWARE INC
VERSION 2.7
1 Overview
2 Basic Replication
2.1 Discover/Examine
2.2 Capture for Store-and-Forward Approach
2.3 Assign
2.4 Direct Assign
2.5 Sync
3 Replication Planning
3.1 Location and Number of RMMs
3.1.1 Location of the RMM Server
3.1.2 Number of RMM Servers
3.2 Key Decisions Regarding RMM Configuration
3.2.1 Storage
3.2.2 Networking
3.3 Key Decisions Regarding Server and Infrastructure Configuration
3.3.1 IP Addresses and Hostnames
3.3.2 Isolate or Integrate the Network
3.3.3 Licensing - Application and OS
3.3.4 Anti-Virus Considerations
3.4 Special Considerations for Databases
3.4.1 Oracle Databases
3.4.1.1 Oracle Databases with LVM Volumes
3.4.1.2 Oracle Databases with ASM Disks
3.4.1.3 Oracle Data Guard (and Golden Gate)
3.4.2 Microsoft SQL Server
3.5 Sync Engine Selection
3.6 Dynamically Provisioned versus Pre-provisioned DR Configurations
3.7 Clusters
3.8 Active Directory and DNS
3.8.1 Active Directory Configuration for Single Site
3.8.2 Active Directory Configuration for Multi-Site Forest
3.9 Common Issues
3.9.1 Windows Software Update
3.9.2 Windows Reactivation



1 Overview

The RackWare Management Module (RMM) supports three main use cases:


• Replication (or Migration)
• Converged Disaster Recovery & Backup
• Hybrid Cloud Management

All use cases rely on a common underpinning technology, a true any-to-any Image replication and sync
mechanism. Of course, a great deal of additional software surrounds the Image replication technology
to enable these use cases.

Virtually any server-to-server combination is supported. The RMM supports physical servers on both
the Origin side and Target side. Support for disparate hypervisors in the replication process is also a key
capability. For example, one can replicate from a Dell server to an HP server. Or from a VMware VM to
a Xen or KVM VM, even if in a Cloud. This is a crucial feature as many clouds are run on non-VMware
hypervisors, while many datacenters include VMware hypervisors as the primary virtualization vehicle.

RackWare technology also supports Virtual Machine (VM) back to physical, which is critical for Disaster
Recovery during a fallback operation, perhaps from cloud (Virtual) to a physical server in the datacenter.

The RMM is a Linux application and runs on a Red Hat, CentOS, or Oracle Linux OS. The RMM does not
require any interface to a hypervisor or storage array. The RMM connects to the Origin server over the
network at the Operating System level and replicates the server Image. The Image can be replicated to
a storage location at the Target site or to a provisioned server (virtual or physical) at the Target site. The
RMM is also capable of auto-provisioning appropriately sized servers in the Target environment.

A Configuration Management Database (CMDB) is maintained by the RMM on the RMM Server. The
CMDB keeps track of resources the RMM is managing, optionally captured Images, as well as
operational state.



2 Basic Replication

2.1 Discover/Examine

The Discover/Examine phase is relatively simple and straightforward. The user inputs the IP address or
DNS hostname of the server to be replicated, and the RMM finds it and connects to it. The server is
called a Host in RackWare nomenclature (Host in this context does not refer to a hypervisor server).
Commands can be driven via the CLI, GUI, or API. The CLI command is called host discover.

The RMM connects to the Host via the supplied IP address or DNS hostname and gathers information
about the Host. The RMM uses SSH to connect to the server, so a prerequisite is that the RMM
server's SSH key is installed on the Origin server. SSH is used for both Windows and Linux, alleviating the
need for any password. If SSH is not already installed on a Windows server, a small MSI package is used
to set up the SSH software.

If the SSH keys and/or username/passwords are not coordinated correctly, the Discover process will fail,
and the administrator will be notified.

Once the Host is found, a series of inquiries are made to the Host using standard Operating System
queries. Information about the Host is logged in the RMM's CMDB. The discovered information
(metadata) will be used later in the replication process to AutoProvision a server (physical or virtual) at
the Target site.
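
As an illustration only, the sketch below approximates the kind of SSH-based discovery queries described above. It is not the RMM's actual implementation; the host address, account, key path, and command set are all assumptions.

```python
# Illustrative sketch only: approximates SSH-based discovery of host metadata.
# Host, user, key path, and the query commands are assumptions, not RackWare code.
import paramiko

ORIGIN = "10.0.1.25"            # hypothetical Origin server address
USER = "root"                   # hypothetical discovery account
KEY_FILE = "/root/.ssh/id_rsa"  # RMM server's SSH private key (assumed path)

# Standard OS queries a discovery step might run on a Linux Origin host
QUERIES = {
    "hostname": "hostname -f",
    "cpus": "nproc",
    "memory": "grep MemTotal /proc/meminfo",
    "disks": "lsblk -b -o NAME,SIZE,TYPE,MOUNTPOINT",
    "os": "cat /etc/os-release",
}

def discover(host: str) -> dict:
    """Connect over SSH with the pre-installed key and gather host metadata."""
    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect(host, username=USER, key_filename=KEY_FILE)
    metadata = {}
    try:
        for name, cmd in QUERIES.items():
            _, stdout, _ = client.exec_command(cmd)
            metadata[name] = stdout.read().decode().strip()
    finally:
        client.close()
    return metadata

if __name__ == "__main__":
    for key, value in discover(ORIGIN).items():
        print(f"{key}:\n{value}\n")
```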

Performing an explicit Discover/Examine prior to replication is optional. A capture command can be
submitted for a Host without having previously executed a Discover/Examine. Separating Discover from
Capture allows an administrator to verify that a Host is reachable and to interrogate metadata parameters
before engaging the Capture process. The administrator may wish to make adjustments to the running
state before capturing the Image.

2.2 Capture for Store-and-Forward Approach

The Capture process can be initiated on Hosts that have previously been discovered, or on Hosts that
the RMM has no awareness of. The capture process essentially creates a clone or snapshot of the Image
at the Target site. (If desired the Capture process can be skipped and Images can be replicated directly
to target machines.) As part of Capture, the RMM determines the necessary storage requirements,
which can be influenced by capture parameters, and allocates storage for the capture.

The RMM ensures data integrity in both the capture and subsequent sync processes. The RMM takes a
snapshot of the Logical Volumes on the Origin server and copies the Image bits from the snapshot to the
storage location associated with the RMM Server at the Target site. (See the diagram in the next
section.) The snapshot is a standard OS mechanism whereby the OS informs applications to flush
their I/Os to disk. When the I/Os are flushed to disk, the data for the filesystem is in a static and
consistent state. Next, the OS places a bookmark in the filesystem and I/Os then continue above the
bookmark, so the process is non-disruptive to the Origin server. The RMM performs the copy (or delta
sync) on the snapshot, and never on a live filesystem, to ensure data integrity of the captured and
synced data. Other than some additional I/Os and network traffic, the Origin server is not affected.
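
The following is a minimal sketch of the snapshot-then-copy idea on an LVM-based Linux Origin, not the RMM's agentless capture code. Volume names, snapshot size, and the destination path are hypothetical.

```python
# Minimal sketch of snapshot-based capture on an LVM Origin (illustrative only).
# Volume group/LV names, snapshot size, and destination are assumptions.
import subprocess

VG, LV = "vg0", "lv_root"              # assumed volume group / logical volume
SNAP = f"{LV}_rmm_snap"
MOUNTPOINT = "/mnt/rmm_snap"
DEST = "rmm-server:/captures/host01/"  # assumed storage location on the RMM

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

def capture():
    # 1. Take a copy-on-write snapshot; application I/Os continue on the live LV.
    run(["lvcreate", "--size", "5G", "--snapshot",
         "--name", SNAP, f"/dev/{VG}/{LV}"])
    try:
        # 2. Mount the snapshot read-only so the copy sees a static filesystem.
        run(["mkdir", "-p", MOUNTPOINT])
        run(["mount", "-o", "ro", f"/dev/{VG}/{SNAP}", MOUNTPOINT])
        # 3. File-based copy (or delta sync) is taken from the static snapshot only.
        run(["rsync", "-aAX", "--numeric-ids", f"{MOUNTPOINT}/", DEST])
    finally:
        # 4. Clean up; the live filesystem was never touched by the copy.
        subprocess.run(["umount", MOUNTPOINT])
        subprocess.run(["lvremove", "-f", f"/dev/{VG}/{SNAP}"])

if __name__ == "__main__":
    capture()
```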

This is a highly reliable process for any application using logical volumes and relies on standard OS
mechanisms. It works exceptionally well for databases, including Microsoft SQL Server and Oracle
configured with LVM volumes. In fact, these databases are particularly reliable and well behaved,
resulting in clean Images in the Target environment.

In addition to the Image bits, additional metadata about the Image is stored in the RMM's CMDB. The
metadata is vital to configuring Images in the Target environment during the final assign operation.
When an Image is assigned in the Target environment, the metadata drives configuration of crucial
elements necessary for the Image to operate correctly on the new hardware and in the new
environment.

Another important point about the capture process is that it is file based, not block based. This
means the RMM is smart enough to copy only the used data, rather than simply copying sectors from one
disk to another. This is a far more efficient process; it also allows the Target storage to be right-sized
if desired and enables other important features such as support for network storage.
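
A small sketch of the right-sizing idea follows: because replication is file based, a Target volume only needs to hold the used bytes (plus headroom), not the full allocated size of the Origin disk. The mount points and headroom factor are assumptions.

```python
# Sketch of right-sizing Target volumes from used (not allocated) capacity.
# Mount points and the 20% headroom policy are assumptions.
import shutil

MOUNTS = ["/", "/var", "/home"]   # hypothetical Origin filesystems
HEADROOM = 1.2                    # 20% growth allowance (assumed policy)

for mount in MOUNTS:
    usage = shutil.disk_usage(mount)
    suggested = int(usage.used * HEADROOM)
    print(f"{mount}: allocated {usage.total >> 30} GiB, "
          f"used {usage.used >> 30} GiB, "
          f"suggested Target size {suggested >> 30} GiB")
```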

2.3 Assign

To complete the replication process, an Assign operation is performed. In the Assign process the RMM
will first AutoProvision an appropriately sized server, although, optionally, the user can provision the VM
if desired. The RMM uses the metadata from the Discovery process and an algorithm to select a
server with the correct hardware profile at the Target site so the workload runs with equal or very similar
performance to the Origin server.

If the target server is an existing physical server or the user elected to provision the VM themselves, the
RMM can be pointed at that server's IP address to complete the replication process.

The RMM connects to the Target server. For Linux the connection is always SSH. For Windows the
connection method can be SSH or winexe depending on the capabilities of the Cloud provisioning
mechanism. The Target server is examined to ensure it is capable of running the Origin Image. The
RMM also understands the underlying hardware, so it knows how to configure the Origin Image for the
new hardware (e.g., device drivers). Next, the Target server is prepared to accept the Image of the
Origin server.



[Diagram: store-and-forward replication flow from the Origin/Prod environment through the RMM to the Target/DR environment. Discover uses SSH and standard OS queries for both Linux and Windows, recording metadata in the CMDB. Capture takes a static snapshot of the live filesystem (flushed I/Os, filesystem bookmark) while applications keep running, and transfers encrypted data from the static snapshot to the RMM's captured Image storage. Assign provisions the VM, formats the disk and creates partitions, populates the data, injects and configures drivers, configures networking, and boots. The process is non-disruptive and yields an exact replica except for drivers and, optionally, networking.]
The RMM boots the Target into a RackWare microkernel, and the disk is reformatted, recreating the
Logical Volume structure exactly as the Origin is configured. Importantly, the number of disks can be
different; the RMM employs a best-fit algorithm. Additionally, since the RMM replicates at the
file/filesystem level, the storage on the Target does not have to be exactly the same size as the Origin; it
simply needs to be large enough to hold the used data. Optionally, the user can specify the partition
map if desired.

Once the disk is formatted with a partition structure identical to the Origin's, the RMM transfers the Image
bits to the Target. If necessary, drivers are injected to ensure that the Image can operate on the new
hardware, and optionally the network identity is modified to be consistent with the Target environment.

Upon a reboot, an exact replica of the Origin server is now running on the Target, with the exception of
device drivers and, optionally, network configuration. The OS is the exact same version as the Origin; all
applications, application data, users, packages, settings, and other data are identical.

The RMM performs a thorough verification of the assigned server:

• Fully and properly boots
• Has working and correct networking
• Data has been replicated correctly
• Can be accessed over SSH using the same credentials supplied for the Origin system when
performing the replication

Note that for Windows, once the Origin Image has been applied to the Target server, the RMM will only
use SSH to connect to it. It will not use winexe, as was used for the newly provisioned server.

The above tests demonstrate that the Origin was replicated to the Target hardware with:
• Configuration of the applicable system and storage drivers
• Preservation of the user and password configuration from the Origin
• Successful booting of the server
• Configuration and operation of standard network connectivity on the running server



2.4 Direct Assign

There are many advantages to performing a store-and-forward replication. However, in circumstances
where those advantages are not relevant, the RMM can perform a replication directly from the Origin
server to the Target server. This capability is called Flex Sync or Host Sync; the terms are used
interchangeably.

With Host Sync, the RMM performs all the same functions on the Origin server from a Discover
perspective. After Discover/Examine, the RMM provisions a Target server if necessary, or an existing
server can be used (e.g., an existing physical server). Once the Target server is prepared, instead of
capturing the Image on the RMM Server, the Image is replicated directly to the Target server. The
network connection can still pass through the RMM, avoiding a direct network connection between the
two servers. This can be thought of as a Sync with provisioning, or a combined Capture/Assign.

[Diagram: direct (Host Sync) replication flow. Discover works as before, using SSH and standard OS queries with metadata recorded in the CMDB. A static snapshot is taken on the Origin, but the encrypted data transfer goes directly to the Target server, where the disk is formatted, Logical Volumes are created, data is populated, drivers are injected and configured, networking is configured, and the server is booted, with no intermediate captured Image on the RMM's storage. The process is non-disruptive and yields an exact replica except for drivers and, optionally, networking.]

2.5 Sync

Another key capability of the RMM is the ability to perform a delta sync from Origin to Target. There are
numerous options for sync including:
• Stage I
• Stage II
• RMM passthrough
• Direct
• Selective Sync
• Drive/directory mapping

Information about these options can be found in the Migration and DR Guide documents.



3 Replication Planning
3.1 Location and Number of RMMs
3.1.1 Location of the RMM Server

Technically the RMM Server can be located in any geography or location as long as it has network
connectivity to both the Origin and Target servers with the appropriate ports opened and protocols
permitted (see the Prerequisites and Operational Requirements document).

Under most circumstances the RMM Server is best situated in the Target environment; this is considered
a best practice. There are several reasons for this.

First, the Target environment is often a cloud, and it is easy and convenient to spin up a VM to run the
RMM Server.

Second, locality of data is important. Since a WAN link normally sits between the origin and target, with
a lower speed and/or greater distance (increasing latency), capturing the Image and having the data
closer to the Target server is an advantage.

The Assign process is more complicated than Capture and it is more efficient to have the Image data
closer to the Target server where the two are connected by a LAN.

Of course, each situation is different and other factors may warrant installing the RMM on the Origin
side. For example, if an Origin datacenter will be replicated to multiple Target locations, it may be
quicker and more efficient to install a single RMM Server on the Origin site.

There are also cases where installing the RMM Server at both locations is the better choice. For
example, where the Origin holds very large amounts of data, and the network connection is slower
speed or unstable, the RMM can be installed at both locations. The RMM installed at the Origin would
capture the Image locally, over the LAN, to some kind of portable storage. Once completed, the disks
can be physically transported to the Target and imported by the RMM Server at that location to
complete the replication process. The RMM Server at the Origin can be decommissioned, and the RMM
Server at the Target can still be used to perform sync operations over the network.

3.1.2 Number of RMM Servers

An advantage of RackWare is that licensing is on a server basis and as many RMM Servers can be spun
up and used concurrently as desired.

It may be convenient to deploy an RMM in each geography, on individual physical LAN segments, per
migration team, or per some logical organization structure. RMMs can be spun up to meet the required
parallelism for aggressive projects.

After the initial replication, some of the RMMs can be decommissioned. For the remaining functions,
such as cutover Syncs and DR policies, the remaining RMMs can be used to perform operations on
replications even if the original replication was not done by that RMM.



3.2 Key Decisions Regarding RMM Configuration
3.2.1 Storage

The RMM uses storage on the RMM Server to perform store-and-forward replication operations. In
migration use cases, the storage required is temporary and can be removed after replication is
completed. DR use cases usually use RMM-based storage.

Adequate storage performance is necessary for the replication operations to complete in a
timely manner. Guidelines can be found in the Prerequisites and Operational Requirements document.

3.2.2 Networking

When configuring the RMM, it is best practice to configure 1 GbE (or higher) interfaces to local resources.

It is best practice to configure a minimum of 100 Mbit WAN connections between the Origin and Target
environments. While 100 Mbit is the best practice, there are many cases where slower links are
acceptable. In all cases, a stable and consistent network is more important than raw bandwidth.
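
As a rough sizing example (assuming the link is dedicated to replication), an initial capture of 500 GB over a 100 Mbit/s WAN takes about (500 × 8) Gbit ÷ 0.1 Gbit/s = 40,000 seconds, or roughly 11 hours, before protocol overhead; subsequent delta syncs move only changed data and are normally far smaller.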

3.3 Key Decisions Regarding Server and Infrastructure Configuration


3.3.1 IP Addresses and Hostnames

Correct application operation requires that IP addresses and/or hostnames be correct in the
environment in which they are running. Therefore, decisions about IP addresses and hostnames impact the
effort required to test and verify applications in a Target environment. Some applications are more
sensitive to IP addresses, and some applications are more sensitive to hostnames. If applications are more
sensitive to hostnames, then there is more freedom in making choices about IP addresses in the Target
environment. By default, the RMM retains the hostname and changes the IP addresses on the
replicated server in the Target, though both behaviors are configurable.

The RMM has tremendous flexibility in dealing with IP addresses. It can:

• Inherit the network identity of the Target environment (default)
• Keep the Origin network identity
• Inherit the Target network identity or keep the Origin network identity on a per-interface basis
• Configure a specific network identity

As a general rule, it's typically easier for applications to work properly in the Target environment if IP
addresses are kept the same as the Origin environment. However, this complicates the network
topology and configuration. Additionally, there are circumstances where servers in the Target
environment require access to servers in the Origin environment.

If Origin IP addresses are retained as part of the replication process, it's imperative that the Target
network be isolated from the Origin network, lest duplicate IP addresses be introduced. See the section
on "Isolate or Integrate the Network". Duplicate IP addresses will confuse many infrastructure
components such as routers, load balancers, and Domain Controllers. Careful planning and systematic
execution will mitigate these difficulties and ensure a smooth replication experience. Note that the
Target environment must support BYOIP (Bring Your Own IPs). Many cloud environments do not
support this or require special considerations when doing so. Contact your Service Provider for details.
In any event, RackWare can facilitate either topology.

A common configuration is to retain Origin IPs on one interface and change IPs on a second interface.
This can happen when one interface uses a private IP and is dedicated to application use, and a second
interface is used exclusively for management functions. When moving to a cloud environment, it's useful
(or required) to change the management IP, as the management network is different from that of the
Origin. At the same time, keeping the private IP on the interface used by applications eases or
eliminates any application issues as part of the replication process.

It's also possible to configure a specific network configuration for a captured Image, and that
configuration is used when completing the replication process.

The RMM has the ability to retain the hostname (the default) or the administrator can configure a
new/different hostname as part of the replication process.
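
The sketch below illustrates what "configure a specific network identity" could look like on a RHEL-family Target after replication. It is not the RMM's mechanism (the RMM applies its configured identity automatically during Assign); the interface name, addresses, and file path are assumptions.

```python
# Illustrative only: applying a specific network identity to a RHEL-family
# Target by writing an ifcfg file. All names and addresses are assumptions.
from pathlib import Path

IFACE = "eth0"
TARGET_IP = "192.168.2.10"     # hypothetical Target-side address
PREFIX = "24"
GATEWAY = "192.168.2.1"
DNS = "192.168.2.53"

IFCFG = f"""DEVICE={IFACE}
BOOTPROTO=none
ONBOOT=yes
IPADDR={TARGET_IP}
PREFIX={PREFIX}
GATEWAY={GATEWAY}
DNS1={DNS}
"""

def apply_identity():
    path = Path(f"/etc/sysconfig/network-scripts/ifcfg-{IFACE}")
    path.write_text(IFCFG)
    print(f"Wrote {path}; restart networking or reboot for it to take effect.")

if __name__ == "__main__":
    apply_identity()
```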

3.3.2 Isolate or Integrate the Network

Many factors affect whether or not the networks should be isolated or integrated. It is a best practice to
isolate the DR site from the Origin site though it's acceptable to have some special servers connect to
both environments.

If IP addresses are required to be the same, then the networks must be isolated. There are multiple ways
to configure isolated networks and allow the same IPs. The diagram below shows two methods, one
provided by RackWare, and one provided by the user or service provider.



[Diagram: two methods for isolating the networks while keeping the same subnet (192.168.1.0/24) at both the Origin/Prod and Target/DR sites. In the first, a RackWare Bridge server at the Origin presents alias IPs of the Origin workloads (bridge clients) to the RMM Server across the WAN. In the second, the Origin-side cloud firewall/router provides 1:1 NAT and presents NAT'ed IPs of the Origin workloads to the RMM Server. In both cases the Origin-side and Target-side firewalls/routers sit on either side of the WAN between the sites.]

If the IP addresses are different it's still advised to isolate the networks. With different IP addresses the
NAT function can still be useful but is not necessary.

In the event of a failover, removing the isolation works in conjunction with other infrastructure changes
that point clients to the Target servers. This is often as simple as updating URLs to point to
a different IP address or adding a redirect in a DNS server.
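
As a hypothetical example of such a DNS redirect, the sketch below uses dnspython to repoint an A record at the Target server's address during failover. The zone, record, TSIG key, and server address are assumptions; many sites do this through their DNS provider's console or API instead.

```python
# Hypothetical failover postscript: repoint a DNS A record at the Target server.
# Zone, record, TSIG key, and DNS server address are all assumptions.
import dns.query
import dns.tsigkeyring
import dns.update

ZONE = "example.com"
RECORD = "app"                    # app.example.com
TARGET_IP = "203.0.113.10"        # replicated server's address in the DR site
DNS_SERVER = "198.51.100.53"
KEYRING = dns.tsigkeyring.from_text({"failover-key.": "c2VjcmV0LWtleS1ieXRlcw=="})

def repoint_record():
    update = dns.update.Update(ZONE, keyring=KEYRING)
    update.replace(RECORD, 60, "A", TARGET_IP)   # short TTL during failover
    response = dns.query.tcp(update, DNS_SERVER, timeout=10)
    print("DNS update rcode:", response.rcode())

if __name__ == "__main__":
    repoint_record()
```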

One of the major issues with not isolating the DR site is that servers replicated in the Target
environment can connect back to servers in the Origin environment. A significant risk is a Target
server connecting back to an Active Directory server in the Origin environment. This will confuse the AD
function as to which server is the production server and will likely cause an outage. Hence, if the networks
are not isolated, a multi-pronged approach should be employed to protect the Target and Origin
environments. The following mechanisms have successfully been used on integrated networks to ensure
Target servers do not connect to the Origin AD server.

• Modify the hostname
• Disable the GW IP of the Target server
• Disable the DNS IP of the Target server

The RMM supports modifying the hostname as part of a replication. However, it is highly advised that
all three mechanisms be employed. Modifying the hostname requires human intervention, and in a large
project it's only a matter of time before someone forgets to specify that option or does not update the
hostname correctly. If that happens, the additional layers will prevent the Target server from connecting
to the Origin.
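
The sketch below shows two of the guards listed above on a Linux Target in an isolated test bubble. Windows equivalents would use netsh or PowerShell; all values are assumptions and this is not an RMM feature.

```python
# Illustrative guards for a Linux Target in a DR test bubble: drop the default
# gateway and blank the DNS resolvers so the replica cannot reach back to the
# Origin AD/DNS even if a hostname change was missed. Values are assumptions.
import subprocess
from pathlib import Path

def disable_default_gateway():
    # Removes the default route; LAN-local traffic in the DR bubble still works.
    subprocess.run(["ip", "route", "del", "default"], check=False)

def disable_dns():
    # Point the resolver at nothing so Origin-side names stop resolving.
    Path("/etc/resolv.conf").write_text("# resolvers intentionally removed for DR testing\n")

if __name__ == "__main__":
    disable_default_gateway()
    disable_dns()
    print("Default gateway and DNS resolvers disabled for isolation testing.")
```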

It is considered best practice to isolate the networks although it's noted that many circumstances
require exceptions. Whether or not a decision is made to isolate or integrate the network, taking care
to ensure integrity of IP addresses, Domain names, and infrastructure sensitive to network configuration
will be rewarded during and after replication.

3.3.3 Licensing - Application and OS

By design, RackWare does not manipulate any licenses whether they be OS or application. A licensing
plan should be part of any replication activities.

The RMM performs a full Image replication and allows Images to be replicated to disparate hardware
and across disparate hypervisors. This means that the Image is then configured on different hardware
than was in the Origin environment.
Some licensing is sensitive to CPU configurations. While every effort is made in the RMM to provide
commensurate hardware (virtual or physical), it's possible that differences may arise requiring
adjustments.

Likewise, some licensing is sensitive to hardware GUIDs or MAC addresses. In such cases these elements
may be different, especially MAC addresses.

3.3.4 Anti-Virus Considerations

RackWare works in conjunction with all anti-virus programs. However, the RackWare processes may
invoke protective mechanisms in some anti-virus software. The RackWare processes usually complete,
but may take significantly longer. See the RackWare Prerequisites and Operational Requirements document.

3.4 Special Considerations for Databases

In the majority of cases the default RackWare process will be sufficient for database replication,
migration and DR. However, databases often require special consideration given their potential size,
update rate, and possible distributed configuration.

3.4.1 Oracle Databases

3.4.1.1 Oracle Databases with LVM Volumes

The RMM supports Oracle databases with LVM volumes. Oracle databases with LVM volumes can use
the RMM's default replication and sync mechanisms. If a database spans multiple logical volumes,
additional configuration may be necessary to allow the RMM to make Oracle API calls to flush I/Os prior
to taking volume snapshots.

3.4.1.2 Oracle Databases with ASM Disks



The RMM does not support Oracle databases with ASM disks, though this is a roadmap feature. This
includes Oracle RAC as well as non-RAC.

The RMM can, however, facilitate replicating an Oracle DB with ASM disks by replicating the non-ASM
portions of the server to the target environment. The Selective Sync feature in the RMM allows this to
happen. The Target server boots with the exact same OS and Oracle configuration, which can save days
of work setting up the target server. Once the server is replicated, the ASM disks can be added and the
data seeded with an Oracle-specific mechanism such as RMAN or Data Guard.

3.4.1.3 Oracle Data Guard (and Golden Gate)

The RMM does not directly configure or monitor Data Guard (for ASM or non-ASM configurations).
However, the RMM is very effective at facilitating DataGuard setup. The RMM can replicate the origin
server to the target environment. The Target server boots with the exact same OS and Oracle
configuration. This can save days of work setting up the target server prior to the DG setup. Once the
server is replicated DG can commence. The customer is responsible for configuring DG as the RMM
does not support this. The RMM also does not integrate DG alerts or status in its dashboard or policy
alerts. However, often DG failover is initiated from a pre/post script that is executed as part of the
policy, but this is a customer preference.

3.4.2 Microsoft SQL Server

The RMM supports Microsoft SQL Server with its default replication and sync mechanism. In fact, SQL
Server is very well behaved in conjunction with VSS which the RMM uses.

The RMM also supports replicating and configuring MS SQL servers configured with Always ON. The
context here is that the Always ON configuration is contained in the Origin environment and that same
Always On configuration needs to be replicated to the Target environment. The RMM will not configure
Always ON across a WAN with the redundant server in the DR site and the production server in the
origin site.

The default option is for the RMM to replicate both the primary and redundant MS SQL servers to the
target environment as-is. All the servers in the Always ON configuration are defined in the same Wave
and are defined to be sync'ed at the same time, so the primary and redundant nodes are sync'ed as closely
as possible in the target environment. HA configurations, including Always ON, usually require a postscript
to adjust the configuration in the Target for proper operation of the servers as part of a failover
operation. For example, if IP addresses change in the Target, the Always ON configuration needs to
have those IP addresses changed in its configuration. The configuration changes are usually small and
easily done in a postscript that can be executed automatically as part of the failover operation.
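
As a purely hypothetical example of such a postscript, the sketch below adds a Target-side IP to an Always On availability group listener. The AG and listener names, addresses, connection details, and whether this single statement is sufficient all depend on the cluster configuration and are assumptions.

```python
# Hypothetical failover postscript: add a Target-side IP to an Always On
# availability group listener after the replicated nodes come up with new
# addresses. Names, IPs, and connection details are assumptions.
import pyodbc

CONN_STR = (
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=sql-dr-node1;Trusted_Connection=yes;"
)
SQL = (
    "ALTER AVAILABILITY GROUP [AG1] "
    "MODIFY LISTENER N'AG1-Listener' "
    "(ADD IP (N'10.20.0.50', N'255.255.255.0'));"
)

def adjust_listener():
    with pyodbc.connect(CONN_STR, autocommit=True) as conn:
        conn.execute(SQL)
    print("Listener IP added for the Target environment.")

if __name__ == "__main__":
    adjust_listener()
```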

One or both servers can be Dynamically Provisioned. If only one server is pre-provisioned and one
server is dynamically provisioned, the primary server is the one that should be pre-provisioned.

3.5 Sync Engine Selection



The RMM architecture is designed to support multiple sync engines. Different server needs, such as size,
configuration, update rate, and network storage, can be accommodated by different sync engines. The
RMM currently supports two engines. Importantly, the vast majority of the software around the sync
engines is the same; only the designation of the sync engine differs. The major difference between the
sync engines is the efficiency of the delta calculation, logging, and verification of the sync'ed data.

The standard, default, sync engine is called RWSync. The primary advantages of RWSync are that it's
agentless, not sensitive to network outages, can handle massive updates at the same time, and includes
a final checksum of the data. The disadvantage is that it is slower than the TNG sync engine.

The TNG sync engine is designed for very large, high-update-rate servers or servers with a very aggressive
RPO. The improvement with TNG requires the installation of a delta file tracker that must be whitelisted
in anti-virus software. While the TNG sync engine installation is automated, the standard
delta sync engine does not require any installation. The removal of the delta file tracker is also
automated, but must be explicitly specified via an RMM command; for the standard delta sync engine
there is no such concern. The TNG sync engine does not support remote-mount NFS/CIFS (but does
support NFS/CIFS servers that are Linux or Windows based).

For migrations, to minimize cutover times for larger servers, TNG is recommended unless there is an
aversion to installation of the file tracker. For DR, to minimize RPO times, TNG is recommended unless
the server has a low update rate or is small.

A note on network outage sensitivity for TNG: the file tracker is extremely efficient, but eventually,
normally after days or weeks, TNG will cease tracking if it detects undue storage usage. Rest assured that the
syncs will still work, as the RMM automatically falls back to the standard sync engine for the first sync
after the network connection is restored. So in a network outage situation no intervention is required
for the server to be synced, but it should be noted that the first sync after the network is restored may
take much longer than a typical TNG sync. Eventually TNG will kick in and the syncs return to the typical
RPO.

3.6 Dynamically Provisioned versus Pre-provisioned DR Configurations


The RMM supports two basic configurations for DR deployments. The first is called Dynamically
Provisioned. In Dynamic Provisioning the RMM captures the Origin image and keeps it in a storage
location in the Target site. A DR policy can be created against it and will run in an automated fashion.
Either Sync Engine, TNG or the default RWSync, can be configured. The primary advantage of Dynamic
Provisioning is that during steady state no compute resources are used, except, of course, a very small
amount for the RMM Server. For the protected server, only storage is used which is typically much
cheaper than compute resources. This is a low-cost solution. However, the primary disadvantage of this
configuration is that recovery times will be longer because there will be time involved to provision the
target server and complete the replication process.

Note that even if Dynamic Provisioning is elected for a set of servers, it is still highly recommended that
during initial deployment the servers are completely replicated to Target Servers and thoroughly tested,
and bring-up notes documented for the runbook. After that testing, the Target Servers can be deleted.



The second basic configuration option is to pre-provision the Target Servers. Unless using TNG syncs it is
recommended that the storage in the middle, the RMM Storage, still be used to hold the Image (in this
case the sync process will sync through the intermediate storage to the Target Server). The primary
advantage of this configuration is that recovery time is about the time it takes to reboot the server. The
primary disadvantage is cost, as compute resources need to be used.

Both configurations can be used in the same deployment. Naturally, it is advantageous to configure as
many servers with Dynamic Provisioning as possible as that is the least costly option. But this must be
balanced with RTO considerations.

3.7 Clusters
The RMM supports clusters. Clusters are generally difficult, as the decision tree is much larger and
more complicated than for non-cluster servers. For simple clusters (still far more complex than a single
server), the servers are replicated as-is to the Target environment and only require adjustment of IP
addresses. Adjusting the IP addresses can be automated as needed and is the responsibility of
the customer. But often clusters are complex and thus require additional attention.

Some conditions that may require additional attention:


• In cases where drives change ownership frequently, a mechanism may need to be put in place to
hold the drive in place during replication and sync operations.
• It's common to replicate/sync from a multi-node cluster to a single node cluster for economic
reasons. When doing so a post API script can be used to adjust the configuration on the Target
side after sync operations.

3.8 Active Directory and DNS

Planning for Active Directory is an important aspect of any project and must be considered carefully.
The customer Active Directory team should be consulted as there are often nuances to configuration
and requirements that may need to be addressed. In general, there are two cases and the below
methods have proven effective in failing over Active Directory.

The first case is when the protected Active Directory server(s) are in a single datacenter and are not
protecting any servers beyond those in the DR scope.

The second case is when Active Directory is part of a Forest or when the DR Active Directory Controller is
required to service non-DR servers after a failover.

3.8.1 Active Directory Configuration for Single Site

In this case the AD server(s) can be replicated and sync'ed to the DR site. Upon a failover the Domain
Controller is booted.



[Diagram: single-environment AD configuration. The production Domain Controller(s) at the Origin site are replicated and sync'ed through the RMM(s), along with the other Windows and Linux servers, to clones in the isolated DR network (used for testing only until failover). Domain Controllers should be pre-provisioned.]

3.8.2 Active Directory Configuration for Multi-Site Forest

For DR sites, the following has proven to be an effective Active Directory plan when there are multiple
sites besides the Origin protected environment and the Forest spans multiple sites.

[Diagram: optional AD configuration for a multi-site Forest, steady state during sync. The DR site hosts Domain Controllers connected to the production Domain Controller forest over the corporate WAN, with access to all datacenters (the Forest), but these DR Domain Controllers are isolated from the DR network itself. The Origin site has access to the DR Domain Controllers and to the other corporate sites. In the DR site, only the RMM(s) have access to the Origin servers. Inside the isolated DR network, replicated Domain Controller clones exist only for testing, alongside the replicated and sync'ed Windows and Linux servers.]



In the above diagram, there are Active Directory server(s) in the Target site that are isolated from the
Target servers. These AD server(s) are connected to the other sites and are members of the Forest. They
receive updates, such as password changes, from those AD servers.

The AD server in the isolated DR bubble is for testing purposes only. The RMM can be used to replicate
an AD server(s) from the Origin environment for the testing functions.

The following diagram illustrates the process for a failover. In the failover operation, the Domain
Controller network and the DR bubble are merged. The Target servers log into the AD server in the DR
site. As that AD server is connected to the Forest (excluding the down site that is being recovered), AD
operations are restored and all nodes see the DR site as primary.



[Diagram: optional AD configuration for a multi-site Forest, controlled failover:
1. Quiesce applications at the Origin site (now down).
2. Isolate the Origin (close its connection to the corporate WAN).
3. Power down or delete any testing Domain Controllers in the DR site.
4. Merge the DR and Domain Controller networks so the Target servers can access the Domain Controllers.
5. Invoke RMM Failover (optionally power down the Origin servers).
6. Target servers log into the Domain Controller in the DR site (the DC IP address may need to be updated via an automated postscript).
7. The DR site is now production, with access to all remote datacenters. Origin server IPs are NAT'ed if duplicate IPs are in use.]

The following two diagrams illustrate the process for a fallback operation. Essentially, the failover
process is reversed.

[Diagram: optional AD configuration for a multi-site Forest, preparation for a controlled fallback:
1. Quiesce applications at the DR site.
2. Unmerge (re-isolate) the DR and Domain Controller networks after quiescing the applications.
3. Power on any Origin servers that were powered down.
4. Invoke RMM Fallback (reverse sync from the DR site back to the Origin). Origin server IPs are NAT'ed if duplicate IPs are in use.]



[Diagram: optional AD configuration for a multi-site Forest, completing the controlled fallback:
1. Enable the Origin site's interface to the corporate WAN.
2. Sync the Domain Controllers in the Origin site.
3. Target servers log into the Domain Controller in the Origin site (the DC IP address may need to be updated).
4. Resume the RMM policies, with syncs from the Origin to the DR site, to return to steady state.]



3.9 Common Issues

3.9.1 Windows Software Update

It is a best practice to complete any Windows (or other application) software updates prior to
replication activities. A pending software update can dramatically lengthen the time it takes to complete a
replication or sync operation.

3.9.2 Windows Reactivation

It's typical after a replication/sync that Windows will require a reactivation. By default, the RMM does
not address licensing; however, this can be automated. Contact RackWare for more information.
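
As one hypothetical way such automation might look, the sketch below triggers the built-in slmgr.vbs activation on a replicated Windows Target over SSH (since the RMM connects to applied Windows Images via SSH). The host, credentials, and whether /ato alone is sufficient (KMS vs. MAK, proxy activation, etc.) are assumptions; consult RackWare and your licensing plan before automating this.

```python
# Hypothetical post-assign step: trigger Windows reactivation over SSH using
# the built-in slmgr.vbs script. Host, credentials, and activation method are
# assumptions, not a documented RMM feature.
import paramiko

TARGET = "10.20.0.40"              # replicated Windows server (assumed)
USER = "Administrator"
KEY_FILE = "/root/.ssh/id_rsa"     # same key used for the Origin (assumed)

ACTIVATE = r"cscript //B C:\Windows\System32\slmgr.vbs /ato"

def reactivate():
    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect(TARGET, username=USER, key_filename=KEY_FILE)
    try:
        _, stdout, stderr = client.exec_command(ACTIVATE)
        print(stdout.read().decode(), stderr.read().decode())
    finally:
        client.close()

if __name__ == "__main__":
    reactivate()
```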

