
Failover Clustering & Hyper-V:

Multisite Disaster Recovery

Prakash Gopinadham
Support Escalation Engineer
Microsoft Corporation
Multi-Site Clustering Content
Design guide:
http://technet.microsoft.com/en-us/library/dd197430.aspx
Deployment guide/checklist:
http://technet.microsoft.com/en-us/library/dd197546.aspx
Customer case studies using multi-site clustering:
http://blogs.msdn.com/b/clustering/archive/2009/11/04/9917628.aspx
Multi-Site Clustering

Introduction
Networking
Storage
Quorum
Defining High-Availability
But what if there is a catastrophic event and you lose the entire datacenter?
[Diagram: cluster nodes located at Site A]
Defining Disaster Recovery

Node is located at a physically separate site
[Diagram: cluster nodes at Site A and Site B, with SAN storage]
Benefits of a Multi-Site Cluster
Protects against loss of an entire location
Power Outage, Fires, Hurricanes, Floods, Earthquakes, Terrorism

Automates failover
Reduced downtime
Lower complexity disaster recovery plan

What is the primary reason why DR solutions fail?

Dependence on People
Stretching the Network
Longer distance traditionally means greater network latency
Missed inter-node health checks can cause false failover
Cluster heartbeating is fully configurable

SameSubnetDelay (default = 1 second)
How often heartbeats are sent
SameSubnetThreshold (default = 5 heartbeats)
Missed heartbeats before an interface is considered down
CrossSubnetDelay (default = 1 second)
How often heartbeats are sent to nodes on dissimilar subnets
CrossSubnetThreshold (default = 5 heartbeats)
Missed heartbeats before an interface to nodes on dissimilar subnets is considered down
Command Line: Cluster.exe /prop
PowerShell (R2): Get-Cluster | fl *
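
The relationship between the delay and threshold settings can be sketched as simple arithmetic: the time before an interface is considered down is roughly the heartbeat interval multiplied by the number of missed heartbeats tolerated. The Python below is only an illustration. The function name is hypothetical, and it assumes the delay is expressed in milliseconds (so the 1 second default becomes 1000).

```python
def detection_time_seconds(delay_ms: int, threshold: int) -> float:
    """Approximate time before an interface is considered down.

    delay_ms  -- interval between heartbeats (assumed to be in milliseconds)
    threshold -- number of missed heartbeats that is tolerated
    """
    if delay_ms <= 0 or threshold <= 0:
        raise ValueError("delay_ms and threshold must be positive")
    return delay_ms * threshold / 1000.0


if __name__ == "__main__":
    # Defaults quoted on the slide: 1 second delay, 5 missed heartbeats.
    print(detection_time_seconds(1000, 5))
    # Hypothetical relaxed values for a high-latency WAN link.
    print(detection_time_seconds(2000, 10))
```

Raising either value makes the cluster more tolerant of WAN latency, at the cost of slower detection of a real failure.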
Security over the WAN

Encrypt inter-node communication
Trade-off: security versus performance
SecurityLevel cluster property:
0 = clear text
1 = signed (default)
2 = encrypted
[Diagram: Site A (10.10.10.1, 30.30.30.1) and Site B (20.20.20.1, 40.40.40.1)]
Network Considerations
Network Deployment Options:
1. Stretch VLANs across sites
2. Cluster nodes can reside in different subnets

[Diagram: public network with 10.10.10.1 at Site A and 20.20.20.1 at Site B; redundant network with 30.30.30.1 at Site A and 40.40.40.1 at Site B]
DNS Considerations
Nodes in dissimilar subnets
VM obtains new IP address
Clients need that new IP Address from DNS to reconnect

[Diagram: the VM runs as 10.10.10.111 at Site A and as 20.20.20.222 at Site B; the DNS record is created and updated on one DNS server, replicated between DNS Server 1 and DNS Server 2, and then obtained by clients]
Faster Failover for Multi-Subnet Clusters
RegisterAllProvidersIP (default = 0 for FALSE)
Determines whether all IP addresses for a Network Name are registered in DNS
TRUE (1): IP addresses are registered whether they are online or offline
Ensure the application is set to try all IP addresses, so clients can reconnect more quickly

HostRecordTTL (default = 1200 seconds)
Controls how long the DNS record for a cluster network name is cached on the client
Shorter TTL: DNS records for clients updated sooner
Exchange Server 2007 recommends a value of five minutes (300 seconds)
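
When RegisterAllProvidersIP is TRUE, a name lookup can return addresses that are currently offline, so the client has to try each one in turn. A minimal Python sketch of that client-side behavior follows. The function names are hypothetical, and the connection routine is passed in as a parameter so the retry logic can be exercised without a real cluster.

```python
import socket
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")


def connect_to_first_available(addresses: Iterable[str],
                               connect: Callable[[str], T]) -> T:
    """Try each registered address in order; return the first connection made."""
    last_error: Optional[Exception] = None
    for address in addresses:
        try:
            return connect(address)
        except OSError as exc:
            # This address is offline or unreachable; move on to the next one.
            last_error = exc
    raise ConnectionError("no registered address accepted a connection") from last_error


def tcp_connector(port: int, timeout: float = 2.0) -> Callable[[str], socket.socket]:
    """Build a connect function that opens a TCP connection with a short timeout."""
    def connect(address: str) -> socket.socket:
        return socket.create_connection((address, port), timeout=timeout)
    return connect
```

A caller might use `connect_to_first_available(["10.10.10.111", "20.20.20.222"], tcp_connector(445))`, where the port is only an example. A short timeout matters here: a long one on the offline address would cancel out the benefit of registering every address.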
Solution #1: Local Failover First
Configure local failover first for high availability
No change in IP addresses
No DNS replication issues
No data going over the WAN
Cross-site failover for disaster recovery
[Diagram: DNS Server 1; Site A (10.10.10.111) and Site B (20.20.20.222); after a local failover the address remains 10.10.10.111]
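
The local-first policy can be pictured as an ordering of failover targets: nodes in the current site come first, and nodes in the other site are kept as the disaster recovery fallback. The Python below is a sketch of that ordering only. The `Node` type and `failover_order` function are hypothetical; in a real cluster the preference would be expressed through the cluster's own configuration (for example a preferred owners list), not through code like this.

```python
from typing import List, NamedTuple


class Node(NamedTuple):
    name: str
    site: str


def failover_order(current: Node, candidates: List[Node]) -> List[Node]:
    """Order failover targets so same-site nodes are tried before remote ones."""
    others = [n for n in candidates if n != current]
    # sorted() is stable, and False sorts before True, so nodes in the
    # current site keep their relative order and come first.
    return sorted(others, key=lambda n: n.site != current.site)


if __name__ == "__main__":
    a1, a2, b1 = Node("A1", "Site A"), Node("A2", "Site A"), Node("B1", "Site B")
    for node in failover_order(a1, [b1, a1, a2]):
        print(node.name, node.site)
```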
Solution #2: Stretch VLANs

Deploying a VLAN minimizes client reconnection times
IP of the VM never changes
[Diagram: DNS Server 1 and DNS Server 2; a VLAN stretched across Site A and Site B; FS = 10.10.10.111 at either site]
Solution #3: Abstraction in Networking Device
Networking device uses independent 3rd IP Address
3rd IP Address is registered in DNS & used by client
[Diagram: DNS Server 1 and DNS Server 2; Site A (10.10.10.111) and Site B (20.20.20.222); clients see VM = 30.30.30.30]
Storage in Multi-Site Clusters

Different from local clusters:
Multiple storage arrays, independent per site
Nodes commonly access their own site's storage
No true shared disk visible to all nodes
[Diagram: Site A and Site B with SAN storage]
Storage Considerations

[Diagram: SAN storage at Site A with a replica at Site B]
Changes are made on Site A and replicated to Site B
DR requires a data replication mechanism between sites
Replication Partners

Hardware storage-based replication
Block-level replication
Software host-based replication
File-level replication
Appliance replication
File-level replication
Synchronous Replication

Host receives write complete response from the storage after the
data is successfully written on both storage devices

[Diagram: write request to primary storage at Site A; replication to secondary storage at Site B; acknowledgement back to the primary; write complete returned to the host]
Asynchronous Replication
Host receives write complete response from the storage after the data is successfully written to just the primary storage device; replication to the secondary storage device happens afterwards
[Diagram: write request to primary storage at Site A; write complete returned to the host; replication to secondary storage at Site B]
Synchronous versus Asynchronous
Synchronous | Asynchronous
No data loss | Potential data loss on hard failures
Requires high bandwidth/low latency connection | Enough bandwidth to keep up with data replication
Stretches over shorter distances | Stretches over longer distances
Write latencies impact application performance | No significant impact on application performance
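
The latency difference between the two modes can be made concrete with a rough model. The Python sketch below assumes the synchronous steps happen strictly in sequence (local write, trip to the remote site, remote write, acknowledgement back). Real arrays may overlap some of these steps, so treat the numbers as an upper-bound illustration. All function and parameter names are hypothetical.

```python
def sync_write_latency_ms(local_write_ms: float,
                          remote_write_ms: float,
                          round_trip_ms: float) -> float:
    """Latency seen by the host when the write must land on both arrays.

    Assumes the steps run one after another: local write, replication to
    the secondary site, remote write, acknowledgement back to the primary.
    """
    return local_write_ms + round_trip_ms + remote_write_ms


def async_write_latency_ms(local_write_ms: float) -> float:
    """Latency seen by the host when only the primary write is awaited."""
    return local_write_ms


if __name__ == "__main__":
    # Hypothetical figures: 1 ms array writes, 10 ms inter-site round trip.
    print(sync_write_latency_ms(1.0, 1.0, 10.0))
    print(async_write_latency_ms(1.0))
```

Under this model the inter-site round trip dominates synchronous write latency, which is why synchronous replication suits shorter distances.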
Cluster Validation with Replicated Storage
Multi-Site clusters are not required to pass the Storage tests to be supported
Validation Guide and Policy:
http://go.microsoft.com/fwlink/?LinkID=119949
Challenges of Block Storage Replication
Storage block-level replication is typically uni-directional (per LUN)
Changed blocks flow from source to remote
Possible to have different LUNs replicating in different directions
Storage cannot enforce block-level collision resolution
Application must determine resolution, or be coordinated in some way
Applications today implement a shared-nothing model
Surfacing storage as R/W at multiple sites is only useful if applications can handle a distributed access device
Few applications implement the necessary support
Obvious exception is Cluster Shared Volumes for Hyper-V
Quorum Overview

Node majority
Node and Disk majority
Node and File Share majority
Disk only (not recommended)
[Diagram: votes held by nodes and witnesses under each model]


Replicated Disk Witness

A witness is a tie breaker when nodes lose network connectivity
The witness disk must be a single decision maker, or problems can occur
Do not use a Disk Witness in multi-site clusters unless directed by vendor
[Diagram: each node holds a vote; the vote of a disk witness on replicated storage is uncertain (?)]
Node Majority
5 Node Cluster: Majority = 3
Cross site network connectivity broken!
Majority in Primary Site
Nodes in the primary site: Can I communicate with majority of the nodes in the cluster? Yes, then Stay Up
Nodes in the other site: Can I communicate with majority of the nodes in the cluster? No, drop out of Cluster Membership
[Diagram: Site A and Site B]
Node Majority
5 Node Cluster: Majority = 3
Majority in Primary Site
Disaster at Site 1
Surviving nodes: Can I communicate with majority of the nodes in the cluster? No, drop out of Cluster Membership
We are down!
Need to force quorum manually
[Diagram: Site A and Site B]
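
The node majority check on these two slides comes down to a single comparison: a group of nodes stays up only if it can reach more than half of all votes. A minimal Python sketch follows, using hypothetical function names.

```python
def majority(total_votes: int) -> int:
    """Smallest number of votes that is more than half of the total."""
    if total_votes < 1:
        raise ValueError("total_votes must be at least 1")
    return total_votes // 2 + 1


def has_quorum(reachable_votes: int, total_votes: int) -> bool:
    """True if a partition that can see `reachable_votes` votes may stay up."""
    return reachable_votes >= majority(total_votes)


if __name__ == "__main__":
    total = 5  # the 5 node cluster from the slide
    print("majority needed:", majority(total))
    print("3-node site stays up:", has_quorum(3, total))
    print("2-node site stays up:", has_quorum(2, total))
```

With 5 nodes split 3 and 2, the 3-node side keeps quorum when the link breaks. If the 3-node site is lost entirely, the remaining 2 nodes cannot reach a majority on their own, which is the case where quorum has to be forced manually.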
Forcing Quorum

Forcing quorum is a way to manually override and start a node even if the cluster does not have quorum
Important: understand why quorum was lost
Cluster starts in a special forced state
Once majority achieved, drops out of forced state

Command Line:
net start clussvc /fixquorum (or /fq)
PowerShell (R2):
Start-ClusterNode -FixQuorum (or -fq)
Multi-Site with File Share Witness
File Share Witness at Site C (branch office): \\Foo\Share
Complete resiliency and automatic recovery from the loss of any 1 site
[Diagram: Site A and Site B connected over the WAN, with the File Share Witness at Site C]
Multi-Site with File Share Witness
File Share Witness at Site C (branch office): \\Foo\Share
One site: Can I communicate with majority of the nodes (+FSW) in the cluster? Yes, then Stay Up
The other site: Can I communicate with majority of the nodes in the cluster? No (lock failed), drop out of Cluster Membership
Complete resiliency and automatic recovery from the loss of connection between sites
[Diagram: Site A and Site B connected over the WAN, with the File Share Witness at Site C]
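
The tie-breaking role of the witness can be sketched by adding one extra vote that only one side can hold at a time. The Python below is an illustrative model. The function name and the way the lock holder is passed in are assumptions made for the example; the sketch does not model how the lock on the share is actually taken.

```python
from typing import Dict, Optional


def sites_with_quorum(nodes_site_a: int,
                      nodes_site_b: int,
                      witness_lock_holder: Optional[str]) -> Dict[str, bool]:
    """Decide which site stays up when the link between the sites is broken.

    witness_lock_holder is "A", "B", or None (witness unreachable from both).
    The witness contributes exactly one vote, to at most one site.
    """
    total_votes = nodes_site_a + nodes_site_b + 1  # all nodes plus the witness
    needed = total_votes // 2 + 1
    votes_a = nodes_site_a + (1 if witness_lock_holder == "A" else 0)
    votes_b = nodes_site_b + (1 if witness_lock_holder == "B" else 0)
    return {"A": votes_a >= needed, "B": votes_b >= needed}


if __name__ == "__main__":
    # Two nodes per site; Site A obtains the lock on the witness share.
    print(sites_with_quorum(2, 2, "A"))
```

Because only one side can hold the witness vote, an even split between sites never leaves both sides running.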
File Share Witness (FSW) Considerations
Simple Windows File Server
Single file server can serve as a witness for multiple clusters
Each cluster requires its own share
Can be made highly available on a separate cluster
Recommended to be at 3rd separate site for DR
FSW cannot be on a node in the same cluster
FSW should not be in a VM running on the same cluster
Quorum Model Recap

Node and File Share Majority: Even number of nodes. Highest availability solution has FSW in 3rd site
Node Majority: Odd number of nodes. More nodes in primary site
Node and Disk Majority: Use as directed by vendor
No Majority: Disk Only: Not Recommended. Use as directed by vendor
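
The recap reduces to a rule of thumb based on whether the node count is even or odd. The Python sketch below encodes only that rule. The function name is hypothetical, and the two disk-based models are left out on purpose because the slide reserves them for vendor direction.

```python
def recommend_quorum_model(node_count: int) -> str:
    """Suggest a quorum model for a multi-site cluster from its node count."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    if node_count % 2 == 0:
        # Even number of nodes: a file share witness supplies the tie-breaking vote.
        return "Node and File Share Majority"
    # Odd number of nodes: the nodes alone can always form a majority.
    return "Node Majority"


if __name__ == "__main__":
    for count in (4, 5):
        print(count, "nodes:", recommend_quorum_model(count))
```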
Session Summary

Multi-site Failover Clusters have many benefits
You can achieve high availability and disaster recovery in a single solution using Windows Server Failover Clustering

Multi-site clusters have additional considerations:
Determine network topology across sites
Choose a storage replication solution
Plan quorum model & nodes
Failover Clustering Resources

Design for a Clustered Service or Application in a Multi-Site Failover Cluster:
http://technet.microsoft.com/en-us/library/dd197430(WS.10).aspx
Checklist: Setting Up a Clustered Service or Application in a Multi-Site Failover Cluster:
http://technet.microsoft.com/en-us/library/dd197546(WS.10).aspx
Cluster Information Portal:
http://www.microsoft.com/windowsserver2008/en/us/clustering-home.aspx
Clustering Technical Resources:
http://www.microsoft.com/windowsserver2008/en/us/clustering-resources.aspx
Clustering Forum (2008):
http://forums.technet.microsoft.com/en-US/winserverClustering/threads/
http://social.technet.microsoft.com/Forums/en-US/windowsserver2008r2highavailability/threads/
R2 Cluster Features:
http://technet.microsoft.com/en-us/library/dd443539.aspx
Resources
Software Application Developers: http://msdn.microsoft.com/ (msdnindia, @msdnindia)
Infrastructure Professionals: http://technet.microsoft.com/ (technetindia, @technetindia)


© 2011 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.
The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft
must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any
information provided after the date of this presentation.
MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
