Prakash Gopinadham Failover Clustering and HyperV
Prakash Gopinadham
Support Escalation Engineer
Microsoft Corporation
Multi-Site Clustering Content
Design guide:
http://technet.microsoft.com/en-us/library/dd197430.aspx
Deployment guide/checklist:
http://technet.microsoft.com/en-us/library/dd197546.aspx
Customer case studies using multi-site clustering:
http://blogs.msdn.com/b/clustering/archive/2009/11/04/9917628.aspx
Multi-Site Clustering
Introduction
Networking
Storage
Quorum
Defining High-Availability
But what if there is a catastrophic event and you lose the entire datacenter?
[Diagram: Site A]
Defining Disaster Recovery
Node is located at a physically separate site
[Diagram: Site A and Site B, SAN]
Benefits of a Multi-Site Cluster
Protects against loss of an entire location
Power Outage, Fires, Hurricanes, Floods, Earthquakes, Terrorism
Automates failover
Reduced downtime
Lower complexity disaster recovery plan
Reduced dependence on people
Networking
Stretching the Network
Longer distance traditionally means greater network latency
Missed inter-node health checks can cause false failover
Cluster heartbeating is fully configurable
[Diagram: Site A (10.10.10.1, 30.30.30.1) and Site B (20.20.20.1, 40.40.40.1)]
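The relationship between heartbeat settings and false failover can be sketched with a little arithmetic. The Python below is a simplified model, not the cluster service's actual algorithm: it assumes a node is treated as down once a fixed number of consecutive heartbeats is missed. The function names are hypothetical, and the interval and threshold values are illustrative; check the real settings on your own cluster.

```python
def detection_window_ms(delay_ms: int, threshold: int) -> int:
    """Time without heartbeats before a node is treated as down."""
    return delay_ms * threshold


def causes_false_failover(outage_ms: int, delay_ms: int, threshold: int) -> bool:
    """True if a transient outage lasts long enough to exhaust the missed-heartbeat budget."""
    return outage_ms >= detection_window_ms(delay_ms, threshold)


if __name__ == "__main__":
    # Illustrative values only: a 3-second WAN interruption.
    for delay_ms, threshold in [(1000, 5), (250, 5)]:
        window = detection_window_ms(delay_ms, threshold)
        failover = causes_false_failover(3000, delay_ms, threshold)
        print(f"delay={delay_ms} ms threshold={threshold} "
              f"window={window} ms false_failover={failover}")
```

A longer window tolerates more latency and brief WAN interruptions, at the cost of slower detection of a real failure.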
Network Considerations
Network Deployment Options:
1. Stretch VLANs across sites
2. Cluster nodes can reside in different subnets
[Diagram: Public Network (10.10.10.1 at Site A, 20.20.20.1 at Site B) and Redundant Network (30.30.30.1 at Site A, 40.40.40.1 at Site B)]
DNS Considerations
Nodes in dissimilar subnets
VM obtains new IP address
Clients need that new IP Address from DNS to reconnect
[Diagram: VM moves from 10.10.10.111 at Site A to 20.20.20.222 at Site B; DNS record updated to VM = 20.20.20.222]
Faster Failover for Multi-Subnet Clusters
RegisterAllProvidersIP (default = 0 for FALSE)
Determines whether all IP Addresses for a Network Name are registered in DNS
TRUE (1): IP Addresses are registered whether they are online or offline
Ensure the application is set to try all IP Addresses, so clients can reconnect more quickly
[Diagram: 10.10.10.111 at Site A, 20.20.20.222 at Site B]
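What "try all IP Addresses" means on the client side can be sketched as below. This is a minimal Python illustration, not how any particular application implements it. The function name `connect_first_reachable` and the `connector` parameter (added so the logic can be exercised without a network) are hypothetical.

```python
import socket
from typing import Callable, Iterable, Optional


def connect_first_reachable(
    addresses: Iterable[str],
    port: int,
    timeout_s: float = 2.0,
    connector: Callable[..., object] = socket.create_connection,
) -> Optional[object]:
    """Try each registered address in turn; return the first connection that succeeds."""
    for address in addresses:
        try:
            return connector((address, port), timeout=timeout_s)
        except OSError:
            # This address is offline (for example, the other site's IP); try the next one.
            continue
    return None
```

With both addresses registered in DNS, a client that gives up quickly on the offline address reaches the online one without waiting for a DNS record update.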
Solution #2: Stretch VLANs
[Diagram: VLAN stretched across Site A and Site B; FS = 10.10.10.111]
Solution #3: Abstraction in Networking Device
Networking device uses independent 3rd IP Address
3rd IP Address is registered in DNS & used by client
[Diagram: DNS Server 1; 10.10.10.111 and 20.20.20.222]
Storage
Storage in Multi-Site Clusters
[Diagram: Site A and Site B, SAN]
Storage Considerations
[Diagram: Site A and Site B, SAN and Replica]
Appliance replication
File-level replication
Synchronous Replication
Host receives write complete response from the storage after the data is successfully written on both storage devices
[Diagram: Write Request, Replication, Acknowledgement, Write Complete; Primary Storage at Site A, Secondary Storage at Site B]
Asynchronous Replication
Host receives write complete response from the storage after the data is successfully written to just the primary storage device; replication to the secondary storage follows
[Diagram: Write Request, Write Complete, Replication; Primary Storage at Site A, Secondary Storage at Site B]
Synchronous versus Asynchronous
Synchronous                                    | Asynchronous
No data loss                                   | Potential data loss on hard failures
Requires high bandwidth/low latency connection | Enough bandwidth to keep up with data replication
Stretches over shorter distances               | Stretches over longer distances
Write latencies impact application performance | No significant impact on application performance
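The difference in when the host sees "write complete" can be made concrete with a toy model. The Python sketch below is purely illustrative: `ReplicatedVolume` is a hypothetical in-memory class, not a real storage or replication API, and it ignores latency, ordering guarantees, and failure handling.

```python
from collections import deque
from typing import List


class ReplicatedVolume:
    """Toy model of a primary/secondary storage pair (hypothetical, in-memory only)."""

    def __init__(self, synchronous: bool) -> None:
        self.synchronous = synchronous
        self.primary = []        # blocks held by the primary storage
        self.secondary = []      # blocks held by the secondary storage
        self.pending = deque()   # acknowledged writes not yet replicated

    def write(self, block: str) -> str:
        """Return 'write complete' at the point the host would receive it."""
        self.primary.append(block)
        if self.synchronous:
            # Acknowledge only after the secondary also holds the data.
            self.secondary.append(block)
        else:
            # Acknowledge immediately; replication happens later.
            self.pending.append(block)
        return "write complete"

    def replicate_pending(self) -> None:
        """Drain the asynchronous replication queue to the secondary."""
        while self.pending:
            self.secondary.append(self.pending.popleft())

    def lost_on_primary_failure(self) -> List[str]:
        """Blocks acknowledged to the host but absent from the secondary."""
        return [b for b in self.primary if b not in self.secondary]
```

In the asynchronous case, anything still in `pending` when the primary site fails hard is the "potential data loss" row of the table; in the synchronous case that list is always empty, and the price is that every write waits on the inter-site link.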
Cluster Validation with Replicated Storage
Multi-Site clusters are not required to pass the Storage tests to be supported
Quorum
Quorum Overview
[Diagram: Replicated Storage]
Node Majority
5 Node Cluster: Majority = 3
Cross-site network connectivity broken!
Site A (majority in primary site): Can I communicate with majority of the nodes in the cluster? Yes, then stay up
Site B: Can I communicate with majority of the nodes in the cluster? No, drop out of cluster membership
Node Majority
5 Node Cluster: Majority = 3
Disaster at Site A (majority in primary site)
Site B: Can I communicate with majority of the nodes in the cluster? No, drop out of cluster membership
We are down!
Need to force quorum manually
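The majority arithmetic behind both Node Majority scenarios is small enough to write out. This Python sketch assumes three nodes at Site A and two at Site B, a split the slides imply ("Majority in Primary Site") but do not state. The function names are hypothetical.

```python
def majority(total_votes: int) -> int:
    """Smallest number of votes that is more than half of the total."""
    return total_votes // 2 + 1


def has_quorum(reachable_votes: int, total_votes: int) -> bool:
    """True if a partition that can see `reachable_votes` votes may stay up."""
    return reachable_votes >= majority(total_votes)


if __name__ == "__main__":
    total = 5  # 5 node cluster, one vote per node
    print("majority:", majority(total))
    print("Site A partition (3 nodes, assumed):", has_quorum(3, total))
    print("Site B partition (2 nodes, assumed):", has_quorum(2, total))
```

The Site B result is the same whether the cross-site link breaks or Site A is lost entirely, which is why the surviving minority has to have quorum forced manually.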
Forcing Quorum
Command Line:
net start clussvc /fixquorum (or /fq)
PowerShell (R2):
Start-ClusterNode -FixQuorum (or -fq)
Multi-Site with File Share Witness
[Diagram: File Share Witness at Site C (branch office), connected over the WAN to Site A and Site B]
Site A: Can I communicate with majority of the nodes (+FSW) in the cluster?
Site B: Can I communicate with majority of the nodes in the cluster?
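How the witness changes the outcome of a cross-site split can be sketched by counting it as one extra vote. The Python below is a simplified model under stated assumptions: each node has one vote, the file share witness adds one, and only one site can claim the witness during the split. The function name, the parameter names, and the two-plus-two node layout in the usage example are hypothetical.

```python
from typing import Dict, Optional


def surviving_site(
    nodes_per_site: Dict[str, int],
    witness_claimed_by: Optional[str],
) -> Optional[str]:
    """Return the site that keeps quorum after a cross-site split, or None if neither does."""
    total_votes = sum(nodes_per_site.values()) + 1  # +1 for the file share witness
    needed = total_votes // 2 + 1
    for site, nodes in nodes_per_site.items():
        # A site counts its own nodes, plus the witness if it holds it.
        votes = nodes + (1 if site == witness_claimed_by else 0)
        if votes >= needed:
            return site
    return None


if __name__ == "__main__":
    layout = {"Site A": 2, "Site B": 2}  # assumed node counts
    print(surviving_site(layout, "Site A"))
    print(surviving_site(layout, None))
```

With an even node count split evenly across sites, the witness at a third site is what lets one side reach a majority automatically instead of both sides dropping out.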
File Share Witness (FSW) Considerations
Simple Windows File Server
Single file server can serve as a witness for multiple clusters
Each cluster requires its own share
Can be made highly available on a separate cluster
Recommended to be at 3rd separate site for DR
FSW cannot be on a node in the same cluster
FSW should not be in a VM running on the same cluster
Quorum Model Recap
R2 Cluster Features:
http://technet.microsoft.com/en-us/library/dd443539.aspx
Resources
Software Application Developers: http://msdn.microsoft.com/
Infrastructure Professionals: http://technet.microsoft.com/