Cluster Top 10


Top 10 Clustering Dos and Don’ts

DBA Fundamentals Down Under VC


January 12, 2016

Allan Hirt
Microsoft High Availability MVP
Managing Partner, SQLHA LLC
Twitter: @SQLHA
E-mail: allan@sqlha.com
Web/blog: http://www.sqlha.com
I’m Coming to Australia

http://www.wardyit.com/product/allan-hirt-mission-critical-deep-dive/
10. DO Understand Your Solution and the Stack

• Implementing technology for technology’s sake rarely works


• You will need to answer these (and probably more) questions
 Can you articulate the differences between a clustered instance (FCI), an availability
group (AG), and a Windows Server Failover Cluster (WSFC)?
 Do you understand how they work, even at a basic level?
 Do you know where they fit into the big picture?
 Do you know how they behave during a failover and how it impacts the app/end
users?
 Do you know why you are implementing a particular HA feature?
 Do you know your requirements?
 Do you know some of the technical details you will need?
The Clustering Stack

• Need to think about clustered configurations as tightly coupled

Clustered Applications

WSFC

Operating System (Windows)

Hardware (network, storage, etc.)


9. DO Take Licensing Into Account

• Cost is always a factor for physical, virtual, and public cloud


 Public cloud example: Bring your own license? Rent one in the cloud?
• Editions
 Do not need Enterprise Edition of SQL Server for FCIs; 2 node support in Standard and BI
 Need EE for AGs through 2014; 2016 has limited scenario for AGs in Standard
 Windows Server Failover Clustering in Standard Edition
• Relevant SQL Server whitepapers/info here:
https://www.microsoft.com/en-us/server-cloud/products/sql-server/purchasing.aspx
• AGs – ALL active use needs a license. Readable replicas are not free.
• Always ask your licensing rep – never assume!
 Mistakes are expensive
8. DON’T Forget to Nail Down Admin Roles

• “Who’s on first?”
 SQL Server does not obfuscate the fact it’s using a WSFC for FCIs or AGs
• Who is responsible for administration of the clustered configuration?
 DBAs?
 Windows admins?
 Storage admins?
 Others?
• Need to understand nuances when there is overlap and be very clear
 Example: not failing over an AG using the WSFC when standalone instances are
involved
7. DON’T Assume Things Are Shared and/or Isolated

• Underlying WSFC is the only really common link


• Each FCI and AG gets its own set of resources and role in the WSFC
 AG: AG resource, and if Listener, Network Name and IP address(es)
 FCI: Network Name, IP address(es), SQL Server resources, storage (if applicable)
• For FCIs, traditional disks presented to the WSFC can only be used by one FCI
• One listener per AG, not per WSFC
• FCIs and AGs are not load balanced
 Owned and run on a single node of the WSFC
 Only slight exception: read only load balancing for SQL Server 2016
• Shared DLLs make patching an adventure with multiple FCIs
6. DO Plan Virtualized WSFCs, FCIs, & AGs Carefully

• Non-DBAs
 “Don’t worry about in-guest availability; we’ve got Live Migration/vMotion/DRS/etc.”
• Reality: some mission critical VMs will still need in-guest HA and D/R
 FCIs still viable (shared storage the biggest issue)
 AGs a natural fit
• Planning similar to a physical implementation
 In guest is the same
 Hypervisor another layer to take into account
• Use Anti-Affinity on VM nodes
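The anti-affinity point can be illustrated with a small check. This is a minimal sketch, not a hypervisor API: the placement mapping, the VM and host names, and the function `find_colocated_nodes` are all hypothetical, and in practice the placement data would come from whatever inventory your hypervisor exposes.

```python
from collections import defaultdict


def find_colocated_nodes(placement: dict[str, str]) -> dict[str, list[str]]:
    """Return hosts that currently run more than one VM node of the same WSFC.

    `placement` maps a VM node name to the hypervisor host it runs on
    (hypothetical input; names are for illustration only).
    """
    by_host: dict[str, list[str]] = defaultdict(list)
    for vm, host in placement.items():
        by_host[host].append(vm)
    # Any host holding two or more nodes defeats the purpose of in-guest HA:
    # losing that one host takes several WSFC nodes down together.
    return {host: sorted(vms) for host, vms in by_host.items() if len(vms) > 1}


violations = find_colocated_nodes(
    {"sqlnode1": "hostA", "sqlnode2": "hostA", "sqlnode3": "hostB"}
)
print(violations)
```

An empty result means every node sits on its own host, which is the state an anti-affinity rule is meant to enforce.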
5. DON’T Implement Clusters on Bad Networks

• Nodes need to be able to talk to each other to ensure WSFC is up and running
• Need redundant network paths down to physical layer
 Not necessarily public/private, can be a NIC team
• Multi-subnet is not always a straightforward configuration
• IP addresses
 Nodes
 WSFC
 Each FCI
 Each AG Listener (if being configured)
• Not recommended to disable IPv6
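The IP address list above lends itself to a quick planning calculation. The sketch below is a rough estimate only and rests on assumptions that are not in the slides: one address per node, and one address per subnet for the WSFC name, each FCI, and each AG listener. The function name `count_ip_addresses` and its parameters are hypothetical. Additional NICs, IPv6, and site-specific rules are ignored.

```python
def count_ip_addresses(nodes: int, fcis: int, listeners: int, subnets: int = 1) -> int:
    """Estimate how many IP addresses to request for a clustered configuration.

    Assumptions (illustrative, not authoritative):
      - each node needs one address
      - the WSFC, each FCI, and each AG listener need one address per subnet
    """
    if min(nodes, fcis, listeners) < 0 or subnets < 1:
        raise ValueError("counts must be non-negative and subnets must be >= 1")
    clustered_names = 1 + fcis + listeners  # the WSFC itself plus each FCI/listener
    return nodes + clustered_names * subnets


# Two nodes, one FCI, one AG listener, single subnet.
print(count_ip_addresses(nodes=2, fcis=1, listeners=1))
# Four nodes, two AG listeners, stretched across two subnets.
print(count_ip_addresses(nodes=4, fcis=0, listeners=2, subnets=2))
```

Running the numbers before the build helps when address requests have to go through a separate network team.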
4. DO Implement Quorum Correctly

• No quorum = no WSFC
• Must monitor status
• Understand thresholds … and don’t be stupid
• Quorum != disk
 DO NOT USE THE OLD DISK ONLY QUORUM MODEL … EVER
• Split brain matters, but is a man-made problem
• Use W2012 or higher – quorum greatly improved
 Dynamic quorum (W2012+), dynamic witness (W2012 R2+)
• Multi-site can be tricky
 Witness resource?
• File share in third location?
• Asymmetric disk or file share?
• Cloud witness? (W2016+)
Cloud Witness 1
Cloud Witness 2
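To make the "no quorum = no WSFC" rule concrete, here is a minimal sketch of a simple majority check. It assumes a static vote count where every node and the witness (if any) holds one vote; it deliberately ignores dynamic quorum and dynamic witness, which adjust votes at runtime. The function name `has_quorum` is hypothetical.

```python
def has_quorum(total_votes: int, votes_online: int) -> bool:
    """Return True when strictly more than half of the configured votes are online.

    Simplified model: static votes only, no dynamic quorum or dynamic witness.
    """
    if total_votes <= 0:
        raise ValueError("total_votes must be positive")
    if not 0 <= votes_online <= total_votes:
        raise ValueError("votes_online must be between 0 and total_votes")
    return votes_online > total_votes // 2


# Multi-site example: two nodes per site, no witness (4 votes).
# Losing one site leaves 2 of 4 votes, which is not a majority.
print(has_quorum(total_votes=4, votes_online=2))

# Same layout plus a witness in a third location (5 votes).
# Losing one site leaves 2 nodes + witness = 3 of 5 votes.
print(has_quorum(total_votes=5, votes_online=3))
```

The two calls show why a witness in a third location matters for multi-site configurations: without it, losing either site leaves exactly half the votes.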
3. DON’T Make Assumptions When It Comes to Storage

• Traditional “shared” disks for an FCI are LUNs presented to each node
 Only used by ONE FCI
 All user databases, backups on shared drives – NOTHING LOCAL
• AGs require NO shared storage unless FCI is in the mix
• Newer options:
 SMB 3.0 (SQL Server 2012+)
 Local TempDB (SQL Server 2012+)
 Cluster Shared Volumes (SQL Server 2014+)
• SMB, CSV shared
 Folder structure now matters
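The rule that a traditional shared disk is used by only ONE FCI can be checked against a planning worksheet. This is a minimal sketch under stated assumptions: the FCI names, disk labels, and the function `find_disks_claimed_twice` are hypothetical, and it covers traditional LUN-based disks only, not SMB or CSV.

```python
from collections import defaultdict


def find_disks_claimed_twice(assignments: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return every disk that more than one FCI plans to use.

    `assignments` maps an FCI name to the traditional shared disks (LUNs)
    it expects to own. All names here are illustrative.
    """
    owners: dict[str, list[str]] = defaultdict(list)
    for fci, disks in assignments.items():
        for disk in disks:
            owners[disk].append(fci)
    # A traditional clustered disk belongs to exactly one FCI's role.
    return {disk: sorted(fcis) for disk, fcis in owners.items() if len(fcis) > 1}


plan = {
    "FCI1": ["DataDisk1", "LogDisk1"],
    "FCI2": ["DataDisk2", "LogDisk1"],  # LogDisk1 is double-booked
}
print(find_disks_claimed_twice(plan))
```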
2. DO Have a Process for AD Objects

• WSFC requires AD through Windows Server 2012 R2/SQL Server 2014 for all
clustered AG and FCI configurations
 Each node of a WSFC joined to same domain regardless of location
 Each name resource gets an object in AD
• Cluster Name Object (CNO) = WSFC
• Virtual Computer Object (VCO) = FCI Network Name, Listener Network Name
• Each name must be unique in the domain
 Domain-based account required for creation and administration
• Non-AD connected WSFCs
 2012 R2 allowed DNS only WSFCs, but SQL Server requires domain accounts
 Windows Server 2016/SQL Server 2016 support domain agnostic WSFCs for very
specific AG configurations only
1. DON’T Use Active/Passive, Active/Active, or AlwaysOn
– Improper Terminology Makes Kittens Sad

Source: http://mysterywallpaper.blogspot.com/2013/06/crying-kitten.html
Proper Terminology for Windows and AGs

• MSCS has not been valid for well over 10 years; the Windows portion is a
Windows Server Failover Cluster (WSFC)
 Saying only “cluster” is too generic
• AlwaysOn
 AlwaysOn IS NOT the AG feature
 Umbrella term that encompasses two major features (with or without space)
• AlwaysOn Availability Groups (AGs – it’s not AOAG or AAG, either)
• AlwaysOn Failover Cluster Instances (FCI) – same as existing feature
 No space between Always and On (well …)
 Not the same as the old Always On (with space) storage program for SQL Server (or
how it was used in 2008)
• AGs had a few different names before RTM (HADRON/HADR/AlwaysOn)
Proper Terminology for FCIs

• Active/Passive and Active/Active are WRONG


 Single instance failover cluster = 1 instance
 Multiple instance failover cluster = more than 1 instance
• Example 1 – 8 nodes, 12 FCIs
 Is it A/A/A/A/A/A/A/A/A/A/A/A? (# of FCIs) – NO
 Is it A/A/A/A/A/A/A/A? (# of nodes) – NO
 Is it A/A? – HECK NO
 Is it a multiple instance failover cluster with 12 FCIs? Yes!
• Example 2 – 4 nodes, 3 instances (N + 1 [N + i] scenario)
 Is it A/A/A? (# of FCIs) – NO
 Is it A/A/A/P? (# of nodes) – NO
 Is it A/A? – MOST DEFINITELY NOT
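The naming rule from the two examples reduces to counting FCIs, not nodes. A minimal sketch follows; the function name `describe_fci_configuration` is hypothetical, and the wording of the returned strings follows the terms on this slide.

```python
def describe_fci_configuration(fci_count: int) -> str:
    """Name a failover cluster configuration by its number of FCIs.

    The node count is intentionally not a parameter: it does not change the term.
    """
    if fci_count < 1:
        raise ValueError("a failover cluster configuration needs at least one FCI")
    if fci_count == 1:
        return "single instance failover cluster"
    return f"multiple instance failover cluster with {fci_count} FCIs"


print(describe_fci_configuration(12))  # Example 1: 8 nodes, 12 FCIs
print(describe_fci_configuration(3))   # Example 2: 4 nodes, 3 instances
```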
Selected Links

• My AG FAQ http://www.sqlha.com/allans-alwayson-availability-groups-faq/
• Supportability for virtualized configurations
 MS support policy for virtualized SQL Server implementations (all hypervisors)
https://support.microsoft.com/en-us/kb/956893
 VMware supported cluster configurations
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1037959
• Proper Terminology
 http://sqlha.com/2012/01/09/once-more-with-feeling-stop-using-activepassive-and-activeactive/
 http://sqlha.com/2013/04/29/alwayson-is-the-new-activepassive-and-activeactive/
 http://sqlha.com/2015/12/16/dear-microsoft-i-love-you-but-youre-driving-me-batty/
