Server 2003 Cluster Installation
001
Windows Server 2003 Cluster Installation & DR Instructions
ISSUE DATE: 04/03/06
DEPARTMENT: Windows Service Team
PREVIOUS DATE ISSUED:
Objective
This document provides detailed instructions to complete the installation of a Microsoft
Windows Server 2003 two-node cluster attached to shared disk. It also explains how to
manually fail over a node as well as how to recover from a failed node or Quorum. The
information in this document applies to:
• Microsoft Clustered Server
• Windows Server 2003 Enterprise Edition
• TSM V5.3.0.3
Prerequisites
An understanding of the Auto Install process and EMC/shared disk is required, as well as
knowledge of Virtual Center if creating a cluster using virtual hardware.
Overview
This document will take you through the steps needed to install and configure a Windows
Server 2003 Cluster on physical or virtual hardware. Also included in this document are
instructions on how to create a cluster file share and how to manually fail over an active node
to a non-active node in order to complete maintenance tasks such as applying patches or
installing new software or hardware. In the event of a single node, multiple node, or Quorum
drive (cluster database) crash, this document also provides instructions to restore those
components to a server with identical hardware. In the event of a Quorum failure, a short
cluster outage will occur during the restore. You will need administrator rights to do all of the
above.
Responsibilities
Windows Service Team
Table of Contents
Page 1 of 41
INTERNAL USE ONLY
Single Site..........................................................................................................................4
Multi-Site...........................................................................................................................5
RFID...................................................................................................................................5
Hardware Setup for VMware Clusters.......................................................................................5
Create & Install Virtual Servers.............................................................................................5
Add Components to New Servers..........................................................................................8
Add NIC #2........................................................................................................................8
Installation Overview...............................................................................................................13
Configuring the Heartbeat Network Adapter...........................................................................14
Setting up Shared Disks...........................................................................................................15
Configuring Shared Disks ..............................................................................................15
Assigning Drive Letters...................................................................................................15
Configuring the First Node......................................................................................................16
Configuring the Second Node..................................................................................................21
Configuring the Cluster Groups and Verifying Installation.....................................................24
Test failover to verify that the cluster is working properly..............................................25
Verify cluster network communications...........................................................................25
Besides the NetBIOS name and IP address for each of the servers (nodes) in the cluster, the
cluster itself needs a NetBIOS name and IP address. To make the installation less confusing,
rename the network connection on both servers to be used for the heartbeat network (the
non-teamed NIC) to Heartbeat.
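If you prefer to script this, the connection can also be renamed from the command line with netsh. The connection name "Local Area Connection 2" below is only an example; substitute the actual name of the non-teamed NIC on your servers:

```
rem Rename the non-teamed NIC so it is easy to identify later (run on both nodes)
netsh interface set interface name="Local Area Connection 2" newname="Heartbeat"
```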
Also, a domain user account is needed for the cluster. You need to request this account
from Computer Security by submitting the Global – Non-User ID Request form in Outlook.
Make sure that this account has logon rights to only the two servers in the cluster and that
the account password does not expire. Please note that this account cannot be requested
until the two servers are actually built. Also, the password must be a secure one (a minimum
of 14 characters, etc.) and it needs to be changed every 180 days. This account is to be used
for the cluster service only; no application should be using this ID. The owner should be
listed as Bryan Miller (B27021S).
Note: Multi-site clusters installed with an odd number (ex. USTCCA001) should have their
“A” node at TCC. Multi-site clusters installed with an even number (ex. USTCCA002) should
have their “A” node at TCC-West.
Multi-Site
• 2 Windows 2003 Enterprise Edition servers
• 3 Ethernet ports per server (2 for a teamed NIC and 1 for the heartbeat network)
• 2 single-port Emulex HBAs (or one dual-port HBA) per server
RFID
• 2 ProLiant DL380 G4 servers with 4 × 3.4 GHz CPUs and 3.5 GB RAM, installed with
Windows Server 2003 Enterprise Edition
• HP NC7771 NIC installed in slot 1
• 642 Smart Array controllers installed in slots 2 and 3 (part of the HP StorageWorks
Modular Smart Array 500 G2 High Availability Kit)
Network Requirements
Single Site
• 1 Cluster IP and NetBios name
• 1 IP and NetBios name per server (For the teamed NIC)
• 1 IP per server for the heartbeat network. For clusters running at TCC, request the
heartbeat IP from Network Operations. Servers not running at TCC use
192.168.1.249 for node A and 192.168.1.250 for node B.
• 1 Crossover patch cord (servers at TCC don’t need this)
• 2 Network ports on the production network per server
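For servers not at TCC, the heartbeat addresses above can be assigned from the command line as in the following sketch; the 255.255.255.0 subnet mask is an assumption, so adjust it to your standard:

```
rem Node A heartbeat address; use 192.168.1.250 on node B
rem No default gateway is needed on the heartbeat network
netsh interface ip set address name="Heartbeat" static 192.168.1.249 255.255.255.0
```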
Multi-Site
• 1 Cluster IP and NetBIOS name. Request an IP address from Network Operations
using the IP request form and select “TCC/TCC West Spanned Vlan – 165.28.94” or
“TCC/TCC West Spanned Vlan – 165.28.111” from the “Site” drop-down list.
• 1 IP and NetBIOS name per server (for the teamed NIC). Request an IP address
from NetOps using the IP request form and select “TCC/TCC West Spanned Vlan –
165.28.94” or “TCC/TCC West Spanned Vlan – 165.28.111” from the “Site” drop-down
list.
• Note: All three IPs for the cluster must be on the same Vlan.
• 1 IP per server for the heartbeat network. Request a heartbeat IP from Network
Operations using the IP request form and select “USTC- UNIX Heartbeat
172.12.123” from the “Site” drop-down list.
• 2 Network ports on the production network per server.
RFID
• 1 Cluster IP and NetBIOS name
• 1 IP and NetBIOS name per server (For the teamed NIC)
• 1 IP per server for the heartbeat network (use 192.168.1.249 for node A and
192.168.1.250 for node B)
• 1 Crossover patch cord
• 2 Network ports on the production network
Multi-Site
• 1 Quorum disk of at least 510 MB (must be a minimum of 500 MB after formatting)
• Whatever disk configuration you need for the application you are running.
• All of the above disks must be visible to both servers.
• Note: When requesting disk, inform the DSM team that this cluster is split
between TCC and TCC West.
RFID
• 1 HP StorageWorks Modular Smart Array 500 G2
• 1 HP StorageWorks Modular Smart Array 500 G2 High Availability Kit
• 6 HP 72GB 10K Ultra320 SCSI HDDs
• Configure 2 logical disks
o 1 × 510 MB disk for the Quorum and the remainder of the disk as R: for SQL data.
NOTE: Both servers to be used in the cluster should be located on the same
ESX host.
4. Select Typical from the two options and then click Next.
5. Select the VM group where the new server will be located and click Next.
7. Enter the name of the new server (must be lower case) and choose the
datastore location for the c:\ drive. DO NOT choose vol_local. Click Next.
8. Choose vlan_04 as the NIC and vlance as the adapter type. Be sure to
check connect at power on as well. Click Next.
9. Select the disk size for the c:\ drive of the new server. Click Next.
10. Click Finish at the next window. The new virtual server should show up in
the correct VM group in about 10 seconds. Repeat this task for the second
server (node) which will be included in the cluster.
After both servers are created in Virtual Center, continue with the install of the
new servers using script builder and the auto-install process. Choose to
install just the c:\ drive from script builder and be sure to specify that it is an
ESX guest.
11. Once the new servers are built power them both off using Virtual Center.
NOTE: Both servers to be used in the cluster should be located on the same
ESX host.
13. Click Add at the window below. Click Next when the next window appears.
15. Choose heartbeat 1 for the NIC and be sure connect at power on is
checked. Click Finish.
16. Verify that Adapter Type for both NIC1 & NIC2 is set to vmxnet. Click OK.
21. Select the size and location of the disk you are creating, then click Next. This
disk should be located on the same datastore as the first disk you created
when building the server, if at all possible. Also, this disk should be the
quorum disk, so it needs to be at least 500 MB after formatting.
22. Set this new disk to SCSI 1:0, then click Finish.
23. Repeat steps 19-23 for any other shared disks needed. Be sure to attach
them to the next available opening on SCSI Controller 1, not 0. (ex. SCSI 1:1)
24. Once all shared disks are created, click OK to exit the virtual machine
properties.
25. Re-open the virtual machine properties again and verify that the SCSI Bus
Sharing for SCSI Controller 1 is set to Virtual and that all shared disks were
created successfully. Click OK.
26. For Node B, repeat steps 19 & 20. Then select Use An Existing Virtual
Disk, then click Next.
27. Select the correct datastore where you created the shared disk on Node A,
then click Browse.
28. Choose the first shared disk file that was created with Node A and select
Open. You may need to review the properties for Node A to be sure you are
selecting the same disk file. Click Next at the next window that appears.
29. Be sure to add this new shared disk to the same SCSI controller and port that
it is using on Node A.
30. Repeat this process for the remaining shared disks that were created on
Node A. In the end, you want both nodes to have the exact same
configuration. See below for an example of two servers using the same
shared drives on the same host.
Once you have finished setting up both nodes you may power on Node A, but only Node A at
this time. Follow the steps below to setup the heartbeat NIC, the shared disks, and then the
first node in the cluster. Once you have done all those steps, you may power on Node B
and continue with adding it to the cluster.
Installation Overview
During the installation process, some nodes will be shut down and some nodes will be
rebooted. These steps are necessary to guarantee that the data on disks that are attached
to the shared storage bus is not lost or corrupted. This can happen when multiple nodes try
to simultaneously write to the same disk that is not yet protected by the cluster software.
Use Table 1 below to determine which nodes and storage devices should be powered on
during each step.
Table 1: Power sequencing (Node A / Node B / Storage)
Setting up Shared Disks — Node A: On, Node B: Off, Storage: On. Shut down both nodes,
connect fiber to the HBAs on both nodes, then power on node A.
NOTE: For SAN connected devices the Storage team must write the signatures
with the Diskpar command to set the Sectors Per Track to the proper offset.
After the disks and partitions have been configured, drive letters must be assigned to each
partition on each clustered disk. The Quorum disk (this disk holds the cluster information and
will usually be 1GB or less in size) should be assigned drive letter Q:.
1. Right-click the partition and select Change Drive Letter and Path.
2. Click Change and select a new drive letter, Click OK and then Yes.
3. Right-click the partition and select Properties. Assign the disk label using the cluster
name (example: XXTCCA004-D), then click OK.
4. Repeat steps 1 through 3 for each shared disk.
5. When finished, the Computer Management window should look like the figure above.
Now close the Computer Management window.
6. Reboot Server.
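Steps 1 and 2 can also be done with diskpart. The disk and partition numbers below are examples only; confirm them with list disk and list partition first:

```
rem Assign Q: to the Quorum partition (disk/partition numbers are examples)
diskpart
DISKPART> select disk 1
DISKPART> select partition 1
DISKPART> assign letter=Q
DISKPART> exit
```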
3. At the “Open Connection to Cluster” screen select “Create New Cluster” from the
drop down menu. Then click OK.
4. At the “Welcome to The New Server Cluster Wizard” screen click Next.
5. At the “Cluster Name and Domain” screen select the domain the cluster will be in
from the drop down menu and enter the cluster name in the Cluster name field.
Then click Next.
6. At the “Select Computer” screen verify that the Computer name is that of node A.
Then click Next.
7. The “Analyzing Configuration” screen will appear and verify the server
configuration. If the setup is acceptable the next button will be available. Click Next
to continue. If not, check the log and troubleshoot.
9. At the “Cluster Service Account” screen enter the cluster server account name and
password and verify that the domain is correct then click Next.
10. At the “Proposed Cluster Configuration” screen verify that all is correct. Take a
moment to click on the Quorum button and also verify that the Q:\ drive is selected
to be used as the Quorum disk. Click Next once configuration is verified.
11. At the “Creating Cluster” screen check for errors, if none click Next.
12. At the “Completing the New Server Cluster Wizard” screen click Finish.
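For reference, the same cluster can be formed unattended with cluster.exe. All names, the account, and the address below are placeholders, and the exact switch spellings should be confirmed with cluster /create /? before use:

```
rem Unattended equivalent of the New Server Cluster wizard (all values are examples)
cluster /cluster:USTCCA001 /create /node:USTCCA001A /user:DOMAIN\clustersvc /pass:<password> /ipaddr:165.28.94.10,255.255.255.0,Team
```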
Configuring the Second Node
6. The “Analyzing Configuration” screen will appear and verify the server
configuration. If the setup is acceptable the next button will be available. Click Next
to continue. If not, check the log and troubleshoot.
7. At the “Cluster Service Account” screen enter the cluster user account password
then click Next.
10. At the “Completing the Add Nodes Wizard” screen click Finish.
2. Right-click on the Cluster Group, then click Rename and enter the name <Cluster
Name>.
3. Right-click on the Disk Group # and then click Rename. Enter the cluster name with the
name of the application you are installing on the cluster appended to it (e.g.,
USTCCA001SQL1).
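The two renames above can also be done with cluster.exe; the group names are examples:

```
rem Rename the default groups to the cluster naming standard
cluster group "Cluster Group" /rename:USTCCA001
cluster group "Disk Group 1" /rename:USTCCA001SQL1
```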
1. To test failover, right-click the Cluster Group and select Move Group. The
group and all its resources will be moved to node B. After a short period of time
the resources will be brought online on node B. If you watch the screen, you
will see this shift.
2. Move the Cluster Group back to node A by repeating step 1.
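The same failover test can be driven from the command line; the group and node names below are examples:

```
rem Move the group to node B, verify it comes online, then move it back
cluster group "USTCCA001" /moveto:USTCCA001B
cluster group "USTCCA001" /moveto:USTCCA001A
```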
1. You’ll need to verify the cluster network communications settings as a final step in
the cluster configuration. Click Start, then Administrative Tools, then Cluster
Administrator.
2. Right click on the cluster name, then choose Properties.
3. Click on the Network Priority tab from the window that appears.
4. Verify that the Heartbeat network is listed first and the Team network is listed
second.
5. Select the Heartbeat network from the list and click on Properties. Verify that
the Heartbeat network is set to Internal Cluster Communications Only (private
network) and then click OK.
6. Select the Team network from the list and click on Properties. Verify that the
Team network is set to All Communications (mixed network) and then click OK.
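The same network roles can be checked or set with cluster.exe. The network names must match what you renamed earlier, and the Role values are 1 for internal cluster communications only and 3 for all communications:

```
rem Verify/set network roles (network names are examples)
cluster network "Heartbeat" /prop Role=1
cluster network "Team" /prop Role=3
cluster network /prop
```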
Firefight – http://ws.kcc.com/AccountMgmt/
Example of an Asset Mgmt entry with the cluster name and two nodes listed separately.
When entering the cluster name into Asset Mgmt the following fields should be filled in as
follows:
In order to create a cluster file share, you need to create the file share using the Cluster
Administrator GUI or the Cluster command line utility. The following steps will take you
through the process of creating a file share resource using the Cluster Administrator GUI.
4. The New Resource window will pop up. Fill in the following fields:
Name: enter the name of the share.
Description: enter the path to the folder you are sharing.
Resource Type: from the drop-down list, select File Share. (Note: the folder to
be shared must already exist; if it does not, create it at this point.)
Click Next
5. In the Possible Owners window, make sure that all nodes of the cluster are listed in
the possible owners box, then click Next.
6. In the Dependencies window, highlight the disk resource that the folder is on in the
resource window, click Add, then click Next.
8. The newly created share will be offline and you will need to bring it online. Right-click
on the share you just created, then click Bring Online.
9. To set security on the share, right-click on the share resource, select Properties,
click on the Parameters tab, then click on the Permissions button and add the
security rights you need.
10. You can verify that the new share is online by accessing it from your desktop by
clicking Start/Run and then typing \\<cluster name>\<share name>
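The same file share resource can be created with the Cluster command line utility mentioned above; the share, group, and disk resource names below are examples:

```
rem Create the resource, set its private properties, add the disk dependency, bring it online
cluster res "MyShare" /create /group:"USTCCA001SQL1" /type:"File Share"
cluster res "MyShare" /priv path="R:\MyShare" sharename="MyShare"
cluster res "MyShare" /adddep:"Disk R:"
cluster res "MyShare" /online
```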
3. Under Groups click on the cluster group name (XXTCCA004 in the screen shot
example below). The right pane will display information about all of the resources in
the group. The Owner column will list the node that is currently running the cluster.
Make note of the owner as well as the state of the resources. In most cases, all of the
resources will be online and should be online again after you failover the cluster.
4. To failover the active node, under Groups right click on XXTCCA004 then click on
Move Group. If more than two nodes exist as part of the cluster you can choose a
specific node or choose Best Possible. If Best Possible is chosen, the group will
move to one of the available nodes based on preferred ownership designation for
that specific resource group.
5. You will notice that the state of the resources will go to offline pending, then to offline,
then to online pending, and finally to online. The owner will also switch at this time,
and this whole process will take about 10 to 15 seconds to complete. After it is
completed, all the resources that were online before will be online again and the owner
will have changed.
Repeat steps 3-5 for all resource groups listed under the Groups heading.
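Between moves, the owner and state of every group and resource can also be checked quickly from a command prompt:

```
rem Lists each group/resource with its current owner node and state
cluster group
cluster res
```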
1. These instructions are based on you restoring the server to identical hardware.
3. Make sure that the fiber cable to the SAN is disconnected. For VMWare clusters, make
sure the shared disks are removed from the server properties in Virtual Center before
installing the DR build.
5. Logon to the server with the following credentials: user name is administrator and
password is Admin1. These are case sensitive. Note: After the DR auto-install
completes, the server will probably auto-logon the first time with the administrator
account.
6. Make sure the Network Card is set to communicate at 100/Full (If the switch can handle
100/Full). In other words, make sure the Network card is configured the same as the
switch port. Do not leave it set to Auto/Auto.
8. Perform a search for the three files mentioned above. Take note of the paths to all
instances found. Disregard those paths leading to *.cab files. You will have to copy
these files back to those locations later.
10. Enable remote desktop connections on server if not already enabled (right click on
server name icon on desktop, Properties, Remote tab, Enable Remote Desktop on this
computer, OK)
11. Install the correct service pack that was on the server before it crashed, most likely
SP1. Reboot the server.
12. Log back into the server. Check the three files from step 7; if newer files exist after the
service pack is installed, then copy them to c:\temp and overwrite the previously
copied files.
13. Contact the Storage Management team to restore the required information.
6. From the “Action for Files That Already Exist” dropdown box, select “Replace”.
Click the check box next to “Replace file even if read-only/locked”. (You may still be
prompted to make a decision. Always select Replace.)
7. Click OK.
8. Click Restore.
9. Verify the radio button for “Original location” is selected. Click Restore. If a message
asking to restart the machine pops up, always select No.
d) Click on the Restore tab and select always replace the file on my
computer. Click OK.
e) Click on the Restore tab and highlight the file. Right-click and choose Catalog
file...
f) Enter the path to the catalog file (c:\sstate\systemstate.bkf) and hit OK.
g) Click on the “+” sign next to file.
h) Click OK on Popup window if it appears.
i) Click on the “+” sign next to System State.bkf created… that has the most
recent date/time
j) Click OK on Popup window if it appears.
k) Make sure the path still points to c:\sstate\systemstate.bkf and click OK.
l) Check the box next to system state.
m) Make sure the dropdown menu restore files to: Original Location is
selected. Click Start Restore.
n) A warning message will appear stating “Restoring system state will always overwrite
current system state...” Click OK.
o) A Confirm Restore message box will appear. Click OK.
p) Click OK on Popup window if it appears.
q) An Enter Backup File Name window may appear. Verify the path to be
c:\sstate\systemstate.bkf and click OK.
r) When the restore completes, click Close.
s) You’ll be prompted to reboot. Click on “NO”.
t) Close the NT Backup window.
4) On startup, press F8 and then select “Safe Mode with Networking” from the menu.
7) Open Device Manager. Under “Display Adapters”, delete any/all adapters defined. Under
“System Devices”, if there is an “!” next to “Compaq ..... System Management Controller”,
delete the device. Close Device Manager and reboot.
10) Reapply the service pack version that was on the crashed server, then restart.
15) Reconnect the SAN fiber cable to the HBA. (Physical server only, not VMWare)
17) If the server is VMWare, add the shared disks back to the server properties through
Virtual Center while the server is off. (This step is only for virtual servers.)
19) Verify that the cluster node is activated and functional. Items to check to be sure the
rebuild is successful:
1. These instructions are based on you restoring the server to identical hardware.
3. Make sure that the fiber cable to the SAN is disconnected. For VMWare clusters,
make sure the shared disks are removed from the server properties in Virtual Center
before installing the DR build.
5. Logon to the server with the following credentials: user name is administrator and
password is Admin1. These are case sensitive. Note: After the DR auto-install
completes, the server will probably auto-logon the first time with the administrator
account.
6. Make sure the Network Card is set to communicate at 100/Full (If the switch can
handle 100/Full). In other words, make sure the Network card is configured the same
as the switch port. Do not leave it set to Auto/Auto.
8. Perform a search for the three files mentioned above. Take note of the paths to all
instances found. Disregard those paths leading to *.cab files. You will have to copy
these files back to those locations later.
10. Enable remote desktop connections on server if not already enabled (right click on
server name icon on desktop, Properties, Remote tab, Enable Remote Desktop on
this computer, OK)
11. Install the correct service pack that was on the server before it crashed, most likely
SP1. Reboot the server.
12. Log back into the server. Check the three files from step 7; if newer files exist after
the service pack is installed, then copy them to c:\temp and overwrite the previously
copied files.
13. Contact the Storage Management team to restore the required information.
6. From the “Action for Files That Already Exist” dropdown box, select “Replace”.
Click the check box next to “Replace file even if read-only/locked”. (You may still
be prompted to make a decision. Always select Replace.)
7. Click OK.
8. Click Restore.
9. Verify the radio button for “Original location” is selected. Click Restore. If a
message asking to restart the machine pops up, always select No.
4. On startup, press F8 and then select “Safe Mode with Networking” from the menu.
7. Open Device Manager. Under “Display Adapters”, delete any/all adapters defined. Under
“System Devices”, if there is an “!” next to “Compaq ..... System Management Controller”,
delete the device. Close Device Manager and reboot.
10. Reapply the service pack version that was on the crashed server, then restart.
15. Reconnect the SAN fiber cable to the HBA. (Physical server only, not VMWare)
17. If the server is VMWare, add the shared disks back to the server properties through
Virtual Center while the server is off. (This step is only for virtual servers.)
19. Verify that the cluster node is activated and functional. Items to check to be sure the
rebuild is successful:
20) Restore the second node by the same process as the first node, using all the steps above.
21) Verify that the second cluster node is active and functional.
22) Schedule outage to test cluster failover by moving cluster groups from primary to
secondary node and then back again.
1. Logon to the primary node and reconfigure the Quorum drive using Disk Manager
(right click on computer name icon on desktop, Manage, Disk Management)
3. Logon to the primary node and open Backup (Start, All Programs, Accessories,
System Tools, Backup)
6. Double-click on the SystemState.bkf file on the right hand pane with the most recent
date/time stamp.
7. Now put a check next to “System State” in the expanded view to the left.
11. Put a check next to “Restore the Cluster Registry to the quorum disk…” and then
click “OK”. Click “Yes” at the next pop-up window, then “OK” again.
12. If asked to verify the location of the SystemState.bkf file, please browse to c:\sstate
and locate the file, then the restore should begin. This process will stop the cluster
service on the primary node and will restore the cluster configuration for that node.
13. When the restore has completed, select YES to reboot the server. This will restart
the Cluster service on the primary node and will then stop the Cluster service on all
other nodes in the cluster. Backup will then copy the restored cluster database
information from the restarted primary node to the Quorum disk (Q:\) and all other
nodes in the cluster.
14. Once the primary node is restarted, logon and verify that the Cluster service is
running. Also, try to ping the cluster name to be sure it is running properly.
15. Now go to each of the other nodes and start the cluster service manually. Verify the
cluster service is running after starting it on each node by doing the following:
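As a sketch, the check on each node can be done from a command prompt (ClusSvc is the standard service name for the Cluster service):

```
rem Start the Cluster service if it is stopped, then confirm its state is RUNNING
net start clussvc
sc query clussvc
```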
16. Schedule outage to test failing cluster groups between nodes (primary node to
secondary node and back again)
Amendment Listing
This procedure has been prepared to comply with the requirements of Kimberly-Clark and
the Corporate Financial Instructions of Kimberly-Clark Corporation. It is important that no
deviation from the procedure occurs without prior reference to: Windows Services