10. Cluster Component Architecture

This module provides an overview of cluster architecture and looks in more detail at the Cluster
service component architecture.

Prerequisites
Before starting this session, you should:
● Understand how to create and configure cluster resource types.
● Understand where to find all locations of the cluster registry and how these copies are all kept
in synch.

What You Will Learn


After completing this session, you will be able to:
● Describe the Cluster service architecture.
● List Cluster service components.
● Explain what is meant by “pushing a group” and “pulling a group.”

Introduction
Server clusters are designed as a separate, isolated set of components that work together with the
operating system. This design avoids introducing complex processing and scheduling dependencies
between the cluster components and the operating system. However, some changes in the
base operating system are required to enable cluster features. These changes include:
● Support for dynamic creation and deletion of network names and addresses.
● Modification of the file system to enable closing open files during disk drive dismounts.
● Modification of the Input/Output (I/O) subsystem to enable sharing disks and volume sets
among multiple nodes.
Apart from the above changes and other minor modifications, cluster capabilities are built on top
of the existing foundation of the Microsoft Windows Server 2003 operating system.
The core of server clusters is the Cluster service itself, which is composed of several functional
units. This section provides a detailed discussion of the following topics:
● Event processor
● Node manager
● Membership manager
● Global update manager
● Database manager
● Log manager
● Resource manager
● Failover manager
● Pushing a group
● Pulling a group
● Communication manager

Cluster Architecture Overview


The Cluster service architecture presents cluster management in three tiers:
● Cluster abstractions
● Cluster operation
● Operating system interaction

Cluster Abstractions
The top tier provides the abstractions for nodes, resources, dependencies, and groups. The
important operations are resource management, which controls the local state of resources, and
failure management, which orchestrates responses to failure conditions.

Cluster Operation
The middle tier provides important distributed operations, such as membership and regroup
operations, and maintaining the cluster configuration. The shared registry allows the Cluster
service to see a globally consistent view of the cluster's current resource state. The cluster's registry
is updated with a global atomic update protocol and made persistent using transactional logging
techniques. The current cluster membership is recorded in the registry.

Windows Server 2003 Interaction and Drivers


The bottom tier provides the integration with Windows Server 2003 and with the cluster disk and
network drivers. The Cluster service relies heavily on the Windows Server 2003 process and scheduling
control, RPC mechanisms, name management, network interface management, security, resource
controls, and file system. The Cluster service is designed to integrate neatly into the Windows Server
2003 architecture, using standard, extensible interfaces such as the Transport Driver Interface (TDI).
Windows Server 2003 has a layered driver and network architecture that readily lends itself to
extending functionality through additional drivers inserted into the I/O stacks. The Cluster service
extends the basic operating system with two new kernel-mode modules:
● The cluster disk driver, which implements the mutual exclusion protocol for shared disks.
● The cluster network driver, which implements a simplified interface to intra-cluster
communication and implements heartbeat monitoring.


Cluster Service Component Architecture


The core of Cluster service consists of a number of dependent modular components (see Table 1
below). They are combined in a single process, with some communication functionality delegated
to the cluster network driver.
Cluster management APIs and COM interfaces control interactions between the cluster
components and cluster resources, and they also provide the interface for cluster management tools.
Table 1. Cluster Service Components

● Event Processor - Provides intra-component event delivery service.
● Object Manager - A simple object management system for the object collections in the Cluster service.
● Node Manager - Controls the quorum Form and Join process, generates node failure notifications, and manages network and node objects.
● Membership Manager - Handles the dynamic cluster membership changes.
● Global Update Manager - A distributed atomic update service for the volatile global cluster state variables.
● Database Manager - Implements the Cluster Configuration Database.
● Checkpoint Manager - Provides persistent storage of the current state of a cluster (its registry entries) on the quorum resource.
● Transaction Manager - Ensures that multiple related changes to the cluster database on a node are performed as an atomic local transaction. Provides primitives for beginning and ending a transaction.
● Log Manager - Provides structured logging to persistent storage and a lightweight transaction mechanism.
● Resource Manager - Controls the configuration and state of resources and resource dependency trees. It is responsible for monitoring active resources to see if they are still online.
● Failover Manager - Controls the placement of resource groups at cluster nodes. Responds to configuration changes and failure notifications by migrating resource groups.
● Communication Manager - Provides inter-node communication among cluster members.
● Event Log Replication Manager - Replicates event log entries from one node to all other nodes in the cluster.
● Backup/Restore Manager - Backs up or restores the quorum log file and all checkpoint files, with help from the Failover Manager and the Database Manager.

The components run in the process CLUSSVC.EXE, stored in %systemroot%\cluster.


Event Processor
The Event Processor is the communications center of the Cluster service. It is responsible for
connecting events to applications and other components of the Cluster service. This includes:
● Application requests to open, close, or enumerate cluster objects.
● Delivery of signal events to cluster-aware applications and other components of the Cluster
service.
The Event Processor is also responsible for starting the Cluster service and bringing the node to the
“online” state. The Event Processor then calls the Node Manager to begin the process of joining or
forming a cluster.

Cluster Service States


From the point of view of other nodes in the cluster and management interfaces, nodes in the
cluster may be in one of three distinct states. These states are visible to other nodes in the cluster,
are really the state of the Cluster service, and are managed by the Event Processor.
● Offline - The node is not a fully active member of the cluster. The node cannot be contacted,
so it is considered inactive. Its Cluster service may or may not be running.
● Online - The system is a fully active member of the cluster. It honors cluster database
updates, contributes votes to the quorum algorithm, maintains heartbeats, and can own and
run groups.
● Paused - The system is an active member of the cluster. It honors cluster database updates,
participates in quorum arbitration, and maintains heartbeats. The paused state is provided to
allow certain maintenance to be performed. Online and Paused are treated as equivalent states
by most of the cluster software.
A paused node can own groups. Any resources that were active when the node was paused remain
active and can be taken offline. Resources will be restarted on failure, up to the threshold settings.
However, resources and groups that are offline when the node is paused cannot be brought online
while the node is in the paused state. Resources and groups can be created, including the addition
of entries to the local and quorum copies of the cluster database, but they cannot be brought online.
A node can join a cluster managed by a paused node. In this event, any online groups will move to
the joining node from the paused node.
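
This state model can be summarized in a short sketch. The following Python fragment is purely illustrative (the enum values and the helper function are not part of any cluster API); it captures the rule that only an Online node may bring new groups online, while a Paused node keeps only what is already running.

from enum import Enum

class NodeState(Enum):
    OFFLINE = 0  # not an active member; its Cluster service may or may not be running
    ONLINE = 1   # full member: honors database updates, votes in the quorum, owns and runs groups
    PAUSED = 2   # full member for quorum and heartbeats, but cannot bring new groups online

def can_bring_group_online(state):
    # A Paused node keeps resources that were already active, but nothing new
    # can be brought online until the node returns to the Online state.
    return state is NodeState.ONLINE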

Node Manager
The Node Manager keeps track of the status of other nodes in the cluster. There are two types of
nodes in the cluster at any time:
● Defined nodes are all possible nodes that can be cluster members.
● Active nodes are the current cluster members.
The Node Manager is notified of any node failures suspected by the cluster network driver
(ClusNet). This failure suspicion is generated by a succession of missed “heartbeat” messages
between the nodes. Five missed heartbeats generate a failure suspicion, causing the Node Manager
to initiate a regroup to confirm the cluster membership, as described shortly.
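
As a rough sketch of this failure-suspicion rule (the counter and function names are illustrative stand-ins, not ClusNet internals):

MISSED_HEARTBEAT_LIMIT = 5  # five consecutive missed heartbeats raise a failure suspicion

def heartbeat_missed(missed_counts, node):
    """Called once per heartbeat interval in which no message arrived from `node`.
    Returns True when the Node Manager should initiate a regroup."""
    missed_counts[node] = missed_counts.get(node, 0) + 1
    return missed_counts[node] >= MISSED_HEARTBEAT_LIMIT

def heartbeat_received(missed_counts, node):
    missed_counts[node] = 0  # any heartbeat clears the suspicion counter for that node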


Remember that a node is in one of three states: Offline, Online, or Paused. If a node does not
respond during regroup, the remaining nodes consider it to be in an offline state. This is indicated
graphically in Cluster Administrator. The node is still defined but is no longer active. If a node is
considered to be offline, its active resources must be failed over (“pulled”) to the node that is still
running. The Resource/Failover Manager does the failover portion of this process.
Once the Node Manager has determined that the other node is offline, it stops the heartbeat
messages being sent out by ClusNet and will not listen for heartbeats from the other node. When
the other node is restarted, it will go through the join process and the cluster heartbeats begin
again.

Startup Operations
The Node Manager controls the join and form processes when the Cluster service starts on a node.
It processes the node, network, and network interface information from the local cluster database.
It brings the defined networks online and then registers the network interfaces with the cluster
network transport, so that the discovery process can search for other nodes. The Node Manager
initiates the regroup process to determine cluster membership and checks Cluster service version
compatibility when a node attempts to join.

Quorum Resource Arbitration


The Node Manager is responsible for checking ownership of the quorum resource during
arbitration and taking the node offline if arbitration fails. The cluster disk driver makes the SCSI
reserve attempt at the request of the Failover Manager/Resource Manager, but it is the Node
Manager that controls the node’s state based on the arbitration result.
If communication between nodes fails, the nodes that have control of the quorum resource will
bring all resources online. The nodes that cannot access the quorum resource and are out of
communication will shut down their Cluster service.
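
The partition behaviour described here reduces to a simple rule, sketched below as a conceptual illustration only (this is not the cluster disk driver's actual logic):

def partition_action(owns_quorum, can_reach_other_nodes):
    # The side that holds the quorum reservation brings its resources online;
    # a side that has lost both the quorum and communication stops its Cluster service.
    if owns_quorum:
        return "bring all resources online"
    if not can_reach_other_nodes:
        return "shut down the Cluster service"
    return "continue as a joined member"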

Membership Manager
If there are only two nodes in a cluster and one node fails, the remaining node constitutes the
whole cluster. The cluster “view” is therefore by definition consistent across the cluster. If there
are more than two, it is essential that all systems in the cluster always have exactly the same view
of cluster membership.
In the event that one system detects a communication failure with another cluster node, it
broadcasts a message to the entire cluster causing all members to verify their view of the current
cluster membership. This is called a regroup event and is handled by the Membership Manager.
The Node Manager freezes writes to potentially shared devices until the membership has
stabilized.
The Membership Manager maintains consensus among the active nodes about who is active and
who is defined. There are two important components to membership management:
● The join mechanism, which admits new members into the cluster.
● The regroup mechanism, which determines current membership on startup or suspected
failure.


Member Join
When the Cluster service on a node starts, it will try to connect to an existing node in the cluster.
The joining node will petition this “sponsor” node to join the cluster. The sponsor will be the
remaining node in a two-node cluster or any active node if there are more than two defined nodes.
The join algorithm is controlled by the sponsor and has five distinct phases for each active node.
The sponsor starts the join algorithm by broadcasting the identity of the joining node to all active
nodes. It then informs the new node about the current membership and cluster database. This starts
the new member's heartbeats. The sponsor waits for the first heartbeat from the new member, and
then signals the other nodes to consider the new node a full member. The algorithm finishes with
an acknowledgement to the new member.
All the broadcasts are repeated RPCs to each active node. If there is a failure during the join
operation (detected by an RPC failure), the join is aborted and the new member is removed from
the membership.
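
A minimal sketch of the sponsor's side of the join follows, assuming a caller-supplied transport. The function and message names are illustrative; as described above, a broadcast is simply a repeated RPC to each active node.

class RpcError(Exception):
    """Raised by the transport when an RPC to a node fails."""

def sponsor_join(joiner, active_nodes, send, wait_for_first_heartbeat, membership, database):
    def broadcast(message):
        for node in active_nodes:                        # repeated per-node RPC
            send(node, message)
    try:
        broadcast(("JOIN_ANNOUNCE", joiner))             # 1. announce the joining node's identity
        send(joiner, ("STATE", membership, database))    # 2. transfer membership and cluster database
        wait_for_first_heartbeat(joiner)                 # 3. wait for the new member's first heartbeat
        broadcast(("JOIN_COMMIT", joiner))               # 4. others now treat the joiner as a full member
        send(joiner, ("JOIN_ACK",))                      # 5. acknowledge the new member
        return True
    except RpcError:
        broadcast(("JOIN_ABORT", joiner))                # any RPC failure aborts the join
        return False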

Member Regroup
If there is suspicion that an active node has failed, the membership manager runs the regroup
protocol to detect membership changes. This suspicion can be caused by problems at the
communication level, resulting in missing heartbeat messages.
The regroup algorithm moves each node through six stages. Each node periodically sends a status
message in the form of a bit mask to all other nodes indicating which stage it has finished. None of
the nodes can move to the next stage until all nodes have finished the current stage.
● Activate. Each node waits for a local clock tick so that it knows that its timeout system is
working. After that, the node starts sending and collecting status messages. It advances to the
next stage if all active nodes have responded, or when the maximum waiting time has
elapsed.
● Closing. This stage determines whether partitions exist and whether the current node is in a
partition that should survive. Partitions are isolated groups of nodes within the same cluster.
They can arise because of a loss of network connections between some nodes but not others.
Any nodes that can see each other over the network are in the same partition.
● Pruning. All nodes that have been pruned for lack of connectivity halt in this phase. All
others move forward to the first cleanup phase.
● Cleanup Phase One. All surviving nodes install the new membership, mark the nodes that did
not survive the membership change as inactive, and inform the cluster network manager to
filter out messages from these nodes. Each node's Event Processor then invokes local callback
handlers to announce the node failures.
● Cleanup Phase Two. Once all members have indicated that the Cleanup Phase One has been
successfully executed, a second cleanup callback is invoked to allow a coordinated two-phase
cleanup. Once all members have signaled the completion of this last cleanup phase they move
to the final state.
● Stabilized. The regroup has finished.
There are several points during the operation where timeouts can occur. These timeouts cause the
regroup operation to restart at phase one.
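
The lock-step progression through the six stages behaves like a barrier, which the following sketch illustrates. The helper callables are stand-ins for the real status-message exchange, not Cluster service interfaces.

REGROUP_STAGES = ("activate", "closing", "pruning", "cleanup_one", "cleanup_two", "stabilized")

def run_regroup(local_node, peers, broadcast_finished, all_peers_finished, timed_out):
    stage = 0
    while stage < len(REGROUP_STAGES) - 1:
        broadcast_finished(local_node, REGROUP_STAGES[stage])   # status message: which stage this node has finished
        if timed_out():
            stage = 0                                           # a timeout restarts the regroup at phase one
            continue
        if all_peers_finished(peers, REGROUP_STAGES[stage]):    # nobody advances until every node has finished
            stage += 1
    return REGROUP_STAGES[-1]                                   # "stabilized": the regroup has finished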


Global Update Manager


Many components of the Cluster service need to share volatile global state among nodes. The
global update, one of the cluster’s key middle-tier operations, propagates global updates to all
nodes in the cluster. Through it, the cluster maintains the replicated cluster registry. All updates are
atomic and totally ordered, and the protocol tolerates benign failures as long as cluster membership
is maintained. If all nodes are up and can communicate, the update normally goes through. However,
a global update requires that changes occur on all nodes or on no nodes; the Cluster service is strict about this. If
an active node refuses to take the global update, it can be sent a “poison packet” that will cause the
node to shut down its Cluster service and write a corresponding entry into the System Event Log.
That is, Cluster service kills the node if it refuses to take an atomic broadcast.
The Global Update Manager provides an interface for other components of the Cluster service to
initiate and manage updates. The Global Update Manager allows for changes in the online/offline
state of resources to be easily propagated throughout the nodes in a cluster. In addition,
notifications of cluster state changes are sent to all active nodes in the cluster.
Centralizing the global update code in a single component allows the other components of the
Cluster service to use a single, reliable mechanism.
It is an atomic multicast protocol guaranteeing that if one surviving member in the cluster receives
an update, all surviving members eventually receive the update, even if the original sender
(sequencer) fails. It also guarantees that updates are applied to the nodes in a serial order, based on
the NodeID within the cluster.

Locker Node
One cluster node, dubbed the locker node, is assigned a central role in the Global Update Protocol.
The locker node is the one that owns the quorum resource. Typically, the oldest node in the cluster
will be the locker node. Any node that wants to start a global update first contacts the locker. The
locker node promises that if the sender (sequencer) fails during the update, the locker (or its
successor if the locker node fails) will take over the role of sender and update the remaining nodes.
In addition, the locker node guarantees that updates initiated from any node will have a unique
sequence number in the cluster database.


Figure 1. Node Updates

Node Updates
To do a global update, the updating node first sends update information to the locker node. The
locker node updates itself and returns a sequence number for the update to the sender. Once an
updating node knows that the locker has accepted the update, it sends an RPC with the update
request and the sequence number to each active node (including itself) in seniority order, based on
NodeID. The nodes are updated one at a time in NodeID order starting with the node immediately
following the locker node and wrapping around to the node whose ID precedes the locker’s.
Once the update has been installed at all nodes, the updating node sends the locker node
an unlock request to indicate that the protocol terminated successfully.
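
A sketch of this fan-out order follows, assuming a caller-supplied RPC stub. The names `send_update` and `unlock` are illustrative, not actual Cluster service interfaces.

def global_update(locker_id, active_ids, payload, send_update, unlock):
    """Propagate one update to every active node in NodeID order (conceptual sketch)."""
    seq = send_update(locker_id, None, payload)            # the locker accepts first and assigns the sequence number
    ordered = sorted(active_ids)
    start = ordered.index(locker_id)
    for node_id in ordered[start + 1:] + ordered[:start]:  # wrap around the NodeIDs, ending just before the locker
        send_update(node_id, seq, payload)                 # the updating node includes itself in this pass
    unlock(locker_id, seq)                                 # unlock request: the protocol terminated successfully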

Failures
The protocol assumes that if all nodes that received the update fail, it is as if the update never
occurred. Should all updated nodes fail, they will roll back the cluster database and log when they
recover, and the update will not have been applied to the cluster configuration. Examples of such
failures are:
● Sender (sequencer) fails before locker accepts update.
● Sender (sequencer) installs the update at the locker, but both sender and locker nodes fail after
that.
If the sender fails during the update process after the locker has accepted the update, the locker
reconstructs the update and sends it to each active node. Nodes that already received the update
detect this through a duplicate sequence number and ignore the duplicate update.
If the sender and locker nodes both fail after the sender managed to install the update at any node
beyond the locker node, the node with the lowest NodeID assumes the role of locker. This new
locker node would have been the first to receive the update since it has the lowest NodeID. Having
received the update and the sequence number, the new locker node can complete any update that
was in progress using the saved update information. To make this work, the locker allows at most
one update at a time. This gives a total ordering property to the protocol – updates are applied in a
serial order.


Database Manager and Log Manager



Database Manager
The Database Manager implements the functions needed to maintain the cluster configuration
database on each node. The Database Managers on each node of the cluster cooperate to maintain
configuration information consistently across the cluster. One-phase commits are used to ensure
the consistency of the cluster database in all nodes. The Database Manager also provides an
interface to the configuration database for use by the other Cluster service components. This
interface is similar to the registry interface exposed by the Microsoft Win32® API set with the key
difference being that changes made in one node of the cluster are atomically distributed to all
nodes in the cluster that are affected.

Log Manager
The Log Manager writes changes to the recovery log stored on the quorum resource when any of
the cluster nodes are down. This allows the cluster to recover from a partition in time, a situation
that occurs when cluster nodes are not online at the same time. The Log Manager also works with
the Checkpoint Manager to take checkpoints at appropriate moments, thereby helping to ensure
that the local cluster databases are kept consistent across the cluster.
Maintaining a checkpoint and log for use during restart is an instance of the more general
transaction processing techniques of logging and commit/abort to perform atomic state
transformations on all the clusdb replicas.

Cluster Configuration Database



Contains Information About All Physical and Logical Entities in a Cluster


When a cluster is formed on a node, the data necessary to start the cluster is read from the local
copy of the cluster configuration database. This database contains information about all physical
and logical entities in a cluster, such as the cluster itself, the nodes, resource types, groups, and
resources making up the cluster, plus network connections for both intra-cluster and client-to-
cluster communication.
When the cluster is operational and its configuration (e.g., which node owns a resource) is
changing, persistent and volatile information is used to track the current and desired state of the
cluster. A component of the Cluster service known as the Configuration Database Manager, or just
Database Manager, implements the functions needed to persist the cluster configuration database.
The Database Manager also provides an interface to the database for other components of the
cluster.


Global Update Manager Controls Globally Atomic Updates to Replicas on All Nodes
A copy of this database is stored on each node as %systemroot%\cluster\clusdb and loaded into the
node's registry, with a master copy of the database maintained on the quorum drive. Updates to the configuration
database are applied to the replica on each node using an atomic update protocol. This operation is
controlled by another Cluster service component, the Global Update Manager. The Log Manager
component records the updates in a cluster log file, also stored on the quorum resource. Changes to
the cluster configuration are not written into the checkpoint file at the same time as the node
registries are updated.

Maintains Consistency of Configuration Data


On formation of a cluster, whichever node successfully reserves the quorum drive will form the
cluster. The location of the quorum resource is stored in the local cluster configuration in the
registry on the node forming the cluster. This local configuration may not be consistent with the
current cluster database on the quorum.
For example, consider a cluster containing two nodes, A and B. Assume Node A fails and is offline
for a period of time during which the cluster is managed by the other node, Node B. While Node A
is offline, Node B makes some configuration changes. These changes will be stored in change logs
on the quorum resource, but cannot propagate to Node A. Now assume Node B fails while Node A
is offline. Node A is then brought online and will form the cluster. Its cluster configuration in the
registry will be out of synchronization with the cluster data on the quorum resource. Once Node A
has brought the quorum resource online it can read the cluster database and change logs stored
there and update its registry from this data. This ensures that all updates made to the cluster
database will be preserved. If the cluster configuration data changes further, and then Node B is
brought online and joins the cluster, its replica of the database can be brought up to date. The
joining node contacts an active member, as identified from the registry on the joining node, and
queries for the current version of the database.
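
The catch-up step can be pictured as replaying change-log records that are newer than the local copy. The sketch below is illustrative only and assumes each record carries a sequence number, a key, and a value.

def replay_change_log(local_db, local_seq, change_log):
    """Apply quorum change-log records the local clusdb replica has not yet seen."""
    for seq, key, value in change_log:
        if seq > local_seq:            # skip records the local copy already reflects
            local_db[key] = value
            local_seq = seq
    return local_db, local_seq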

Resource Manager and Failover Manager


The Resource Manager and Failover Manager work together to control the state of a resource in a
cluster and which node the resource runs on. The two components are combined inside the Cluster
service.

Resource Manager
The Resource Manager is responsible for:
● Managing resource dependencies.
● Starting and stopping resources, by directing the Resource Monitors to bring resources online
and offline.
● Initiating failover and failback.
To perform the preceding tasks, the Resource Manager receives resource and cluster state
information from the Resource Monitors and the Node Manager. If a resource becomes
unavailable, the Resource Manager either attempts to restart the resource on the same node or
initiates a failover of the resource, based on the failover parameters for the resource. The Resource
Manager initiates the transfer of cluster groups to alternate nodes by sending a failover request to
the Failover Manager.


The Resource Manager also brings resources online or takes them offline in response to an operator
request, for example from Cluster Administrator.
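
The restart-or-fail-over decision described above reduces to a simple check against the resource's failover parameters, sketched here with illustrative parameter names (not the actual property names):

def on_resource_failure(failure_count, restart_threshold, restart_on_same_node):
    """Return the action the Resource Manager would take for a failed resource."""
    if restart_on_same_node and failure_count < restart_threshold:
        return "restart the resource on the same node"
    return "send a failover request to the Failover Manager"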

Failover Manager
The Failover Manager handles the transferring of groups of resources from one node to another in
response to a request from the Resource Manager. The Failover Manager is responsible for
deciding which systems in the cluster should “own” which groups. When this group arbitration
finishes, those systems that own individual groups turn control of the resources within the group
over to their respective Resource Managers. When failures of resources within a group cannot be
handled by the owning system, the Failover Managers re-arbitrate for ownership of the group. The
Failover Manager assigns groups to nodes based on failover parameters such as available resources
or services on the node and the possible owners defined for the resource.

Pushing a Group

Resource Manager Can Take Unavailable Resources Offline


The failover and restart properties of a resource may specify that the resource should not be
restarted on the same node after a failure, for example because the number of failures for the resource
has exceeded a configured threshold. In such cases, when the Resource Manager is notified that one of
its online resources has become unavailable, it will choose not to restart the resource and will instead
take the resource offline, along with any dependent resources.
Taking a resource offline may be configured to affect the group, in which case the Resource
Manager will indicate to the Failover Manager that the group should be restarted on another
system in the cluster. This is called pushing a group to another system.

Cluster Administrator Can Manually Push a Group


Cluster Administrator may also manually initiate such a group transfer. The algorithm for both
situations is identical, except that resources are gracefully shut down for a manually initiated
failover, while they are forcefully shut down in the failure case. The pushing of a group can be
thought of as a controlled event, where all nodes are in communication coordinating the transition.

Note The threshold counter has a limit of ten. This means that groups cannot be brought online
more than ten times without restarting the Cluster service.

How Pushing a Group Works


The process that occurs when pushing a group is as follows:
1. All resource objects in the failed resource’s dependency tree are enumerated, including all
resources that depend on the failed resource and the resources upon which the failed resource
depends.
2. The Resource/Failover Manager takes all resources in the dependency tree offline.
The resources are taken offline based on their dependencies, sequentially and synchronously.


3. Failover is initiated.
The Resource/Failover Manager on the node that previously owned the resources notifies the
Resource/Failover Manager on the destination node that a failover is occurring.
4. The destination Resource/Failover Manager begins to bring the resources online, in the
opposite order from which they were taken offline, as sketched below.
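
The following sketch illustrates the push sequence of steps 1-4, with the dependency-tree enumeration and the per-resource operations supplied by the caller (all names are illustrative):

def push_group(offline_order, take_offline, notify_destination, bring_online_on_destination):
    """`offline_order` is the enumerated dependency tree, ordered so that dependents
    come before the resources they depend on."""
    for resource in offline_order:
        take_offline(resource)                    # step 2: offline, sequentially and synchronously
    notify_destination()                          # step 3: tell the destination node a failover is occurring
    for resource in reversed(offline_order):
        bring_online_on_destination(resource)     # step 4: online in the opposite order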

Pulling a Group

Absence of Five Successive Heartbeats Generates Failure Suspicion and Causes a Regroup
When an active node fails, or the network connections between the node and the rest of the cluster
fail, the lack of heartbeats allows the remaining nodes to detect the failure and consider the node to
be offline. In this case, the Cluster service starts the regroup membership algorithm to determine
the current membership in the cluster. After the new membership has been established, resources
that were online at any failed member must be pulled to the other active nodes, i.e. they are
brought online at the active nodes, based on the cluster configuration. This process is similar to
pushing a resource group, but without the shutdown phase on the failed node.

Current Configuration Replicated Around Cluster to Determine How Groups Should Be Pulled
The added complication when pulling a group is determining what groups were running on the
failed node and which node should take ownership of the various groups. The replicated cluster
database gives all active nodes full knowledge of the resource groups on the failed node and their
failover properties. In a two-node cluster, it is readily apparent to a node that resources were
running on the other node. State transitions initiated on a node are reported around the cluster by
the Global Update Manager.
All nodes capable of hosting the groups determine for themselves the new ownership. This
selection is based on node capabilities, the group's preferred owner list, and a simple tiebreaker
rule, in case the nodes cannot decide which node should be the new host. The nodes have an
identical replica of the cluster database and therefore can determine the new hosts without
communicating with one another. Each active node pulls (brings online) the resource groups it now
owns.
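
Because every node holds an identical replica of the cluster database, each node can run the same selection locally. The sketch below illustrates the idea; the lowest-NodeID tiebreaker and the group attributes are assumptions standing in for the unspecified "simple tiebreaker rule" and the stored group properties.

def choose_new_owner(group, failed_node, active_nodes):
    """`group` is assumed to expose `possible_owners` and an ordered `preferred_owners` list."""
    candidates = [n for n in active_nodes
                  if n in group.possible_owners and n != failed_node]
    for preferred in group.preferred_owners:        # honour the preferred-owner list first
        if preferred in candidates:
            return preferred
    return min(candidates) if candidates else None  # tiebreaker (assumed here): lowest NodeID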

Regroup
In the event that one system detects a communication failure with another cluster node, it
broadcasts a message to the entire cluster causing all members to verify their view of the current
cluster membership. This is called a regroup event. Writes to shared devices must be frozen until
the membership has stabilized.
Regroup works by re-computing the members of the cluster. The cluster agrees to regroup after
checking communications among the nodes. If a Node Manager on a system does not respond, it is
removed from the cluster and its active groups must be failed over to an active system. Finally, the
cluster Membership Manager informs the cluster’s upper levels (such as the Global Update
Manager) of the regroup event.


Note Regroup is also used for the forced eviction of active nodes from a cluster.

Regroup States
After regroup, one of two states occurs:
1. There is a minority group or no quorum device, in which case the group does not survive.
2. There is a non-minority group and a quorum device, in which case the group does survive.
There is a non-minority rule such that the number of new members must be equal to or greater than
half of the old active cluster. This provision prevents a minority from seizing the quorum device at
the expense of a larger potentially surviving cluster. In addition, the quorum guarantees
completeness, by preventing a so-called split-brain cluster; that is, two nodes (or groups of nodes)
operating independently as the same cluster. Whichever group loses quorum arbitration will shut
down its Cluster service.
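
The two outcomes amount to a single test, sketched below for illustration:

def partition_survives(new_member_count, old_active_count, owns_quorum_device):
    # Survive only with the quorum device and at least half of the previously active nodes.
    return owns_quorum_device and (2 * new_member_count >= old_active_count)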

Failback
A group can be configured with a preferred owner node, so that if the preferred node is running,
the group will run on that node rather than any other. For example, Microsoft SQL Server and
Microsoft Exchange Server could be configured with different preferred owners in a cluster. In this
case, if all nodes are active the services should run on different nodes. If one node fails, another
will pull the groups it was hosting.
Every group has a failback property. If this is enabled, when the preferred owner comes online, the
Failover Manager can decide to move such groups back to the original node. To do this, the
Failover Manager on the preferred owner contacts the Resource Manager on the node that
currently has the resources online. The groups are then pushed from the current owner to the
preferred owner, as described above.
There is a failback option that can be configured to control the time of day during which failback
can occur. If a Failback window has been configured, the Resource/Failover Manager will wait for
the designated time before initiating failback.
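
A sketch of the failback timing check follows; the parameter names are illustrative, and the window is expressed here as start and end hours of the day.

from datetime import datetime

def failback_allowed(now, failback_enabled, window_start=None, window_end=None):
    """Return True if groups may be pushed back to the preferred owner at time `now`."""
    if not failback_enabled:
        return False
    if window_start is None or window_end is None:
        return True                                    # no window configured: fail back immediately
    hour = now.hour
    if window_start <= window_end:
        return window_start <= hour <= window_end
    return hour >= window_start or hour <= window_end  # window wraps past midnight

print(failback_allowed(datetime.now(), True, window_start=22, window_end=4))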

Communication Manager
The components of the Cluster service communicate with other nodes through the Communication
Manager. Several components in each node of a cluster are in constant communication with other
nodes. Communication is fully connected in small clusters. That is, all nodes are in direct
communication with all other nodes. Intra-cluster communication uses RPC mechanisms to
guarantee reliable, exactly-once delivery of messages.
Communication Manager filters out messages from former cluster nodes that have been evicted.


Cluster Registry
The cluster configuration, including the properties of all resources and groups in the cluster, is
stored as a registry hive on each node in the cluster. This cluster database is loaded into the registry
on each node when the Cluster service starts up.

Note The cluster registry keys are separate from the rest of the Windows Server 2003 registry. The
cluster hive is stored in %windir%\Cluster in the file Clusdb and the associated Clusdb.log file.

The cluster registry maintains updates on members, resources, restart parameters, and other
configuration information for the whole cluster. It is important that the cluster registry is
maintained on stable storage and is replicated at each member through a global update protocol.
Cluster service loads its registry settings into HKEY_LOCAL_MACHINE\Cluster.
Under this key are the following subkeys:
● Groups
● NetworkInterfaces
● Networks
● Nodes
● Quorum
● Resources
● ResourceTypes
Each of these subkeys contains the configuration information for the cluster. For example, when a
new group is created a new entry is added under
HKEY_LOCAL_MACHINE\Cluster\Groups.
One method of verifying a successful installation of Windows Clustering is to verify that the
preceding registry keys have been created.
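
One way to automate that verification is to enumerate the expected subkeys under HKEY_LOCAL_MACHINE\Cluster. The sketch below uses Python's standard winreg module and must be run on a node where the Cluster service is installed; the list of expected names simply mirrors the subkeys listed above.

import winreg

EXPECTED_SUBKEYS = ["Groups", "NetworkInterfaces", "Networks", "Nodes",
                    "Quorum", "Resources", "ResourceTypes"]

def cluster_keys_present():
    """Return a name -> bool map showing which cluster subkeys exist locally."""
    found = {}
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, r"Cluster") as hive:
        for name in EXPECTED_SUBKEYS:
            try:
                winreg.CloseKey(winreg.OpenKey(hive, name))
                found[name] = True
            except FileNotFoundError:
                found[name] = False
    return found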
Cluster service stores the parameters for its resources and groups in keys in the cluster database in
the registry. The key name for a given group or resource is a globally unique identifier (GUID).
GUIDs are 128-bit numbers generated by a special algorithm that uses the network card media
access control (MAC) address of the machine on which it is running, and the exact time at which
the GUID is created, to ensure that every GUID is unique. GUIDs are used to identify many
components of the cluster, such as a resource, group, network or network interface. They are used
internally by the Cluster service to reference a cluster component. The Cluster service updates the
cluster configuration based on event notification of property changes for a GUID.
To assist in understanding the GUID references in the cluster log, a cluster object file
(%windir%\Cluster\Cluster.oml) is automatically created and maintained that maps each GUID to its
friendly resource name.
Example:
00000488.00000660::2003/05/19-23:58:22.977 OBCREATE "Resource" "7dc7fb50-be58-4b34-912e-830c93043e74" "Disk R:"
00000488.000007a4::2003/05/19-23:58:24.561 OBCREATE "Resource" "c90deb9c-5e0e-41ae-8000-fc1c71f389c7" "Disk S:"
00000488.0000077c::2003/05/20-00:04:06.985 OBCREATE "Network Interface" "33fb0cb8-fb48-43fe-9de6-678754b5325b" "Public - NODE2"
00000488.00000894::2003/05/20-00:17:00.746 OBCREATE "Resource" "441f8f8d-dd38-42ce-99e3-71f495afb5b1" "FS IP"
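
A short sketch of building that GUID-to-name map from Cluster.oml follows, assuming the OBCREATE format shown above (the parsing is illustrative and ignores other record types):

import re

OBCREATE = re.compile(r'OBCREATE\s+"([^"]+)"\s+"([0-9a-fA-F-]+)"\s+"([^"]+)"')

def guid_to_name(oml_path):
    """Map each GUID in Cluster.oml to its object class and friendly name."""
    mapping = {}
    with open(oml_path, encoding="ascii", errors="replace") as oml:
        for line in oml:
            match = OBCREATE.search(line)
            if match:
                obj_class, guid, name = match.groups()
                mapping[guid.lower()] = (obj_class, name)
    return mapping

# Example: guid_to_name(r"C:\Windows\Cluster\Cluster.oml").get("7dc7fb50-be58-4b34-912e-830c93043e74")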

Much of the information that is stored in the cluster database is property information. For example,
below is an excerpt from the information stored in the cluster database for the physical disk
resource type:

Physical Disk
==========
AdminExtensions : REG_MULTI_SZ : {4EC90FB0-D0BB-11CF-B5EF-00A0C90AB505}
DllName : REG_SZ : clusres.dll
IsAlivePollInterval : REG_DWORD : 0xea60
LooksAlivePollInterval : REG_DWORD : 0x1388
Name : REG_SZ : Physical Disk

AdminExtensions, DllName, IsAlivePollInterval, LooksAlivePollInterval, and Name are all
common properties for any type of resource. The values for IsAlivePollInterval and
LooksAlivePollInterval are defaults for the resource type. Individual resources can be configured
with their own values. Individual resources often store both common and private property
information in the cluster database. Whereas common properties apply to all resources regardless
of their type, private properties apply only to resources of a specific type.
For example, an IP address resource requires an address, name, and subnet mask while a physical
disk resource requires a signature.
HKEY_LOCAL_MACHINE\Cluster\Resources contains one subkey for each resource defined in
the cluster. The parameter types stored here are common for all resources. They include restart
parameters, resource dependencies and resource monitoring settings. Each resource key contains a
Parameters subkey with resource specific values. For example, the Parameters subkey for a file
share resource will include the share name, security data containing the share-level permissions
and the path to the folder on a shared SCSI disk.
Because groups can contain many different types of resources, there is nothing resource-specific
about the group’s parameters. Groups store the following information in
HKEY_LOCAL_MACHINE\Cluster\Groups in the cluster database:
● List of GUIDs that identify resources that are members of the group.
● Name and description of the group, such as Cluster Group or Disk Group.
● PersistentState, a numeric constant that indicates the group's operational status.
● Failover threshold and period.
● Failback configuration.
● The NodeIDs of nodes designated as preferred owners of the group.


Different values and parameters are discussed in depth in later modules. However, the important
items that have been discussed in this section are:
● The cluster registry is a database of the cluster configuration.
● All the nodes in a cluster should have identical cluster registries.
● In the cluster registry, each object, be it a group, network, network interface, or resource, is
identified by a GUID.
● While troubleshooting a cluster by making use of the cluster log, you will need to reference
the cluster registry or the Cluster.oml file to find the friendly name of a resource because the
cluster log will refer to the resource by GUID.

LAB 6: Examining the Cluster Registry



During this lab session, you will examine the cluster registry on each node to identify how cluster
configuration is stored locally on each node. You will also use the Load Hive command in
Regedt32 to examine the contents of the checkpoint file.
Refer to the accompanying Lab Manual to complete the practice exercises.

Review
Topics discussed in this session include:
● Introduction
● Cluster Architecture Overview
● Cluster Service Component Architecture
