
DB2 UDB and High Availability with

VERITAS Cluster Server


Tech Note

Revision 2
April 1, 2002
IBM Corporation
Abstract
We describe the implementation and design of highly available DB2 Universal Database (UDB)
environments using VERITAS Cluster Server (VCS) in SPARC/Solaris configurations from Sun
Microsystems. Included is a detailed description of the VCS HA-DB2 Agent. We provide
guidance and recommendations for configuring highly available database environments around
DB2 UDB Enterprise Edition (EE) or Extended Enterprise Edition (EEE). Practical considerations
regarding design, implementation, testing and performance work with VCS are also discussed.

Overview
It is presumed that the reader is familiar with the motivation for the implementation of high
availability database solutions and with the basic terminology in this field. In addition, it is
assumed that the reader has some experience with DB2 UDB and with VCS. However, some
crucial overview material will be presented in the following section in order to prepare the reader
for the presentation of the VCS HA-DB2 Agent.

Overview: DB2 UDB v7


As the foundation for e-business, DB2 UDB v7 is the industry’s first multimedia, Web-ready
relational database management system able to meet the demands of large corporations and
flexible enough to serve medium-sized and small e-businesses. DB2 UDB combines integrated
power for business intelligence, content management, enterprise information portals and e-
business with industry-leading performance and reliability to drive the most demanding industry
solutions. DB2 UDB, together with Internet technology, makes information easily accessible,
available and secure. There are more than 40 million DB2 users from over 300,000 companies
worldwide relying on IBM data management solutions. For more information, please visit

http://www.ibm.com/software/data

Overview: VERITAS Cluster Server


VERITAS Cluster Server, the industry's leading open systems clustering solution, is ideal for
eliminating both planned and unplanned downtime, facilitating server consolidation, and
effectively managing a wide range of applications in heterogeneous environments. With support
for up to 32 node clusters in storage area network (SAN) and traditional client/server
environments, VERITAS Cluster Server features the power and flexibility to protect everything
from a single critical database instance, to very large multi-application clusters in networked
storage environments. We will present a brief product summary; the reader is urged to refer to
the sources listed herein for the definitive product documentation and specifications. Note that
any references in this document to VERITAS Cluster Server or VCS refer specifically to VERITAS
Cluster Server for Solaris Version 2.0 or later.

Overview: VERITAS Cluster Server Hardware Requirements


The current list of hardware supported by VERITAS Cluster Server is as follows:

For server nodes:


• Any SPARC/Solaris server from Sun Microsystems running Solaris 2.6 or later with a
minimum of 128MB RAM

For disk storage:


• EMC Symmetrix, IBM Enterprise Storage Server, HDS 7700 and 9xxx, Sun T3, Sun
A5000, Sun A1000, Sun D1000 and any other disk storage supported by VCS 2.0 or
later; your VERITAS representative can confirm which disk subsystems are
supported or you can refer to VCS documentation
• Typical environments will require mirrored private disks (in each cluster node) for the
DB2 UDB binaries and shared disks between nodes for the DB2 UDB data

For network interconnects:
• For the public network connections, any network connection supporting IP-based
addressing
• For the heartbeat connections (internal to the cluster), redundant heartbeat
connections are required; this requirement can be met through the use of two
additional Ethernet controllers per server or one additional Ethernet controller per
server and the use of one shared GABdisk per cluster

Overview: VERITAS Cluster Server Software Requirements


VERITAS and IBM worked jointly to qualify configurations that include the following VERITAS
software components:

• VERITAS Volume Manager 3.2 or later, VERITAS File System 3.4 or later, VERITAS
Cluster Server 2.0 or later
• DB Edition for DB2 for Solaris 1.0 or later

While VERITAS Cluster Server does not require a volume manager, the use of VERITAS Volume
Manager is strongly recommended for ease of installation, configuration and management.

Overview: VERITAS Cluster Server Failover


VERITAS Cluster Server is an availability clustering solution that manages the availability of
application services, such as DB2 UDB, by enabling application failover. The state of each
individual cluster node and its associated software services is regularly monitored. When a
failure occurs that disrupts the application service (in this case, the DB2 UDB service), VERITAS
Cluster Server and/or the VCS HA-DB2 Agent detect the failure and automatically take steps to
restore the service. This can include restarting DB2 UDB on the same node or moving DB2 UDB
to another node in the cluster and restarting it on that node. If an application needs to be
migrated to a new node, VERITAS Cluster Server moves everything associated with the
application (network IP addresses, ownership of underlying storage) to the new node so that
users will not be aware that the service is actually running on another node. They will still access
the service using the same IP addresses, but those addresses will now point to a different cluster
node.

When a failover occurs with VERITAS Cluster Server, users may or may not see a disruption in
service. This will be based on the type of connection (stateful or stateless) that the client has with
the application service. In application environments with stateful connections (like DB2 UDB),
users may see a brief interruption in service and may need to reconnect after the failover has
completed. In application environments with stateless connections (like NFS), users may see a
brief delay in service but generally will not see a disruption and will not need to log back in.

By supporting an application as a “service” that can be automatically migrated between cluster
nodes, VERITAS Cluster Server can not only reduce unplanned downtime, but can also shorten
the duration of outages associated with planned downtime (maintenance, upgrades, etc.).
Failovers can be instigated manually as well. If a hardware or operating system upgrade must be
performed on a particular node, DB2 UDB can be migrated to another node in the cluster, the
upgrade can be performed, and then DB2 UDB can be migrated back (optional).
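
As a hedged illustration (the group and node names are placeholders taken from the examples later in this document), such a planned migration and return can be driven entirely from the VCS command line:

hagrp -switch db2group -to sun-ha2
(perform the hardware or operating system maintenance on sun-ha1)
hagrp -switch db2group -to sun-ha1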

Applications recommended for use in these types of clustering environments should be crash
tolerant. A crash tolerant application is one that can recover from an unexpected crash while still
maintaining the integrity of committed data. Crash tolerant applications are sometimes referred to
as “cluster friendly” applications. DB2 UDB is such an application.

Overview: VERITAS Cluster Server Shared Storage


VERITAS Cluster Server, when used with the VCS HA-DB2 Agent, will require shared storage.
Shared storage is defined as storage that has a physical connection to multiple nodes in the
cluster. Disk devices resident on shared storage can tolerate node failures since a physical path
to the disk devices still exists through one or more alternate cluster nodes.

Through the control of VERITAS Cluster Server, cluster nodes are allowed access to shared
storage through a logical construct called “disk groups”. Disk groups represent a collection of
logically defined storage devices whose ownership can be atomically migrated between nodes in
a cluster. By definition, a disk group can only be imported to a single node at any given time.
Additionally, a disk group can only belong to one Resource Group in VERITAS Cluster Server. If
Disk Group A is imported to Node 1 and Node 1 fails, Disk Group A can be exported from the
failed node and imported onto a new node in the cluster. VERITAS Cluster Server can
simultaneously control multiple disk groups within a single cluster.
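
For illustration only (VCS normally performs these steps itself during failover, and the disk group name db2dg is an assumption), the underlying VERITAS Volume Manager operations that move ownership of a disk group between nodes look like the following.

On the node releasing the disk group:

vxdg deport db2dg

On the node taking ownership, before starting volumes and mounting file systems:

vxdg import db2dg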

In addition to allowing disk group definition, a volume manager can provide for redundant data
configurations, using mirroring or RAID 5, on shared storage. VERITAS Cluster Server supports
VERITAS Volume Manager and Solstice DiskSuite as logical volume managers. Combining
shared storage with disk mirroring and striping can protect against both node failure and
individual disk or controller failure.

Overview: VERITAS Cluster Server GAB and LLT


An internode communication mechanism is required in cluster configurations so that nodes can
exchange information concerning hardware and software status, keep track of cluster
membership and keep this information synchronized across all cluster nodes. The Global Atomic
Broadcast (GAB) facility, running across a low latency transport (LLT), provides the high speed,
low latency mechanism used by VERITAS Cluster Server for this purpose. GAB is loaded as a
kernel module on each cluster node and, true to its name, provides an atomic broadcast
mechanism that ensures that all nodes get status update information at the same time.

By leveraging kernel-to-kernel communication capabilities, LLT provides a high speed, low
latency transport for all information that needs to be exchanged and synchronized between
cluster nodes. GAB runs on top of LLT. VERITAS Cluster Server does not use IP as a heartbeat
mechanism, but offers two other more reliable options. GAB, running over LLT, can be
configured to act as a heartbeat mechanism or a GAB disk (disk-based heartbeat) may be
configured. The heartbeat must run over redundant connections. These connections may either
be two private Ethernet connections between cluster nodes or one private Ethernet connection
and one GABdisk connection. The use of two GABdisks is not a supported configuration since
the exchange of cluster status between nodes requires a private Ethernet connection.
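
For reference, a minimal two-node heartbeat configuration over two private Ethernet links typically involves files similar to the following; the node names, cluster ID and interface names (qfe0, qfe1) are assumptions for illustration, and the authoritative file formats are documented in the VERITAS Cluster Server 2.0 User's Guide for Solaris.

/etc/llthosts (identical on both nodes):
0 sun-ha1
1 sun-ha2

/etc/llttab (on sun-ha1):
set-node sun-ha1
set-cluster 1
link qfe0 /dev/qfe:0 - ether - -
link qfe1 /dev/qfe:1 - ether - -

/etc/gabtab:
/sbin/gabconfig -c -n2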

For more information about either GAB or LLT or how to configure them in VERITAS Cluster
Server configurations, please consult the VERITAS Cluster Server 2.0 User’s Guide for Solaris.

Overview: VERITAS Cluster Server Bundled and Enterprise Agents


An agent can be viewed as a small program that is designed to manage the availability of a
particular resource or application. When an agent is started, it obtains the necessary
configuration information from VCS and then periodically monitors the resource or application,
updating VCS with status. In general, agents are used to bring resources online, take resources
offline or monitor resources and provide four types of specific services: start, stop, monitor and
clean. Start and stop are used in bringing resources online and taking them offline (onlining and
offlining resources), monitor is used to test a particular resource or application for its status, and
clean is used in the recovery process for a failed resource.

A variety of bundled agents are included as part of VERITAS Cluster Server and are installed
when VERITAS Cluster Server is installed. The bundled agents are VCS processes that manage
predefined resource types that are commonly found in cluster configurations, such as IP, mount,
process and share, and they help to simplify cluster installation and configuration considerably.
There are over 20 bundled agents with VERITAS Cluster Server.

Enterprise agents tend to focus on specific applications such as DB2 UDB. The VCS HA-DB2
Agent can be viewed as an Enterprise Agent, and interfaces with VCS through the VCS Agent
framework.

Overview: VCS Resources, Resource Types and Resource Groups


A resource type is an object definition used to define resources within a VCS cluster that will be
monitored. A resource type includes the resource type name and a set of properties associated
with those resources that are salient from a high availability point of view. A resource inherits the
properties and values of its resource type, and resource names must be unique on a cluster-wide
basis.

There are two types of resources: persistent and standard (non-persistent). Persistent resources
are resources such as network interface controllers (NICs) that are monitored but are not brought
online or taken offline by VCS, while standard resources are those whose online/offline status is
controlled by VCS.

The lowest-level monitored object is a resource, and there are various resource types (share,
mount, etc.). Each resource must be configured into a resource group, and VCS will bring all
resources in a particular resource group online and offline together. To bring a resource group
online or offline, VCS will invoke the start/stop methods for each of the resources in the group.
There are two types of resource groups: failover and parallel. A highly available DB2 UDB
configuration, regardless of whether it is based on DB2 EE or DB2 EEE, will use failover resource
groups.

A “primary” or “master” is defined as a node that can potentially host a resource. A resource
group attribute called SystemList is used to specify which nodes within a cluster can be
primaries for a particular resource group. In a two-node cluster, usually both nodes are included
in the SystemList, but in larger, multi-node clusters that may be hosting several highly available
applications there may be a requirement to ensure that certain application services (defined by
their resources at the lowest level) can never fail over to certain nodes.
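
For example, using the group and node names from the examples later in this document, a two-node SystemList can be set as follows; the numbers give the failover priority:

hagrp -modify db2group SystemList sun-ha1 1 sun-ha2 2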

Dependencies can be defined between resource groups, and VERITAS Cluster Server depends
on this resource group dependency hierarchy in assessing the impact of various resource failures
and in managing recovery. An example might be a situation where a certain resource group,
called ClientApp1, can not be brought online unless another resource group, called DB2, has
already been successfully started. The DB2 resource group may have been defined to include
various network address, disk group and application (DB2 UDB) resources. In this case,
resource group “ClientApp1” is considered dependent on resource group “DB2”.

DB2 UDB v7: Installation and Configuration in a VERITAS
Cluster Server Environment
The installation of DB2 UDB v7 in a VERITAS Cluster Server environment closely mirrors that
which is documented in the relevant DB2 Universal Database Quick Beginnings guide. Please
note that the configuration that has been jointly qualified using the VCS HA-DB2 Agent discussed
herein requires DB2 UDB version 7.2. DB2 UDB version 7.2 is defined as DB2 UDB version 7.1
plus fixpack 3 or greater.

At this stage, it is presumed that VERITAS Cluster Server 2.0 or later has been configured
according to the information provided in the VERITAS Cluster Server 2.0 User's Guide for Solaris.
In this document, we assume that the installation and configuration of DB2 UDB EE or EEE is
done using the DB2 Installer program (db2setup). Other methods are possible; however, we
believe the use of the db2setup installer is the most straightforward approach, so it is the
method discussed in this document. Please ensure that the mount point for the
instance home directory is on shared storage and the storage is physically attached to all
potential “primary” nodes that will be used for recovery in the event of a failover. Remember that
the primary nodes will appear in the resource group SystemList attribute referenced in the
previous section for the resource group that contains the DB2 instance.

DB2 UDB v7: Installation and Configuration in a VERITAS Cluster Server Environment:
User and Group Creation Considerations
On each potential primary for each DB2 server instance, three separate groups and user
accounts need to be created for the:

• DB2 UDB instance owner


• User that will execute fenced user defined functions (UDFs) or stored procedures
• Administration Server

All cluster users are required to be local. The use of NIS or NIS+ for users is not recommended
since these are not highly available services and if their service is interrupted, VCS recovery may
not work correctly (script invocations may hang). To define the users and groups for the cluster,
repeat the steps below on all the nodes in the cluster. The users and groups must be defined
identically across all nodes in the cluster; ensure that the user id, group id, user home directory,
account passwords and user shell are consistent across all nodes in the cluster. You can use
db2setup to create userids and an instance on one host. Then inspect the /etc/group and
/etc/passwd files to determine appropriate userid and group information to duplicate on all other
hosts targeted for failover for this resource. For example, use the following Solaris commands to
create groups on each server:

groupadd -g 999 db2iadm1
groupadd -g 998 db2fadm1
groupadd -g 997 db2asgrp

Use the following Solaris commands to create user accounts without creating home directories:

useradd -g db2iadm1 -u 1004 -d /home/db2inst1 -s /bin/ksh db2inst1
useradd -g db2fadm1 -u 1003 -d /home/db2fenc1 -s /bin/ksh db2fenc1
useradd -g db2asgrp -u 1002 -d /home/db2as -s /bin/ksh db2as

Note that for these commands, we presume /home is the mount point of a file system that will be
utilized to host the DB2 UDB instance home directory and that this directory is placed on
shared storage to ensure that /home can be shared among all primary nodes. Also, please
remember that the mount point should be specified in /etc/vfstab and should be set to not
automount at reboot time, similar to the following:
/dev/vx/dsk/db2dg/db2vol /dev/vx/rdsk/db2dg/db2vol /home vxfs 3 no -

Please ensure that the account passwords are consistent across all nodes in the cluster. Use the
Solaris passwd command to set them if required.

DB2 UDB v7: Installation and Configuration in a VERITAS Cluster Server Environment:
Installation of the DB2 Binary
The DB2 UDB v7 setup utility will install the executable onto the path /opt/IBMdb2/V7.1. This
is for version 7 of DB2 UDB only. Other versions will install in different paths according to the
version number. Please note, however, that the use of the VCS HA-DB2 Agent discussed herein
is supported only on DB2 UDB version 7.2 or later. The DB2 executable will need to be installed
on mirrored private disks on all potential primaries for the DB2 UDB instance. These mirrored
private disks should be physically located within each cluster node rather than as part of the
shared storage subsystem(s).

Use the db2setup tool to create the instance. Ensure that the instance is not auto started on
node reboot through a command in the /etc/init.d/inetinit script; the instance “start” and
“stop” should be under control of the VCS HA-DB2 Agent. If you are planning to fail over the
instance home directory to coincide with instance failures, the instance need only be created on
the primary node.
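
As one hedged way to verify and disable automatic startup of the instance at boot (the db2iauto utility ships with DB2 UDB; check your DB2 documentation for its exact location), run the following against the instance:

db2iauto -off db2inst1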

DB2 UDB v7: Installation and Configuration in a VERITAS Cluster Server Environment:
Configuration of a Particular DB2 UDB Enterprise Edition (EE) Instance
Verify that the installation of the DB2 UDB binaries as well as the creation of the specified DB2
UDB EE instance have now been completed by the db2setup tool. If the installation is not
complete, consult the log file specified by the db2setup tool to determine the cause of the error.

DB2 UDB v7: Installation and Configuration in a VERITAS Cluster Server Environment:
Configuration of DB2 UDB Extended Enterprise Edition (EEE) Instances
Multiple DB2 UDB EEE instances can be installed on a single node and managed for high
availability with VERITAS Cluster Server. Only DB2 UDB EEE SMP mode is supported; DB2
UDB EEE MPP mode requires the use of a cluster file system for the instance home directories.
Note that the use of a cluster file system to mount the instance home directories is not supported
in this release. The use of NFS to perform file sharing amongst the various physical nodes in the
cluster is not recommended either. DB2 UDB EEE should only be used in SMP mode in
VERITAS Cluster Server configurations.

DB2 UDB v7: Confirming DB2 Instance Installation


Proper DB2 UDB instance installation must be verified prior to installing the VCS HA-DB2 Agent.
Regardless of whether the DB2 UDB instance is EE or EEE, each instance must have an
available mount point and should be manually controllable. To confirm this, issue the following
command (as instance owner) against each DB2 UDB instance:

db2start

This should complete successfully. If it does not complete successfully on all nodes,
a configuration error has likely occurred. Please review the DB2 Universal
Database Quick Beginnings Guide and resolve the problem before proceeding.

Next, again as instance owner, issue the following command:

db2stop

This should also complete successfully. Again, if it does not complete successfully
on all nodes, a configuration error has likely occurred. Please review the DB2
Universal Database Quick Beginnings Guide and resolve the problem before proceeding.

After verifying that the various DB2 UDB instances can be started and stopped, attempt to create
the sample database (or an empty test database if preferred). When the create database
command can be completed successfully, remove the test or sample database using the drop
database command.
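
As a quick sanity check, the following commands, run as the instance owner, create and then drop the sample database; db2sampl and the db2 command line processor are standard DB2 UDB utilities:

db2sampl
db2 connect to sample
db2 connect reset
db2 drop database sample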

The DB2 UDB instances, regardless of whether they are EE or EEE, are now ready to be made
highly available.

VCS HA-DB2 Agent


The VCS HA-DB2 Agent is provided with DB2 UDB fixpack 6 and above. Prior to the release
of that fixpack, the agent may also be obtained from the following location:

ftp://ftp.software.ibm.com/ps/products/db2/fixes/english-us/db2sunv7/vcs/

The VCS HA-DB2 Agent is a software-based agent that interfaces with the VCS Agent framework
to manage the availability of DB2 UDB-based data services. It includes four basic routines:

• Start (online) is used to call the db2start script that will start the DB2 UDB instance
• Stop (offline) is used to call the db2stop script that will gracefully shut down a DB2
UDB instance
• Monitor provides a way to verify the state of DB2 data services (up, down, hung) and
can be modified to provide more extensive data service monitoring
• Clean is used to “clean up” a failed DB2 UDB instance in preparation for a restart and
is generally called when there has been some type of DB2 data service failure, but is
generally not called when the DB2 data service has been shut down gracefully

Most of these methods are implemented as shell scripts, an approach which allows for readability
and maximum flexibility in the hands of a skilled administrator. Note, however, that any agent
modifications will not be supported by IBM/VERITAS.

The VCS HA-DB2 Agent can differentiate between DB2 UDB EE and EEE instances. The agent
registration script (regdb2udbvcs) checks to see what type of DB2 UDB instance is installed
and configures itself appropriately to monitor the type of DB2 UDB instance(s) it finds. The DB2
UDB instance(s) must already be installed, the mount point(s) for each instance needs to be
available, and each instance should be manually controllable. Confirm this as shown in the
previous section by invoking the start/stop scripts against each DB2 UDB instance.

The following directory structure should be present beneath /opt/IBMdb2/V7.1/ha/ on all
primary cluster nodes:

DB2UDBAgent
README.db2udb.vcs
clean
monitor
offline
online
db2udb.type.cf
util/
docs/

To ensure that the VERITAS Cluster Server cluster is configured correctly, issue the hacf
command:

hacf -verify /etc/VRTSvcs/conf/config

This should return successfully. If any error is specified, please correct the source of the error
prior to proceeding.
Next, include the given DB2 UDB type into the VCS cluster infrastructure. For example, perform
the following command at each node in the VCS cluster that has been specified as a primary for
the DB2 UDB instance:

cp /opt/IBMdb2/V7.1/ha/db2udb.type.cf /etc/VRTSvcs/conf/config

Note that you may also use symlinks instead of physically copying the file into place. Once the
file containing the new DB2 UDB type is in the appropriate location, the VERITAS Cluster Server
main.cf file must be updated. To accomplish this, first use the VCS hastatus command to
confirm that the VCS cluster is offline, and then add the following line:

include "db2udb.type.cf"

to the main VCS configuration file:

/etc/VRTSvcs/conf/config/main.cf

When using the VERITAS Cluster Server HAGUI, the import command can be executed to
include the resource type without requiring the VCS cluster to be offline. When working at the
VCS command line prompt, the hastatus and haconf commands must be used.
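
As a hedged outline of the command-line procedure (adapt it to your own change-control process), the configuration can be taken offline, edited and verified as follows; hastop -all -force stops the VCS engine while leaving the managed applications running:

hastatus -summary
hastop -all -force

Edit /etc/VRTSvcs/conf/config/main.cf to add the include line shown above, then:

hacf -verify /etc/VRTSvcs/conf/config
hastart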

With the DB2 UDB type now registered as a valid VCS resource type, the VCS agent framework
must be updated with the location of the start, stop, monitor and clean scripts corresponding to
this type. Create the link for all DB2 UDB instance primary nodes to allow access to the VCS HA-
DB2 Agent executables as follows:

cd /opt/VRTSvcs/bin; ln -s /opt/IBMdb2/V7.1/ha/ DB2UDB

The VCS cluster is now able to accept the instantiation of DB2 UDB resources based on the
DB2UDB resource type.

Logical Hostname/IP Failover


A logical hostname, together with the IP address(es) to which it maps, is associated with a
particular DB2 UDB instance. Client programs will access the DB2 UDB instance using this
logical hostname/IP instead of the physical hostname/IP on a particular node in the cluster. This
logical hostname/IP is the entry point to the cluster and it shields the client program from
addressing the physical servers directly. It is this logical addressing that allows a particular
application service, such as a DB2 UDB data service, to be migrated to different nodes in the
cluster without notifying client programs of changed physical addresses. This logical
hostname/IP address is what is catalogued from the DB2 TCP/IP client using the catalog
command.
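
For example, a remote client would catalog the instance against the logical hostname; the node name db2node, logical hostname db2vcs, service port 50000 and database name sample used below are illustrative assumptions only:

db2 catalog tcpip node db2node remote db2vcs server 50000
db2 catalog database sample at node db2node
db2 terminate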

From within VERITAS Cluster Server, this logical hostname/IP is configured as an instantiation of
the IP resource type. This resource must be included within the same resource group as the
DB2UDB instance resource. In the case of a failure, the entire resource group, including the DB2
UDB instance and the logical hostname/IP, will be failed over to another primary. This floating IP
setup provides high availability service to client programs requiring access to the DB2 UDB
instance.

Ensure that this hostname maps to one or more IP addresses that are associated with that
particular DB2 UDB instance. This name-to-IP address mapping should be configured on all
primary nodes in the cluster in the /etc/hosts file on each primary. More information on
configuration for public IP addresses can be found in the VERITAS Cluster Server 2.0 User’s
Guide for Solaris.
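
For instance, assuming the logical hostname db2vcs and the address used in the examples later in this document, the /etc/hosts entry on each primary node might look like:

9.26.49.103   db2vcs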

VCS HA-DB2 Agent: Package Contents
There are four supplied scripts that are used to control the registration, unregistration, onlining
and offlining of DB2 UDB in VERITAS Cluster Server environments. Note that while the package
contains a number of other components, these four are the only ones that are designed to be
called directly by the individual user. A man-style document is provided in the docs directory for
each of the supplied methods. All four scripts must be run while logged in as root or as the VCS
administrator.

VCS HA-DB2 Agent: Package Contents: regdb2udbvcs


This method will register appropriate resources and resource groups for a specified DB2 UDB
database instance. Typically, this will be the first script called, and it will perform all the
necessary steps to prepare DB2 UDB for VCS control. This method will produce a working
example for basic DB2 installations, and can build associated resources and dependencies for up
to one mount point. Note that it will not attempt to online any resources or resource groups, nor
will it set them to autostart. If autostart is desired, the attribute must be set using hagrp
commands or via other VCS facilities. For example:
hagrp -modify db2_grp AutoStartList sysa

Also, if you have more than one mount point, the additional mount points must be added manually after
the script has been run. A sample main.cf containing a DB2 resource group with multiple mount points is
included in Appendix A of this document. It can be used as a model for building a DB2 resource group
from scratch using the vi editor or VCS commands.

Note: if you are planning to register the DB2 instance for HA support using this script then you
should not pre-define DB2-related disk groups, volumes, and mount points in VCS. Also, the
current implementation of the DB2 HA agent for VCS does not support making Administration
Instances highly available.

VCS HA-DB2 Agent: Package Contents: unregdb2udbvcs


This method will execute the required VCS commands in order to remove DB2 UDB from the
VCS cluster. It will remove DB2 resources and resource groups registered for the DB2 UDB
instance that were registered with the regdb2udbvcs script. This method is the inverse of
regdb2udbvcs, and will be generally called if the DB2 UDB instance is no longer required to be
highly available.

The unregdb2udbvcs script will remove all resources and resource groups associated with the
relevant DB2 UDB instance. It is intended to simplify removal of resources and resource groups
added by the regdb2udbvcs script. Note that there is no provision for an undo of the removal
of resources or resource groups. If it is desirable to remove resources and resource groups
individually, then they may also be removed through the VCS GUI or through direct editing of the
VCS main.cf file following usual VCS configuration editing procedures.

VCS HA-DB2 Agent: Package Contents: onlinedb2udbvcs


This method will execute the required VCS commands in order to bring a highly available DB2
UDB instance that was created with regdb2udbvcs online. It will not create any resources or
resource groups. It will enable the DB2 resource group and then bring it online.
VCS HA-DB2 Agent: Package Contents: offlinedb2udbvcs
This method will execute the required VCS commands in order to bring a highly available DB2
UDB instance that was created with regdb2udbvcs offline. It will not remove any resources or
resource groups. It will bring the DB2 resource group offline and then disable it.

VERITAS Cluster Server HA-DB2 Agent: Examples

Example 1) Manually adding the required resource and resource groups.

We will illustrate the use of the DB2 package with several fully worked
examples. Items entered by the user follow the shell prompt in each transcript.

sun-ha1:root:/opt/IBMdb2/V7.1/ha/util# hastatus


attempting to connect....connected

group resource system message


--------------- -------------------- --------------- --------------------
sun-ha1 RUNNING
sun-ha2 RUNNING

sun-ha1:root:/opt/IBMdb2/V7.1/ha/util# haconf -makerw

sun-ha1:root:/opt/IBMdb2/V7.1/ha/util# hagrp -add db2group
VCS:10136:group added; populating SystemList and setting the Parallel attribute
recommended before adding resources
sun-ha1:root:/opt/IBMdb2/V7.1/ha/util# hagrp -modify db2group SystemList sun-ha1 1 sun-ha2 2
sun-ha1:root:/opt/IBMdb2/V7.1/ha/util# hares -add db2resource DB2UDB db2group
VCS:10245:Resource added
NameRule and Enabled attributes must be set before agent monitors

sun-ha1:root:/opt/IBMdb2/V7.1/ha/util# hares -modify db2resource Enabled 1


sun-ha1:root:/opt/IBMdb2/V7.1/ha/util# hastatus
attempting to connect....connected

group resource system message


--------------- -------------------- --------------- --------------------
sun-ha1 RUNNING
sun-ha2 RUNNING
db2group sun-ha1 OFFLINE
db2group sun-ha2 OFFLINE
-------------------------------------------------------------------------
db2resource sun-ha1 OFFLINE
db2resource sun-ha2 OFFLINE

The following properties must be modified.

sun-ha1:root:/opt/IBMdb2/V7.1/ha/util# hares -modify db2resource instanceOwner db2inst1
sun-ha1:root:/opt/IBMdb2/V7.1/ha/util# hares -modify db2resource instanceHome \
/export/home/db2inst1
sun-ha1:root:/opt/IBMdb2/V7.1/ha/util# hares -modify db2resource nodeNumber 0
sun-ha1:root:/opt/IBMdb2/V7.1/ha/util# hares -modify db2resource monitorLevel 0
sun-ha1:root:/opt/IBMdb2/V7.1/ha/util# haconf -dump -makero
sun-ha1:root:/opt/IBMdb2/V7.1/ha/util# hastatus
attempting to connect....connected

group resource system message


--------------- -------------------- --------------- --------------------
sun-ha1 RUNNING
sun-ha2 RUNNING
db2group sun-ha1 OFFLINE
db2group sun-ha2 OFFLINE
-------------------------------------------------------------------------
db2resource sun-ha1 OFFLINE
db2resource sun-ha2 OFFLINE

sun-ha1:root:/opt/IBMdb2/V7.1/ha/util# hagrp -online db2group -sys sun-ha1

sun-ha1:root:/opt/IBMdb2/V7.1/ha/util# hastatus
attempting to connect....connected

group resource system message


--------------- -------------------- --------------- --------------------
sun-ha1 RUNNING
sun-ha2 RUNNING
db2group sun-ha1 STARTING OFFLINE
db2group sun-ha2 OFFLINE
-------------------------------------------------------------------------
db2resource sun-ha1 OFFLINE
db2resource sun-ha2 OFFLINE
db2resource sun-ha1 WAITING FOR ONLINE
db2resource sun-ha1 ONLINE
db2group sun-ha1 ONLINE

You may verify for yourself that the appropriate db2 processes are started

sun-ha1:root:/opt/IBMdb2/V7.1/ha/util# ps -fu db2inst1


UID PID PPID C STIME TTY TIME CMD
db2inst1 24927 24926 0 09:23:58 ? 0:00 db2sysc
db2inst1 24931 24928 0 09:23:58 ? 0:00 db2sysc
db2inst1 24929 24927 0 09:23:58 ? 0:00 db2sysc
db2inst1 24930 24928 0 09:23:58 ? 0:00 db2sysc

sun-ha1:root:/opt/IBMdb2/V7.1/ha/util# hagrp -switch db2group -to sun-ha2

sun-ha1:root:/opt/IBMdb2/V7.1/ha/util# hastatus
attempting to connect....connected

group resource system message


--------------- -------------------- --------------- --------------------
sun-ha1 RUNNING
sun-ha2 RUNNING
db2group sun-ha1 OFFLINE
db2group sun-ha2 STARTING OFFLINE
-------------------------------------------------------------------------
db2resource sun-ha1 OFFLINE
db2resource sun-ha2 OFFLINE
db2resource sun-ha2 WAITING FOR ONLINE
db2resource sun-ha2 ONLINE
db2group sun-ha2 ONLINE

Verify that on sun-ha1, the processes are offlined

sun-ha1:root:/opt/IBMdb2/V7.1/ha/util# ps -fu db2inst1


UID PID PPID C STIME TTY TIME CMD

Verify that the processes are now onlined on sun-ha2
sun-ha1:root:/opt/IBMdb2/V7.1/ha/util# rsh sun-ha2 ps -fu db2inst1
UID PID PPID C STIME TTY TIME CMD
db2inst1 13436 13433 0 10:00:00 ? 0:00 db2sysc
db2inst1 13432 13431 0 10:00:00 ? 0:00 db2sysc
db2inst1 13434 13432 0 10:00:00 ? 0:00 db2sysc
db2inst1 13435 13433 0 10:00:00 ? 0:00 db2sysc

Switch the resource group back

sun-ha1:root:/opt/IBMdb2/V7.1/ha/util# hagrp -switch db2group -to sun-ha1

And offline the resource completely

sun-ha1:root:/opt/IBMdb2/V7.1/ha/util# hares -offline db2resource -sys sun-ha1

sun-ha1:root:/opt/IBMdb2/V7.1/ha/util# hastatus
attempting to connect....connected

group resource system message


--------------- -------------------- --------------- --------------------
sun-ha1 RUNNING
sun-ha2 RUNNING
db2group sun-ha1 ONLINE
db2group sun-ha2 OFFLINE
-------------------------------------------------------------------------
db2resource sun-ha1 ONLINE
db2resource sun-ha2 OFFLINE
db2resource sun-ha1 WAITING FOR OFFLINE
db2resource sun-ha1 OFFLINE
db2group sun-ha1 OFFLINE

sun-ha1:root:/opt/IBMdb2/V7.1/ha/util# rsh sun-ha2 ps -fu db2inst1
UID PID PPID C STIME TTY TIME CMD
sun-ha1:root:/opt/IBMdb2/V7.1/ha/util# ps -fu db2inst1
UID PID PPID C STIME TTY TIME CMD

Remove the created resource and resource group.

sun-ha1:root:/opt/IBMdb2/V7.1/ha/util# haconf -makerw
sun-ha1:root:/opt/IBMdb2/V7.1/ha/util# hares -delete db2resource
sun-ha1:root:/opt/IBMdb2/V7.1/ha/util# hagrp -delete db2group
sun-ha1:root:/opt/IBMdb2/V7.1/ha/util# haconf -dump -makero
sun-ha1:root:/opt/IBMdb2/V7.1/ha/util# hastatus
attempting to connect....connected

group resource system message


--------------- -------------------- --------------- --------------------
sun-ha1 RUNNING
sun-ha2 RUNNING

Note that this was the simplest example of the use of the DB2UDB type.
Generally, in addition to this, it is common to protect at least one
highly available IP address, and one or more highly available mount
points or disk groups (for database data storage). These are optional, however,
depending on the requirements that the instance will be put to.

For example, no TCP/IP clients would imply that a highly available IP address is
not required. No local databases would imply that no highly available mount
points or disk groups are required. What is always required, however, is to
ensure that the instance home directory is available. This can be achieved in
two ways: first, the directory can be on shared disk storage and failed over
along with the instance. Or, as in this example, we presume that the instance
home directory is not on shared storage, but rather that there are two copies
set up to be available at all times, so that no failover of this mount point is
required.

Here is the main.cf file that corresponds to this example.

include "types.cf"
include "db2udb.type.cf"

cluster vcs_cs (
UserNames = { admin = "cDRpdxPmHpzS." }
CounterInterval = 5
)

system sun-ha1

system sun-ha2

group db2group (
SystemList = { sun-ha1 = 1, sun-ha2 = 2 }
)

DB2UDB db2resource (
instanceOwner = db2inst1
instanceHome = /export/home/db2inst1
)

// resource dependency tree


//
// group db2group
// {
// DB2UDB db2resource
// }

Example 2) Use of supplied scripts to automate registration tasks

In general, a number of manual steps are required to properly register the
DB2 resources, groups and associated resources in the VCS cluster. To simplify
this procedure, scripts have been provided that automate much of this work. This
example is similar to the preceding one; however, it uses the supplied
scripts to perform the registration, onlining, offlining and unregistration. Again,
as in the previous example, we presume that the instance home directory is
available on both systems at all times, and is set up identically on both systems.

sun-ha1:root:/opt/IBMdb2/V7.1/ha/util# hastatus


attempting to connect....connected

group resource system message


--------------- -------------------- --------------- --------------------
sun-ha1 RUNNING
sun-ha2 RUNNING

You may wish to verify that DB2 can be stopped and started on both systems
prior to proceeding with the registration.

sun-ha1:root:/opt/IBMdb2/V7.1/ha/util# su - db2inst1 -c db2start
SQL1063N DB2START processing was successful.
sun-ha1:root:/opt/IBMdb2/V7.1/ha/util# su - db2inst1 -c db2stop
SQL1064N DB2STOP processing was successful.

sun-ha1:root:/opt/IBMdb2/V7.1/ha/util# su - db2inst1
sun-ha1:db2inst1:/export/home/db2inst1% rlogin sun-ha2
Last login: Fri Sep 21 15:03:03 from sun-ha1
Sun Microsystems Inc. SunOS 5.8 Generic February 2000
sun-ha2:db2inst1:/export/home/db2inst1% db2start
SQL1063N DB2START processing was successful.
sun-ha2:db2inst1:/export/home/db2inst1% db2stop
SQL1064N DB2STOP processing was successful.
sun-ha2:db2inst1:/export/home/db2inst1% exit
Connection closed.
sun-ha1:db2inst1:/export/home/db2inst1% exit

DB2 is ready to be made highly available. Execute the supplied registration
script, providing the name of the instance that you wish to make highly
available as an argument to the script.

sun-ha1:root:/opt/IBMdb2/V7.1/ha/util# regdb2udbvcs -a db2inst1


About to register db2inst1 with VCS
Registering DB2 instance db2inst1...

sun-ha1:root:/opt/IBMdb2/V7.1/ha/util# hastatus


attempting to connect....connected

group resource system message
--------------- -------------------- --------------- --------------------
sun-ha1 RUNNING
sun-ha2 RUNNING
db2_db2inst1_0-rg sun-ha1 OFFLINE
db2_db2inst1_0-rg sun-ha2 OFFLINE
-------------------------------------------------------------------------
db2_db2inst1_0-rs sun-ha1 OFFLINE
db2_db2inst1_0-rs sun-ha2 OFFLINE

Notice that the script has created the resource group and the resource for the
instance. We are ready to bring the resources online. To perform this onlining:

sun-ha1:root:/opt/IBMdb2/V7.1/ha/util# onlinedb2udbvcs -a db2inst1
Onlining DB2 instance db2inst1...
sun-ha1:root:/opt/IBMdb2/V7.1/ha/util# hastatus
attempting to connect....connected

group resource system message


--------------- -------------------- --------------- --------------------
sun-ha1 RUNNING
sun-ha2 RUNNING
db2_db2inst1_0-rg sun-ha1 STARTING PARTIAL
db2_db2inst1_0-rg sun-ha2 OFFLINE
-------------------------------------------------------------------------
db2_db2inst1_0-rs sun-ha1 OFFLINE
db2_db2inst1_0-rs sun-ha2 OFFLINE
db2_db2inst1_0-rs sun-ha1 WAITING FOR ONLINE
-------------------------------------------------------------------------
db2_db2inst1_0-rs sun-ha1 ONLINE
db2_db2inst1_0-rg sun-ha1 ONLINE

You may wish to verify that the db2 processes are active as follows:

sun-ha1:root:/opt/IBMdb2/V7.1/ha/util# ps -fu db2inst1


UID PID PPID C STIME TTY TIME CMD
db2inst1 25653 25651 0 15:12:17 ? 0:00 db2sysc
db2inst1 25654 25651 0 15:12:17 ? 0:00 db2sysc
db2inst1 25650 25649 0 15:12:17 ? 0:00 db2sysc
db2inst1 25652 25650 0 15:12:17 ? 0:00 db2sysc

Let's test the failover capabilities of this resource group.
sun-ha1:root:/opt/IBMdb2/V7.1/ha/util# hagrp -switch db2_db2inst1_0-rg -to sun-ha2
sun-ha1:root:/opt/IBMdb2/V7.1/ha/util# hastatus
attempting to connect....connected

group resource system message


--------------- -------------------- --------------- --------------------
sun-ha1 RUNNING
sun-ha2 RUNNING
db2_db2inst1_0-rg sun-ha1 OFFLINE
db2_db2inst1_0-rg sun-ha2 STARTING PARTIAL
-------------------------------------------------------------------------
db2_db2inst1_0-rs sun-ha1 OFFLINE
db2_db2inst1_0-rs sun-ha2 OFFLINE
db2_db2inst1_0-rs sun-ha2 WAITING FOR ONLINE
-------------------------------------------------------------------------
db2_db2inst1_0-rs sun-ha2 ONLINE
db2_db2inst1_0-rg sun-ha2 ONLINE

You may wish to verify that the db2 processes are active as follows:

sun-ha1:root:/opt/IBMdb2/V7.1/ha/util# rsh sun-ha2 "ps -fu db2inst1"


UID PID PPID C STIME TTY TIME CMD
db2inst1 22704 22702 0 15:08:03 ? 0:00 db2sysc
db2inst1 22701 22700 0 15:08:03 ? 0:00 db2sysc
db2inst1 22705 22702 0 15:08:03 ? 0:00 db2sysc
db2inst1 22703 22701 0 15:08:03 ? 0:00 db2sysc

Suppose we no longer want this instance to be HA.
We can offline the instance, and remove it from VCS control, with the following:

sun-ha1:root:/opt/IBMdb2/V7.1/ha/util# ./offlinedb2udbvcs -a db2inst1


Offlining DB2 instance db2inst1...

sun-ha1:root:/opt/IBMdb2/V7.1/ha/util# hastatus


attempting to connect....connected

group resource system message


--------------- -------------------- --------------- --------------------
sun-ha1 RUNNING
sun-ha2 RUNNING
db2_db2inst1_0-rg sun-ha1 OFFLINE
db2_db2inst1_0-rg sun-ha2 OFFLINE
-------------------------------------------------------------------------
db2_db2inst1_0-rs sun-ha1 OFFLINE
db2_db2inst1_0-rs sun-ha2 OFFLINE

To remove the resources and groups created by the regdb2udbvcs command,
you may use the following:

sun-ha1:root:/opt/IBMdb2/V7.1/ha/util# ./unregdb2udbvcs -a db2inst1


About to unregister db2inst1 with VCS
This will REMOVE all associated resources and resource groups
Please hit any key to continue, break to terminate:

Unregistering DB2 instance db2inst1...

Verify that the resources and groups have in fact been removed.

sun-ha1:root:/opt/IBMdb2/V7.1/ha/util# hastatus


attempting to connect....connected

group resource system message


--------------- -------------------- --------------- --------------------
sun-ha1 RUNNING
sun-ha2 RUNNING

Example 3) More complex example with DB2UDB type

A more involved example of using the supplied scripts adds a
highly available IP address, so that clients can access this instance both before
and after failover. In addition, we wish to associate a mount point with this
instance's resource. This mount point will be used both to protect the instance
home directory and to contain any databases created under
this instance. Thus, everything that is required to make an instance with local
databases and remote TCP/IP clients HA is performed in this example.

The regdb2udbvcs command has several options. Enter the command name
without options to get help.

sun-ha1:root:/opt/IBMdb2/V7.1/ha/util# ./regdb2udbvcs


Usage: regdb2udbvcs -a db2instance [-g resourceGroup] [-i ClusterHAHostIP] [-n NIC]
[-m mountPoint]

Enter the command name with the -h switch to get detailed help.

sun-ha1:root:/opt/IBMdb2/V7.1/ha/util# ./regdb2udbvcs -h


Usage: regdb2udbvcs -a db2instance [-g resourceGroup] [-i ClusterHAHostIP] [-n NIC]
[-m mountPoint]

Register DB2 and associated resources in an existing VCS Cluster


Example: regdb2udbvcs -a db2inst1 will make the instance db2inst1 highly available
Switches and options:

-a, db2 instance to be made HA, must already exist
-g, [optional] resource group to place resource into, default create new resource group
-i, [optional] IP to be made HA, clients will refer to this address to access instance
-n, [optional] NIC to host IP
-m, [optional] mount point to make HA, generally for instance home directory

We are ready to make the instance HA. We have decided on the highly available
IP address that we wish to use, and have decided on the NIC that will host this IP
address. We have decided to host the instance home directory on the /database
mount point. Ensure that this mount point is specified in the /etc/vfstab, and
that it is set to not automount at reboot time.

For reference, here is the appropriate entry in this system's /etc/vfstab.

/dev/vx/dsk/db2dg/db2vol /dev/vx/rdsk/db2dg/db2vol /database vxfs 3 no -

The instance is registered as follows:

sun-ha1:root:/opt/IBMdb2/V7.1/ha/util# ./regdb2udbvcs -a db2inst1 -i 9.26.49.103 -n hme0 -m /database
About to register db2inst1 with VCS
Registering DB2 instance db2inst1...

Let's examine the resources and groups created.


sun-ha1:root:/opt/IBMdb2/V7.1/ha/util# hastatus
attempting to connect....connected

group resource system message


--------------- -------------------- --------------- --------------------
sun-ha1 RUNNING
sun-ha2 RUNNING
db2_db2inst1_0-rg sun-ha1 OFFLINE
db2_db2inst1_0-rg sun-ha2 OFFLINE
-------------------------------------------------------------------------
db2dg sun-ha1 OFFLINE
db2dg sun-ha2 OFFLINE
db2vol sun-ha1 OFFLINE
db2vol sun-ha2 OFFLINE
db2_db2inst1_0-rg_home sun-ha1 OFFLINE
-------------------------------------------------------------------------
db2_db2inst1_0-rg_home sun-ha2 OFFLINE
IP_9_26_49_103 sun-ha1 OFFLINE
IP_9_26_49_103 sun-ha2 OFFLINE
db2_db2inst1_0-rg_hme0 sun-ha1 ONLINE
db2_db2inst1_0-rg_hme0 sun-ha2 ONLINE
-------------------------------------------------------------------------
db2_db2inst1_0-rs sun-ha1 OFFLINE
db2_db2inst1_0-rs sun-ha2 OFFLINE

Note that all resources created are placed in the same resource group.
This ensures that they are always brought online together on the same physical host.

Now online the instance:

sun-ha1:root:/opt/IBMdb2/V7.1/ha/util# ./onlinedb2udbvcs -a db2inst1
Onlining DB2 instance db2inst1...

Examine the status of the resources and resource groups in the cluster.
sun-ha1:root:/opt/IBMdb2/V7.1/ha/util# hastatus
attempting to connect....connected

group resource system message


--------------- -------------------- --------------- --------------------
sun-ha1 RUNNING
sun-ha2 RUNNING
db2_db2inst1_0-rg sun-ha1 ONLINE
db2_db2inst1_0-rg sun-ha2 OFFLINE
-------------------------------------------------------------------------
db2dg sun-ha1 ONLINE
db2dg sun-ha2 OFFLINE
db2vol sun-ha1 ONLINE
db2vol sun-ha2 OFFLINE
db2_db2inst1_0-rg_home sun-ha1 ONLINE
-------------------------------------------------------------------------
db2_db2inst1_0-rg_home sun-ha2 OFFLINE
IP_9_26_49_103 sun-ha1 ONLINE
IP_9_26_49_103 sun-ha2 OFFLINE
db2_db2inst1_0-rg_hme0 sun-ha1 ONLINE
db2_db2inst1_0-rg_hme0 sun-ha2 ONLINE
-------------------------------------------------------------------------
db2_db2inst1_0-rs sun-ha1 ONLINE
db2_db2inst1_0-rs sun-ha2 OFFLINE

Let's perform a simple test of the operation of this highly available instance.
Switch the containing resource group to the non-primary node.

sun-ha1:root:/opt/IBMdb2/V7.1/ha/util# hagrp -switch db2_db2inst1_0-rg -to sun-ha2


Now verify that the processes have migrated from the initial to the current
host.

sun-ha1:root:/opt/IBMdb2/V7.1/ha/util# ps -fu db2inst1


UID PID PPID C STIME TTY TIME CMD

sun-ha1:root:/opt/IBMdb2/V7.1/ha/util# rsh sun-ha2 "ps -fu db2inst1"


UID PID PPID C STIME TTY TIME CMD
db2inst1 24629 24628 0 16:12:42 ? 0:00 db2sysc
db2inst1 24633 24630 0 16:12:43 ? 0:00 db2sysc
db2inst1 24632 24630 0 16:12:43 ? 0:00 db2sysc
db2inst1 24631 24629 0 16:12:42 ? 0:00 db2sysc

Take the DB2 resources offline and disable them. This will allow us to take the
instance offline and online manually without intervention from the VCS cluster software.

sun-ha1:root:/opt/IBMdb2/V7.1/ha/util# ./offlinedb2udbvcs -a db2inst1


Offlining DB2 instance db2inst1...

Examine the state of the cluster.


sun-ha1:root:/opt/IBMdb2/V7.1/ha/util# hastatus
attempting to connect....connected

group resource system message


--------------- -------------------- --------------- --------------------
sun-ha1 RUNNING
sun-ha2 RUNNING
db2_db2inst1_0-rg sun-ha1 OFFLINE
db2_db2inst1_0-rg sun-ha2 OFFLINE
-------------------------------------------------------------------------
db2dg sun-ha1 OFFLINE
db2dg sun-ha2 OFFLINE
db2vol sun-ha1 OFFLINE
db2vol sun-ha2 OFFLINE
db2_db2inst1_0-rg_home sun-ha1 OFFLINE
-------------------------------------------------------------------------
db2_db2inst1_0-rg_home sun-ha2 OFFLINE
IP_9_26_49_103 sun-ha1 OFFLINE
IP_9_26_49_103 sun-ha2 OFFLINE
db2_db2inst1_0-rg_hme0 sun-ha1 ONLINE
db2_db2inst1_0-rg_hme0 sun-ha2 ONLINE
-------------------------------------------------------------------------
db2_db2inst1_0-rs sun-ha1 OFFLINE
db2_db2inst1_0-rs sun-ha2 OFFLINE

Let's now use the unregdb2udbvcs script to remove this resource and all
associated resources and groups completely from the VCS cluster.

sun-ha1:root:/opt/IBMdb2/V7.1/ha/util# ./unregdb2udbvcs -a db2inst1


About to unregister db2inst1 with VCS
This will REMOVE all associated resources and resource groups
Please hit any key to continue, break to terminate:

Unregistering DB2 instance db2inst1...


sun-ha1:root:/opt/IBMdb2/V7.1/ha/util# hastatus
attempting to connect....connected

group resource system message


--------------- -------------------- --------------- --------------------
sun-ha1 RUNNING
sun-ha2 RUNNING

Cluster Verification Testing
After installing and configuring a VERITAS Cluster Server configuration, it is important to test it to
confirm proper operation. The purpose of this testing is to verify that the highly available
configuration will perform as expected in various failure scenarios. Included in this section is a
set of 5 verification tests that should be performed at a minimum to verify proper VCS operation,
and the initial VCS cluster configuration assumed for the purposes of these tests is the cluster
configuration that was built in Example 1 earlier in this paper. Note
that to ensure that the cluster continues to function as expected, it is recommended that these
tests be run on a regular basis. The schedule will vary depending on production schedules, the
degree to which the cluster state evolves over time and management diligence.

Cluster Verification Test 1


This test performs VERITAS Cluster Server management commands to ensure that the DB2 UDB
instance can be controlled correctly. Assume in this case that the instance name is db2inst1.

First, verify that the DB2 UDB instance is accessible locally and that database commands, such
as create database, complete successfully. Take the db2_db2inst1_0-rs resource
offline, and take the IP resource offline with the following command:

hagrp -offline db2_db2inst1_0-rg

Observe that the db2inst1 instance processes are no longer running on any node in the cluster (for
partition 0) and that the highly available hostname/IP address is likewise inaccessible. From the
perspective of a client of this DB2 UDB instance, the existing DB2 UDB connections are closed
(forced off) through offlining the instance, and new DB2 UDB connections are started up with the
onlining of the appropriate DB2 UDB and IP resources.

Cluster Verification Test 2


To return the VCS-managed resources to their previous states, bring them online with the
following command:

hagrp -online db2_db2inst1_0-rg

DB2 UDB clients waiting from the previous test reconnect and can resubmit transactions that
were in-flight and did not get committed (if there were any) when the DB2 UDB instance was
offlined in Cluster Verification Test 1.

Cluster Verification Test 3


Test the manual failover of the DB2 UDB instance and associated resources from one node in the
cluster (called sun-ha1 in this example) onto another node (called sun-ha2). Prior to beginning
this test, ensure that the VCS cluster is brought back into its initial state.

Bring the resources contained within the resource group db2_db2inst1_0-rg offline via the
commands described in Cluster Verification Test 1 above. Then move the resource group over to
the other node using the following VCS command:

hagrp -online db2_db2inst1_0-rg -sys sun-ha2

Now attempt to enable the relevant resources using the same commands described in Cluster
Verification Test 2 above. The DB2 UDB resource for db2inst1 (partition 0) and the associated
hostname/IP address should now be hosted by the second node (sun-ha2). Verify that this is true
by executing the hastatus command.

Cluster Verification Test 4

This test verifies the failover capabilities of the VERITAS Cluster Server software. Bring the
resources back into their initial state with db2_db2inst1_0-rg hosted on sun-ha1.

Once the DB2 UDB instance and its associated resources are hosted on sun-ha1, perform a
power off operation on sun-ha1. This will cause the internal heartbeat mechanism to detect a
physical node failure and should migrate the DB2 UDB resources to the other node in the cluster.

Verify that the results are identical to those in Cluster Verification Test 3 above. The DB2 UDB
resources should be hosted on sun-ha2 and the clients should behave similarly in both cases.

Cluster Verification Test 5


This test verifies that the software application (DB2 UDB) is being monitored correctly. Bring the
cluster back into its initial state, and issue the following commands:

ps -ef | grep db2sysc
kill -9 <pid>

or

ps -ef | grep db2tcpcm
kill -9 <pid>

Through the VCS HA-DB2 agent, VERITAS Cluster Server should detect that a required process
is not running and attempt to restart the DB2 UDB instance on the same node. This should result
in the clean process being called, followed by the db2start process. Verify that this in fact
does occur. The client connections should experience a brief delay in service while the DB2 UDB
instance is being restarted.

Note that there are a large number of distinct testing scenarios that can be executed. The above
sets of Cluster Verification Tests are meant as a minimum that should be run to test the correct
functioning of the cluster. If there are particular failure scenarios for which protection must be
ensured, it is recommended that recovery from those failures is verified after VERITAS Cluster
Server installation and before the high availability configuration is put into production use for the
first time.

Discussion
Minimizing Failover Time
To reduce the DB2 UDB service interruption time incurred as part of failover, it is important to
understand the discrete tasks that comprise failover. Those are:

• Time for VCS to detect and respond to a failure
• Time for DB2 UDB to recover from a failure (DB2 UDB restart time)
• Time for TCP/IP clients to detect hostname failover and reconnect

Time To Detect A Failure


Adjusting the test intervals and timeout parameters of certain VCS monitors can shorten failure
detection time, but it is important to be aware that as these intervals are shortened, the chance of
detecting a failure when one does not actually exist (a false positive) increases. More frequent
testing also slightly increases load on the node. If the timer intervals are shortened sufficiently, a
service that is slow to respond may be detected as a failed service and failover induced
accordingly. Understanding typical behavior in your environment when an HA configuration is not
installed is important in tuning an HA configuration for the desired performance.
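
As an illustration, the monitoring frequency and timeout for the DB2 UDB resource type can be
adjusted through the standard VCS type-level attributes MonitorInterval and MonitorTimeout
while the configuration is writable; the values shown here are examples only and should be
chosen based on the observed behavior of the environment:

haconf -makerw
hatype -modify DB2UDB MonitorInterval 30
hatype -modify DB2UDB MonitorTimeout 60
haconf -dump -makero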

Time For DB2 UDB To Recover


There are parameters within DB2 UDB that can be reset to affect data service recovery time. The
typical techniques that are used for this include:

• A reduction in the database configuration parameter softmax
• An increase in the number of configured page cleaners
• A decrease in the “buffer pool threshold to trigger page cleaners” parameter
• Applications designed such that database commits are relatively more frequent

Note that changing any of these parameters may add additional load to the primary node (and
possibly the clients). In general, performing the actions above will trade off slightly larger load
(incurred by DB2 UDB) for faster DB2 UDB recovery. Discuss these issues with customers so
that they understand the trade-offs associated with modification of these parameters.
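
As an example, for a database named sample these settings correspond to the softmax,
num_iocleaners and chngpgs_thresh database configuration parameters, which could be
adjusted with commands along the following lines (the values are illustrative only):

db2 update db cfg for sample using SOFTMAX 50
db2 update db cfg for sample using NUM_IOCLEANERS 4
db2 update db cfg for sample using CHNGPGS_THRESH 30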

Client Reconnect
The time it takes before a client times out (when the DB2 UDB connection is lost) and attempts to
reconnect can be minimized. By reconnecting sooner after a failover has occurred, the total
recovery time as perceived by the client is reduced. The TCP connection-establishment abort
interval can be set to 10 seconds (10000 milliseconds) using the following command:

ndd -set /dev/tcp tcp_ip_abort_cinterval 10000
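
The current value can be confirmed with ndd -get, and, as described below for the keep-alive
interval, the setting can be made persistent across reboots by adding it to a boot script such as
/etc/init.d/inetinit:

ndd -get /dev/tcp tcp_ip_abort_cinterval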

The Use Of AutoRestart for Persistent Resources


If a persistent resource, such as a network interface controller (NIC), in a resource group faults,
the resource group is automatically failed over to another node in the cluster, provided that the
autofailover attribute is set and there is another node in the cluster to which the resource
group can be failed over. If neither of these conditions is met (the autofailover attribute is not
set or other nodes in the cluster are not available), the resource group remains offline and faulted,
even after the faulted resource comes back online.

Setting the autorestart attribute enables a resource group to be brought back online without
manual intervention. Assume a resource group with a NIC and several other resources. The NIC
fails but the other resources do not. The NIC failure will bring the entire resource group offline.
Setting the autorestart attribute for this resource group would enable VCS to bring the group
back online on the same node once the NIC failure is resolved. Not setting the autorestart
attribute for this resource group would mean that the resource group must be manually brought
back online once the NIC failure is resolved. Note that the autorestart attribute pertains only
to persistent resources.

Setting the autorestart attribute can also simplify the initial bringup of the application service. In
some cases, when a node boots and VCS starts, VCS probes all resources on the node to
determine their status. It is possible that when VCS probes the NIC resource, the resource may
not yet be online because the networking is not up and fully operational. When this occurs, VCS
will mark the NIC resource as faulted and will not bring its associated resource group online.
However, when the NIC resource comes online, and the autorestart attribute is enabled, the
resource group is brought online without manual intervention.
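
For example, assuming the resource group used in the earlier tests, the attribute (named
AutoRestart at the service group level) could be enabled with:

haconf -makerw
hagrp -modify db2_db2inst1_0-rg AutoRestart 1
haconf -dump -makero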

Client Issues
Applications that are relying on a highly available DB2 UDB instance must be able to reconnect in
the event that a failover occurs. Since the hostname and IP address of a logical host remain the
same, there is no need to connect to a different hostname or recatalog the database.

Consider a cluster with two nodes and one DB2 UDB EE instance. The DB2 UDB instance will
normally reside on one of the nodes in the cluster. Clients of this highly available data service will
connect to the logical IP address (or hostname) of the logical host associated with the DB2 UDB
instance. If a DB2 UDB EEE instance is configured, then a unique node number is associated
with each of its database partitions. Because multiple DB2 UDB EEE partitions can be configured
on a single node in the cluster (and often are), the node numbers are used to differentiate
between partitions for connection, administrative and recovery purposes.

From the point of view of a client of a highly available DB2 UDB instance, there are two types of
failovers. One type occurs if the node hosting the DB2 UDB instance crashes, causing a failover
to occur to another node in the cluster. For this type of failover the DB2 UDB instance is not
expected to have closed down gracefully, and upon recovery of the instance DB2 UDB will roll
forward any committed transactions and roll back any transactions that were in flight at the time of
failure. This database recovery will increase the time it takes to migrate the DB2 UDB data
service from one node to another.

If a node crashes and takes down the DB2 UDB instance, both existing and new client
connections to the database will hang. The connections hang because no node on the
network holds the IP address that the clients were using for the database. During the failover,
the logical IP address(es) associated with the database are offline, either because the node that
was hosting the logical host crashed or because the VCS software took them offline (the result of
the clean script being called in the event of a process hang or other software-based disruption to
the DB2 UDB data service). Once the DB2 UDB instance is fully recovered on another cluster
node, clients will be able to reconnect.

The other type of failover is called a manual failover. In this case, DB2 UDB is gracefully shut
down using DB2 UDB commands (issued by the DB2 HA agent scripts). The db2stop used to
shut DB2 UDB down gracefully forces off all client connections to the database. The clients will
receive an error because they were forced off. Once the shutdown is complete, the DB2 UDB
instance can be brought back online on another node faster than in the event of a crash. That is
because rollforward and rollback do not occur during the recovery process (they were
effectively dealt with during the graceful shutdown of the DB2 UDB instance).

In clusters with more than two nodes (and possibly more than one primary), the systemlist
attribute specifies the order in which the backup nodes take over for each primary. For example,
a three-node cluster might be configured so that if node A fails, node B always takes over for it,
and if node B fails, then node C takes over for it. In this case, node C would not take any services
over directly from node A. Other methods, such as node workload, can also be used to determine
which node takes over from a failed node. Refer to the VERITAS Cluster Server 2.0 User's Guide
for Solaris for additional information.
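
In main.cf terms, this ordering is expressed through the priorities in the SystemList attribute,
where a lower number indicates a more preferred node, and by which systems appear in the list
at all. For instance, a group that should run on nodeA and fail over only to nodeB (never directly
to nodeC) might be defined along these lines, with nodeA and nodeB standing in for the actual
system names:

group db2_grp (
SystemList = { nodeA = 0, nodeB = 1 }
AutoStartList = { nodeA }
)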

One or more logical IP addresses can be associated with a DB2 UDB instance. When bringing a
DB2 UDB instance up on another node, regardless of whether this is caused by a crash or a
manual failover, the IP addresses will come online before the database has been restarted. If
clients try to connect to the database once the IP addresses have come up but before the
database has been started, they will receive a communication error. DB2 clients that are still
connected to the database will also begin receiving communication errors during this period.
Although these clients still believe they are connected, the new node on which the database is
being brought up has no knowledge of any existing connections. The connections are simply
reset and the DB2 client receives a communication error. After a short time, the DB2 UDB
instance will be restarted on the node and a connection to the database will succeed. At this
point, the database may be inconsistent (require rollforward and rollback as part of the recovery)
and the clients may have to wait for it to be properly recovered. This recovery occurs
automatically as part of the DB2 UDB restart process.

While this may seem to be a complicated and time-consuming series of events, it is actually quite
simple. When designing an application for a high availability environment, it is not necessary to
write special code for the periods during which the database connections may hang. The
connections only hang for a short period of time while the VCS software moves the logical IP
address(es). Any application service running on a VCS cluster will experience the same hanging
connections during this period. No matter how the database comes down, the clients will receive
an error and will simply need to try to reconnect to the database until the connection succeeds.
From the DB2 client’s point of view, it is similar to the situation that would occur if the DB2 UDB
instance went down and was then brought back up on the same node. In a manual failover, it
would appear to the client that it was forced off and at some time later was able to reconnect to
the database on the same node. An abrupt and uncontrolled failover (due to, for example, a
system crash) would appear to the client as a database crash that was soon brought back up on
the same node.

Client Issues: The Use of TCP Keep-Alives


On the server side, using TCP keep-alives trims the overhead from a node that might otherwise
end up unnecessarily using system resources for a down (or network-partitioned) client. If those
resources are not periodically trimmed they can grow without bounds as clients crash and reboot
over time. If a node stays up long enough without a reboot, the overhead incurred without the
use of TCP keep-alives may seriously impact the performance of the cluster node.

On the client side, using TCP keep-alives enables the client to be notified when a network
address resource has failed over or been manually failed over from one physical host to another.
That transfer of the network address resource breaks the TCP connection. Unless the client has
enabled the TCP keep-alive, it would not necessarily learn of the connection break if the
connection does not happen to be in use at the time.

In Solaris, the tcp_keepalive feature generates a series of probe messages from the host to
the client. The host sends a probe message on a connection after it has been idle for the specified interval. If no response returns,
the host assumes that the client is no longer active and closes the connection. The default
interval for probe generation in Solaris is 7.2 million milliseconds (2 hours). The
tcp_keepalive_interval can be modified through the ndd(1M) command. For the interval
to be set every time the node boots, an entry can be made in the
/etc/init.d/inetinit script. The entry to set the tcp_keepalive_interval to one hour
is:

/usr/sbin/ndd -set /dev/tcp tcp_keepalive_interval 3600000

It is not recommended that the value of the tcp_keepalive_interval be set below 15 minutes.

Client Issues: Client Retry


To ensure that recovery of the DB2 UDB data service is fully automated, it is recommended that
the client applications be configured to periodically check the connection status and to attempt to
reconnect if they find the connection has failed. Applications can also be implemented to notify
the user that a long retry is in progress, offering the user the option to continue or cancel.
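
A minimal sketch of such retry logic, using the DB2 command line processor from a Bourne shell
script, is shown below. It assumes the DB2 client environment (db2profile) has already been
sourced, that the current user can connect to a database named sample, and that a retry limit of
20 attempts at 15-second intervals is acceptable; all of these would need to be adapted to the
actual application:

# Retry the connection until it succeeds or the retry limit is reached.
count=0
until db2 connect to sample > /dev/null 2>&1
do
    count=`expr $count + 1`
    if [ $count -ge 20 ]; then
        echo "Unable to reconnect to the database"
        exit 1
    fi
    sleep 15
done
echo "Reconnected to the database"

In a full application the same pattern applies: trap the connection-failure error, wait briefly, and
reissue the connect until it succeeds or the user cancels.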

Summary
We have discussed the design, installation, configuration and verification of a highly available
DB2 UDB data service in a VERITAS Cluster Server environment. This combination produces an
outstanding relational database management solution with a high degree of availability, reliability
and performance.


Appendix A

Sample main.cf with DB2 resource group containing separate tablespace and
database log volumes with associated mount points.

include "types.cf"
include "db2udb.type.cf"

cluster vcs_cs (
UserNames = { admin = "cDRpdxPmHpzS." }
Administrators = { admin }
CounterInterval = 5
)

system sun-ha1 (
)

system sun-ha2 (
)

group db2_grp (
SystemList = { sun-ha1 = 1, sun-ha2 = 0 }
AutoStartList = { sun-ha2 }
)

DB2UDB db2_db2inst1 (
probeDatabase = sample
instanceOwner = db2inst1
instanceHome = "/u01/export/home/db2inst1"
probeTable = sample
nodeNumber = 32768
)

DiskGroup logdg (
DiskGroup = logdg
)

DiskGroup tabledg (
DiskGroup = tabledg
)

IP db2_fip (
Device = qfe2
Address = "9.26.49.103"
IfconfigTwice = 1
)

Mount db2_u01 (
MountPoint = "/u01"
BlockDevice = "/dev/vx/dsk/tabledg/tablevol"
FSType = vxfs
MountOpt = rw
)

Mount db2_u02 (
MountPoint = "/u02"
BlockDevice = "/dev/vx/dsk/logdg/logvol"
FSType = vxfs
MountOpt = rw
)

NIC db2_qfe2 (
Device = qfe2
NetworkType = ether
)

Volume logvol (
Volume = logvol
DiskGroup = logdg
)

Volume tablevol (
Volume = tablevol
DiskGroup = tabledg
)

db2_db2inst1 requires db2_fip
db2_db2inst1 requires db2_u01
db2_db2inst1 requires db2_u02
db2_fip requires db2_qfe2
db2_u01 requires tablevol
db2_u02 requires logvol
logvol requires logdg
tablevol requires tabledg

// resource dependency tree
//
// group db2_grp
// {
// DB2UDB db2_db2inst1
// {
// IP db2_fip
// {
// NIC db2_qfe2
// }
// Mount db2_u01
// {
// Volume tablevol
// {
// DiskGroup tabledg
// }
// }
// Mount db2_u02
// {
// Volume logvol
// {
// DiskGroup logdg
// }
// }
// }
// }
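
After a configuration file such as this one is edited, its syntax can be checked with the hacf
utility before VCS is restarted; for example, if the file resides in the default configuration
directory:

hacf -verify /etc/VRTSvcs/conf/config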
