
PowerHA 7.2 Configuration Guide for AIX

Server Services @ AXP

Date: 5/3/18
CONFIDENTIAL - FOR INTERNAL USE ONLY
Table of Contents

Contents
Contents.................................................................................................................................................. 2
1) Pre-requisites verification and installation............................................................................................ 7
2) PowerHA Removal............................................................................................................................... 7
3) cfgscsi_id Configuration....................................................................................................................... 9
3.1) Create Event.................................................................................................................................... 9
3.2) Validate Event.................................................................................................................................. 9
3.3) Update node_up Event..................................................................................................................... 9
3.4) Validate node_up............................................................................................................................. 9
3.5) Update node_down Event.............................................................................................................. 10
3.6) Validate node_down....................................................................................................................... 10
4) /etc/hosts Update [Multi-node Configuration Section].........................................................................11
4.1) Sample Configuration.................................................................................................................... 11
5) /etc/cluster/rhosts Update [Multi-node Configuration Section]............................................................11
5.1) Sample Configuration.................................................................................................................... 11
6) Host Connectivity Test [Multi-node Configuration Section]................................................................11
7) Multicast Connectivity Test [Multi-node Configuration Section]........................................................12
8) Create Cluster..................................................................................................................................... 12
8.1) Configuration options.................................................................................................................... 12
8.2) Sample Configuration Selections................................................................................................... 12
8.3) Validation...................................................................................................................................... 12
8.4) Set the non-primary networks to “private”..................................................................................... 13
9) Disable ROOTVG failure detection.................................................................................................... 13
10) Perform initial cluster synchronization.............................................................................................. 14
11) Configure netmon.cf [Multi-node Configuration Section].................................................................15
11.1) Example....................................................................................................................................... 15
11.2) Validation..................................................................................................................................... 15
12) Define Service IPs............................................................................................................................ 16
12.1) Configuration Selections.............................................................................................................. 16
12.2) Sample Configuration Selections................................................................................................. 16
12.3) Validation..................................................................................................................................... 16
13) Define Each Resource Group............................................................................................................ 16
13.1) Configuration Selections.............................................................................................................. 17
13.2) Sample Configuration Selections................................................................................................. 17
13.3) Additional HADR group dependency configuration.....................................................................17
13.4) Sample HADR group dependency configuration Selections.........................................................17
13.5) Validation..................................................................................................................................... 17
14) Create Shared Volume Groups.............................................................................................. 18
14.1) Prerequisite [Multi-node Configuration Section]..........................................................18
14.2) PowerHA Menu........................................................................................................................... 19
14.3) Configuration Selections.............................................................................................................. 19
14.4) Sample Configuration Selections................................................................................................. 19
14.5) Validation..................................................................................................................................... 19
15) Create Shared Logical Volumes........................................................................................................ 21
15.1) PowerHA Menu........................................................................................................................... 21
15.2) Configuration Selections.............................................................................................................. 21
15.3) Sample Configuration Selections................................................................................................. 21
15.4) Validation..................................................................................................................................... 21
16) Create Shared Filesystems................................................................................................................ 22
16.1) PowerHA Menu........................................................................................................................... 22
16.2) Configuration Selections.............................................................................................................. 22
16.3) Sample Configuration Selections................................................................................................. 22

16.4) Validation.................................................................................................................................... 22
17) Synchronize the cluster....................................................................................................... 22
18) Start Cluster Services........................................................................................................................ 23
18.1) Validation..................................................................................................................................... 23
19) Application Install............................................................................................................................ 23
20) Create an Application Server............................................................................................................ 23
20.1) Sample Configuration command (it’s a single line)......................................................................24
20.2) Validation.................................................................................................................................... 24
21) Configure Application Monitoring.................................................................................................... 25
21.1) Sample Configuration Commands (single line commands)...........................................................25
21.2) Validation..................................................................................................................................... 26
22) Finalize Resource Group................................................................................................................... 26
22.1) Sample Configuration command (single line command)..............................................................26
22.2) Validation.................................................................................................................................... 26
23) Synchronize HACMP Resources...................................................................................................... 27
24) Failover Resource Groups................................................................................................................. 27
24.1) Failover command....................................................................................................................... 27
24.2) Sample command......................................................................................................................... 27
24.3) Validation.................................................................................................................................... 27
25) Create a Cluster Snapshot................................................................................................................. 28
25.1) Create snapshot command............................................................................................................ 28
25.2) Configuration Selections.............................................................................................................. 28
25.3) Sample command......................................................................................................................... 28
25.4) Validation.................................................................................................................................... 28
1) Convert an Existing Volume Group..................................................................................................... 29
1.1) Prerequisite.................................................................................................................................... 29
1.2) Smit Menu..................................................................................................................................... 29
1.3) Configuration Selections................................................................................................................ 29
1.4) Sample Configuration Selections................................................................................................... 30
1.5) Prepare Volume Group on Primary Node.......................................................................................30
1.6) Import Volume Group on Secondary Node.....................................................................................30
1.7) PowerHA Menu............................................................................................................................. 30
1.8) Configuration command................................................................................................................ 30
1.9) Sample Configuration command.................................................................................................... 30
1.10) Validation..................................................................................................................................... 30

Document Log
Summary of Changes
Revision Date   Revision Number   Editor           Nature of Change
02/10/17        1.0               Hector Aguirre   Initial Release - based on PowerHA 7 document v1.3

Document Review Plans


This document will be reviewed and updated as required to correct or enhance information content
with/without any change in build procedures.

Document Distribution
This document is automatically distributed to all document approvers and for future reference kept in
IBM QMX Database. Printed copies are for reference only and are not controlled.

How this Document Is Organized


This document is organized into the following sections:
1. The Document log provides information about this procedure and changes.
2. The Appendix provides supplemental detailed build information.

Overview
This document outlines the procedure for configuring IBM’s PowerHA clustering software for use within
the American Express environment. PowerHA provides high availability for applications through
utilizing various levels of hardware and software redundancy. This document should be used for AIX
servers only.

Prerequisites
1) Server must be racked, powered and cabled.
2) Operating system must be AIX 7.1 TL4 or AIX 7.2 TL1, at the latest SP possible
3) All required IPs and shared disk storage must be configured

Document Convention
For most configuration options a CLI command will be provided. For those that require the use of
smitty, a table similar to the one below will be provided.
Fastpath       smitty cm_config_nodes.add_dmn
PowerHA Menu   Initialization and Standard Configuration
                → Configure an HACMP Cluster and Nodes

The first row lists the smitty fastpath that takes you directly to this configuration option. The second
row shows how you can navigate to the same menu from the main “smitty hacmp” menu.

Most sections will then contain a “Configuration Selections” section that outlines the parameters that
need to be set for any given option. Fields that will need user input will be bolded blue text. These
sections will be followed by an example which shows the values that were used when setting up a lab
cluster. Finally, there will be a validation section that contains commands to verify that the intended
outcome has been achieved.

Concepts and Definitions


The top level object of the PowerHA hierarchy is a cluster. A cluster creates a relationship between two
or more nodes for the purpose of providing high availability for an application or service. A cluster is
made up of a topology and one or more resource groups.

Topology – The communication network used between the nodes within the cluster. This is used for the
nodes to communicate their health and resource group status. Typically this will be made up of TCP/IP
based networks such as standard IP based interfaces and non-TCP/IP based networks such as heartbeating
through shared disks.

PowerHA 7.2 topology differs from both PowerHA 5 and PowerHA 7.1. The disk heartbeat is replaced
with a repository disk, and cluster communication also changes from PowerHA 7.1: multicast is no
longer mandatory for cluster communication.

Resource Group – This is a combination of all of the elements required to run a given application.
Generally speaking this will consist of the shared storage and filesystems required for the application, any
required IPs (service IPs), the corresponding application server and monitors, and the application failover
behavior preferences.

Service IPs – Virtual IPs that are kept highly available between the nodes in a cluster.

Application Controller Script – Any start up and shut down processes that are required to bring an
application online and offline.

Application Monitor – The process required to evaluate the health of the application.

PowerHA Software Installation
Pre-requisites verification and installation
To install PowerHA 7.2 we need AIX 7.1 TL4 or AIX 7.2 TL1 with some additional filesets, so we need
to check the oslevel and validate that the required filesets are in place.

“oslevel -s” should return 7100-04-04-1717 or higher for AIX 7.1 or 7200-01-02-1717 or higher for AIX
7.2. If it’s lower, apply the latest patch bundle.

The following filesets need to be installed and at the level corresponding to the “oslevel”:

For AIX 7.1


bos.ahafs 7.1.4.30 C F Aha File System
bos.cluster.rte 7.1.4.31 C F Cluster Aware AIX
rsct.opt.stackdump 3.2.1.10 C F RSCT Stackdump module

For AIX 7.2


bos.ahafs 7.2.1.1 C F Aha File System
bos.cluster.rte 7.2.1.1 C F Cluster Aware AIX
rsct.opt.stackdump 3.2.2.0 C F RSCT Stackdump module
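
On each node, a quick way to confirm both the OS level and the fileset levels (a minimal check; add any additional filesets your build requires):

# oslevel -s
# lslpp -L bos.ahafs bos.cluster.rte rsct.opt.stackdump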

2) Initial PowerHA Installation

The following sequence is used to install PowerHA.

# mount sysadmin:/wasmast/env_scripts /wasmast


# /wasmast/installPowerHA72.pl -update
User is logged in as root
++++Mounting remote NFS filesystem++++
++++Checking for sufficient space in /opt++++
++++Checking for PowerHA Software++++
++++Installing PowerHA Software++++
Successful base install
Successful SP installation
++++Installation completed successfully!++++
++++Unmounting remote NFS filesystem++++
# umount /wasmast

PowerHA Removal
If you need to remove PowerHA execute the following commands.

# mount sysadmin:/wasmast/env_scripts /wasmast
# /wasmast/installPowerHA72.pl -backout
User is logged in as root
++++Checking for PowerHA Software++++
++++Mounting remote NFS filesystem++++
++++Removing PowerHA Software++++
Successful uninstall
++++Backout completed successfully!++++
++++Unmounting remote NFS filesystem++++
# umount /wasmast

After running the script in backout mode on both nodes, do this on one of the nodes:

# lscluster -c
# rmcluster -n <cluster name>

And then reboot both servers.

PowerHA Cluster Configuration
The sections below begin the process of defining the PowerHA cluster configuration. The configuration
process takes place on only one node within the cluster. There are several points in the process defined
below where cluster configuration is synchronized between the nodes. Actions that require updates on
the secondary nodes (such as removing reserve locks) are annotated with [Multi-node Configuration
Section] in the header.

EMC Cluster Configuration


cfgscsi_id Configuration
Systems that have direct attached EMC Clariion devices must configure the cfgscsi_id tool, including
NPIV deployments. This is not applicable to systems that have VSCSI Clariion devices presented
through a VIO Server. This is only required when EMC Clariion devices are part of the resources that
will be configured as shared resources within the cluster.

3.1) Create Event


Run the following command to create the cfgscsi_id event:
/usr/es/sbin/cluster/utilities/claddcustom -t event -n 'cfgscsi_id' -I 'Set correct scsi id on EMC CLARiiON
pseudo devices.' -v '/usr/sbin/cfgscsi_id'

3.2) Validate Event


Ensure that your event has been created by running the following command:
/usr/bin/odmget -q name=cfgscsi_id HACMPcustom

Your results should match the following output:


HACMPcustom:
name = "cfgscsi_id"
type = "event"
description = "Set correct scsi id on EMC CLARiiON pseudo devices."
value = "/usr/sbin/cfgscsi_id"
relation = ""
status = 0

3.3) Update node_up Event


Run the following commands to make the newly created cfgscsi_id a pre-event for node_up:
/usr/es/sbin/cluster/utilities/clchevent -O'node_up' -s'/usr/es/sbin/cluster/events/node_up' -b 'cfgscsi_id' -c
'0'

3.4) Validate node_up


Run the following command to validate that your update has succeeded:
/usr/bin/odmget -q name=node_up HACMPevent

Your results should match the following output:


HACMPevent:
name = "node_up"
desc = "Script run when a node is attempting to join the cluster."
setno = 101
msgno = 7
catalog = "events.cat"
cmd = "/usr/es/sbin/cluster/events/node_up"

notify = ""
pre = "cfgscsi_id"
post = ""
recv = ""
count = 0
event_duration = 0

3.5) Update node_down Event


Run the following commands to make the newly created cfgscsi_id a pre-event for node_down:
/usr/es/sbin/cluster/utilities/clchevent -O'node_down' -s'/usr/es/sbin/cluster/events/node_down' -b
'cfgscsi_id' -c '0'

3.6) Validate node_down


Run the following command to validate that your update has succeeded:
/usr/bin/odmget -q name=node_down HACMPevent

Your results should match the following output:


HACMPevent:
name = "node_down"
desc = "Script run when a node is attempting to leave the cluster."
setno = 101
msgno = 8
catalog = "events.cat"
cmd = "/usr/es/sbin/cluster/events/node_down"
notify = ""
pre = "cfgscsi_id"
post = ""
recv = ""
count = 0
event_duration = 0

PowerHA Cluster Configuration
/etc/hosts Update [Multi-node Configuration Section]
The /etc/hosts file is the source for PowerHA host related resolution. You must first update the /etc/hosts
file to ensure that all aliases and IPs are listed and are identical on all nodes. Specify both the short name
and the fully qualified name for each IP entered. PowerHA does not use DNS server resolution.

4.1) Sample Configuration


127.0.0.1 loopback localhost # loopback (lo0) name/address
10.22.84.53 avlmd510 avlmd510.ipc.us.aexp.com
10.22.84.61 avlmd511 avlmd511.ipc.us.aexp.com
10.29.12.197 bu-avlmd510.phx.aexp.com bu-avlmd510
10.29.12.57 bu-avlmd511.phx.aexp.com bu-avlmd511
192.168.34.10 gpfs_avlmd510 avlmd510_hadr
192.168.34.11 gpfs_avlmd511 avlmd511_hadr
10.22.84.66 pddd013 pddd013.phx.aexp.com
10.22.84.93 pddd013sb pddd013sb.phx.aexp.com
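
Because the file must be identical everywhere, a simple checksum comparison can confirm that the copies match (a sketch assuming ssh connectivity between the nodes; substitute your own node names):

for h in avlmd510 avlmd511
do
ssh $h "cksum /etc/hosts"
done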

/etc/cluster/rhosts Update [Multi-node Configuration Section]


The /etc/cluster/rhosts file specifies which IPs will be used for cluster communication.
Create this file, update it and copy it to all nodes.
List ONLY aliases representing permanent IPs (primary, BEN, HADR) and do NOT list VIPs in this file.

After copying this file to all nodes, run “refresh -s clcomd” on all of them.

5.1) Sample Configuration


avlmd510
avlmd511
bu-avlmd510
bu-avlmd511
avlmd510_hadr
avlmd511_hadr
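
One way to push the file from the node where it was edited and then refresh clcomd everywhere (a sketch assuming scp/ssh connectivity; substitute your own node names):

for h in avlmd511
do
scp /etc/cluster/rhosts $h:/etc/cluster/rhosts
ssh $h "refresh -s clcomd"
done
refresh -s clcomd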

Host Connectivity Test [Multi-node Configuration Section]


For each host entry that you created (excluding service IPs), ensure that you are able to ping it from
each node. The following for loop can be used to test all IPs (any service IPs will report as failed):
for i in `awk '$1 ~ /^[1-9]/ {print $1}' /etc/hosts`
do
ping -c 2 -w 2 $i >/dev/null 2>&1
if [ $? -eq 0 ]; then
echo Ping to $i succeeded
else
echo Ping to $i failed
fi
done

Multicast Connectivity Test [Multi-node Configuration Section]
PowerHA 7.2 does not require a multicast IP or multicast to be enabled on the network;
it can be set up to use unicast instead.

If you decide to use unicast instead of multicast, skip ahead to the next section.

To validate that multicast is working we need to run the "mping" command as a receiver on one node
and as a sender on another.
On one node run:
mping -v -r -c 5 -a <xxx.xxx.xxx.xxx>

and then on another node run:


mping -v -s -c 5 -a <xxx.xxx.xxx.xxx>

where <xxx.xxx.xxx.xxx> is the multicast IP assigned for the cluster.


If multicast is working properly, the mping will succeed (the first packet may get dropped).

Create Cluster
We will now create the cluster using the new command line interface.

8.1) Configuration options


 Cluster Name: For your cluster name, use cl_<host1>_<host2>. If you have more than 2 hosts,
append the other hosts in the same format.
 Cluster IP: The multicast IP assigned for this cluster (only if multicast will be used).
 Repository disk: The designated LUN for the repository disk. Must be shared and 1GB or more in
size.

IMPORTANT: The repository disk along with any other shared disk MUST have the reserve_policy
changed to “no_reserve” before they are used in any cluster configuration.
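
For example, to change and then verify the reserve policy on the repository disk used in the samples below (repeat for every shared disk, on every node):

# chdev -l hdiskpower13 -a reserve_policy=no_reserve
# lsattr -El hdiskpower13 -a reserve_policy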

8.2) Sample Configuration Selections


Unicast:
# clmgr add cluster cl_avlmd510_avlmd511 repository=hdiskpower13 nodes=avlmd510,avlmd511

Multicast:
# clmgr add cluster cl_avlmd510_avlmd511 repository=hdiskpower13 nodes=avlmd510,avlmd511
CLUSTER_IP=239.192.0.105

8.3) Validation
After the command has completed, issue the command /usr/es/sbin/cluster/utilities/cltopinfo and validate
that the cluster configuration process has properly identified all of the common networks you defined
within the /etc/hosts file.

# /usr/es/sbin/cluster/utilities/cltopinfo
Cluster Name: cl_avlmd510_avlmd511
Cluster Connection Authentication Mode: Standard
Cluster Message Authentication Mode: None
Cluster Message Encryption: None

Use Persistent Labels for Communication: No
Repository Disk: hdiskpower13
Cluster IP Address: 239.192.0.105
There are 2 node(s) and 3 network(s) defined
NODE avlmd510:
Network net_ether_01
avlmd510 10.22.84.53
Network net_ether_02
bu-avlmd510 10.29.12.197
Network net_ether_03
gpfs_avlmd510 192.168.34.10
NODE avlmd511:
Network net_ether_01
avlmd511 10.22.84.61
Network net_ether_02
bu-avlmd511 10.29.12.57
Network net_ether_03
gpfs_avlmd511 192.168.34.11

No resource groups defined

8.4) Set the non-primary networks to “private”


If multicast is set up, it is enabled on the primary network only. We need to define the other networks
as “private” to prevent multicast traffic on them.
This will also add the interfaces to the /etc/cluster/ifrestrict file.

For example, in the cluster shown above we would need to define net_ether_02 and 03 as private:

# clmgr modify network net_ether_02 PUBLIC=private


# clmgr modify network net_ether_03 PUBLIC=private

Disable ROOTVG failure detection


Rootvg failure detection is not supported with EMC boot disks, so we must disable it.

Fastpath       smitty cm_change_show_sys_event, then select ROOTVG
PowerHA Menu   Custom Cluster Configuration
                → Events
                → System Events
                → Change/Show Event Response
                → ROOTVG

Change “Active” to “no” as shown below:

* Event Name ROOTVG +


* Response Log event and reboot +
* Active No +

WARNING: After this change the system logs may start to show a lot of false alerts about rootvg.
This is a known issue and at the time of this writing it is being investigated under PMR 55508,227,000.
After performing the next step (cluster sync) you must immediately reboot all nodes to prevent the log
file from growing and filling the filesystem.

Keep an eye on the log file /var/adm/syslogs/system.debug; if you see errors like "kern:debug unix:
ROOTVG" being repeated constantly, report the issue immediately.
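
A quick spot check for the false alerts described above (the count should not keep climbing between runs):

# grep -c ROOTVG /var/adm/syslogs/system.debug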

Perform initial cluster synchronization


In order to create the repository disk structure and set up cluster communication, we need to synchronize
the cluster. This procedure will also validate multicast communication, if multicast is set up.

# clmgr sync cluster


Reboot all nodes after the “sync” finishes successfully.

Configure netmon.cf [Multi-node Configuration Section]
There is currently a limitation within PowerHA when using virtual ethernet cards or logical host ethernet
adapters. In short, the hypervisor sends traffic to these interfaces, which can introduce scenarios where
PowerHA cannot determine whether an interface is properly functioning. Full details can be found at
http://www-01.ibm.com/support/docview.wss?uid=isg1IZ01331. To resolve this issue we will configure
a series of ping addresses within the netmon.cf file.

Edit the file /usr/es/sbin/cluster/netmon.cf on all nodes with pingable addresses in the following format:
!REQD <interface as enX> <ping address>

The ping addresses to use for the public interface are as follows for both Phoenix and DR:
<Your default router>
148.173.250.27 # DNS server
148.173.250.201 # DNS server
148.173.251.90 # NIM Server

For IPC2 use the default GW and the IPC2’s DNS and NIM servers.

The ping address to use for the BEN network is as follows:


For Phoenix (legacy only, not for G1):
10.74.248.1 # BEN admin/utility router
10.74.250.1 # BEN admin/utility router
10.74.248.55 # NIM Server

For DR:
10.10.40.1 # BEN admin/utility router
10.10.40.87 # NIM Server

For G1 and IPC2:


Use the assigned TSM server IP and the vlan’s gateway.

11.1) Example
# cat /usr/es/sbin/cluster/netmon.cf
!REQD en0 148.171.94.1
!REQD en0 148.173.250.27
!REQD en0 148.173.250.201
!REQD en0 148.173.251.90
!REQD en1 10.74.248.55
!REQD en1 10.74.248.1
!REQD en1 10.74.250.1

11.2) Validation
Ensure you can ping all of the addresses you have selected for your netmon.cf file. The following for
loop can be used as an easy test:
for i in `awk '{print $3}' /usr/es/sbin/cluster/netmon.cf`
do
ping -c 2 -w 2 $i >/dev/null 2>&1
if [ $? -eq 0 ]; then
echo Ping to $i succeeded
else
echo Ping to $i failed
fi

done

Define Service IPs


You must perform this step for each IP address that will be failed over as part of a resource group.

12.1) Configuration Selections


 IP Label/Address: The IP label you want to configure. This again relies upon having properly
configured the /etc/hosts file.
 Network Name: The network that this IP belongs to. This will be used in determining which
network/interface to use when bringing this IP online. It’s usually the primary network and the name
can be seen in cltopinfo’s output.

12.2) Sample Configuration Selections


# clmgr add service_ip pddd013 network=net_ether_01

12.3) Validation
After the command has completed, issue the command /usr/es/sbin/cluster/utilities/cltopinfo and validate
that the service IPs have been associated with the appropriate networks.

# /usr/es/sbin/cluster/utilities/cltopinfo
Cluster Name: cl_avlmd510_avlmd511
Cluster Connection Authentication Mode: Standard
Cluster Message Authentication Mode: None
Cluster Message Encryption: None
Use Persistent Labels for Communication: No
Repository Disk: hdiskpower13
Cluster IP Address: 239.192.0.105
There are 2 node(s) and 3 network(s) defined
NODE avlmd510:
Network net_ether_01
pddd013sb 10.22.84.93
pddd013 10.22.84.66
avlmd510 10.22.84.53
Network net_ether_02
bu-avlmd510 10.29.12.197
Network net_ether_03
gpfs_avlmd510 192.168.34.10
NODE avlmd511:
Network net_ether_01
pddd013sb 10.22.84.93
pddd013 10.22.84.66
avlmd511 10.22.84.61
Network net_ether_02
bu-avlmd511 10.29.12.57
Network net_ether_03
gpfs_avlmd511 192.168.34.11
No resource groups defined

Define Each Resource Group


You are now defining a resource group for each application that needs to failover independently.

For DB2 HADR configuration you must create 2 resource groups for the same instance.

Only one HADR instance per cluster is supported with this procedure.

Command line
# clmgr add resource_group rg_name startup=OHN fallover=FNPN fallback=NFB nodes=node1,node2

13.1) Configuration Selections


 Resource Group Name (standard): Follow the naming convention of rg_<application name>
 Resource Group Name (HADR): Follow the naming convention of rg_<instance> and
rg_s_<instance> for the HADR group pair.
 Nodes: Type the comma separated node list. If you are defining multiple resource groups alternate
which node takes the highest priority.

13.2) Sample Configuration Selections


# clmgr add resource_group rg_pddd013 startup=OHN fallover=FNPN fallback=NFB
nodes=avlmd510,avlmd511
# clmgr add resource_group rg_s_pddd013 startup=OHN fallover=FNPN fallback=NFB
nodes=avlmd511,avlmd510

13.3) Additional HADR group dependency configuration

HADR requires a dependency between the two resource groups to make sure they are not started on the same node.

 High Priority Resource Group(s): rg_<instance>


 Low Priority Resource Group(s): rg_s_<instance>

13.4) Sample HADR group dependency configuration Selections

# clmgr add dependency HIGH="rg_pddd013" LOW="rg_s_pddd013"

13.5) Validation
After the command has completed, issue the command /usr/es/sbin/cluster/utilities/cltopinfo and validate
that the new resource group is listed with the required settings.
For HADR also validate the dependency with this command: clmgr -v query dependency
type="DIFFERENT_NODES"

# /usr/es/sbin/cluster/utilities/cltopinfo
Cluster Name: cl_avlmd510_avlmd511
Cluster Connection Authentication Mode: Standard
Cluster Message Authentication Mode: None
Cluster Message Encryption: None
Use Persistent Labels for Communication: No
Repository Disk: hdiskpower13

Cluster IP Address: 239.192.0.105
There are 2 node(s) and 3 network(s) defined
NODE avlmd510:
Network net_ether_01
pddd013sb 10.22.84.93
pddd013 10.22.84.66
avlmd510 10.22.84.53
Network net_ether_02
bu-avlmd510 10.29.12.197
Network net_ether_03
gpfs_avlmd510 192.168.34.10
NODE avlmd511:
Network net_ether_01
pddd013sb 10.22.84.93
pddd013 10.22.84.66
avlmd511 10.22.84.61
Network net_ether_02
bu-avlmd511 10.29.12.57
Network net_ether_03
gpfs_avlmd511 192.168.34.11

Resource Group rg_pddd013


Startup Policy Online On Home Node Only
Fallover Policy Fallover To Next Priority Node In The List
Fallback Policy Never Fallback
Participating Nodes avlmd510 avlmd511

Resource Group rg_s_pddd013


Startup Policy Online On Home Node Only
Fallover Policy Fallover To Next Priority Node In The List
Fallback Policy Never Fallback
Participating Nodes avlmd511 avlmd510

# clmgr -v query dependency type="DIFFERENT_NODES"


NAME="rg_pddd013++rg_s_pddd013"
TYPE="DIFFERENT_NODES"
HIGH="rg_pddd013"
INTERMEDIATE=""
LOW="rg_s_pddd013"

Create Shared Volume Groups



Skip this step for DB2 HADR since there are no shared disks on HADR configuration.

You will now create a shared volume group for each application that needs to fail over. If your volume
group already exists refer to Appendix A instead for the process to convert an existing volume group.

14.1) Prerequisite [Multi-node Configuration Section]


Before you can create a shared volume group you must first configure your disks appropriately to ensure
they are ready to be shared between your systems. First you must identify the same disk/lun across your

systems. When using EMC devices, the command “powermt display dev=all” can be used to match the
“Logical device ID”. For other disk types, the following commands can be used to ensure you have
paired the right devices:
lsattr -El <disk> -a lun_id
lscfg -vl <disk> | grep "Serial Number"
odmget -q "attribute=unique_id and name=<disk>" CuAt

Once you have identified the devices that will be used in the shared volume group, run the following
commands to prepare your disks for PowerHA on all nodes in the cluster.
On node 1:
chdev -l <disk> -a pv=yes
chdev -l <disk> -a reserve_policy=no_reserve
On node 2 … N:
cfgmgr
chdev -l <disk> -a reserve_policy=no_reserve

Next we must identify a common major number to use for each volume group. As a standard numbering
convention we will use numbers starting at 100. The following command provides a list of available
major number ranges.

/usr/es/sbin/cluster/sbin/cl_nodecmd /usr/sbin/lvlstmajor

Example:
# /usr/es/sbin/cluster/sbin/cl_nodecmd /usr/sbin/lvlstmajor
aplmd501: 35..99,101...
aplmd502: 35..99,101...

In the above example, the numbers 35-99 are available and everything else starting at 101. Since 100 is
already taken we would select 101.

14.2) PowerHA Menu


Fastpath       smitty cl_createvg
PowerHA Menu   System Management (C-SPOC)
                → Storage
                → Volume Groups
                → Shared Volume Groups
                → Create a Volume Group
A list will appear, select the following options:
Node Names: Select the nodes with F7
Physical Volume Names: Select the disks with F7
Volume Group Type: Scalable

14.3) Configuration Selections


 Resource Group Name: Press F4 and select the appropriate resource group
 VOLUME GROUP name: Enter the customer provided volume group name, or otherwise enter a
name in the format <application name>_vg
 Physical partition SIZE in megabytes: Choose a size appropriate for your total storage amount
 Volume group MAJOR NUMBER: Enter the major number selected above

14.4) Sample Configuration Selections
Node Names aplmd501,aplmd502
Resource Group Name [rg_udb3] +
PVID 00c4d532f5479504 00c4d532f548d36c
VOLUME GROUP name [udb3_vg]
Physical partition SIZE in megabytes 128 +
Volume group MAJOR NUMBER [101] #
Enable Fast Disk Takeover or Concurrent Access Fast Disk Takeover +
Volume Group Type Scalable

14.5) Validation
Running the command /usr/es/sbin/cluster/cspoc/cl_ls_shared_vgs will list out all shared volume groups.

# /usr/es/sbin/cluster/cspoc/cl_ls_shared_vgs
#Volume Group Resource Group Node List
caavg_private <None> aplmd501,aplmd502
udb3_vg rg_udb3 aplmd501,aplmd502

Create Shared Logical Volumes
Skip this step for DB2 HADR since there are no shared disks on HADR configuration.

15.1) PowerHA Menu


Fastpath       smitty cl_mklv
PowerHA Menu   System Management (C-SPOC)
                → Storage
                → Logical Volumes
                → Add a Logical Volume
A list will appear, select the following options:
Shared Volume Group Names: Select the appropriate volume group
Physical Volume Names: Choose “Auto-select” unless you have been given specific disk mapping
instructions

15.2) Configuration Selections


 Number of LOGICAL PARTITIONS: This value is determined by the size of the filesystem
needed, divided by the physical partition size set during the volume group creation (see the worked
example after this list).
 Logical volume NAME: Select a name that will help identify the filesystem used by this logical
volume and append the suffix _lv. The total characters cannot exceed 15.
 Logical volume TYPE: Select the corresponding filesystem type, in most cases this will be jfs2.
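
For example, with the 128 MB physical partition size used in the sample volume group above, an 8 GB filesystem needs 8192 / 128 = 64 logical partitions, while the 48 partitions shown below correspond to a 6 GB filesystem.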

Configuration Alternative
If you do not want to calculate the number of logical partitions needed, you can use the command line
interface as follows:
/usr/sbin/cluster/cspoc/smitlvm -17 <resource group> -y <lv name> -t <filesystem type> <volume group>
<size: M for meg or G for gig>

Example:
/usr/sbin/cluster/cspoc/smitlvm -17 rg_udb3 -y pddd714_bp_lv -t jfs2 udb3_vg 8G

15.3) Sample Configuration Selections


Resource Group Name rg_udb3
VOLUME GROUP name udb3_vg
Node list aplmd501,aplmd502
Reference node aplmd501
* Number of LOGICAL PARTITIONS [48] #
PHYSICAL VOLUME names
Logical volume NAME [pddd714_bp_lv]
Logical volume TYPE [jfs2] +
POSITION on physical volume middle +
….

15.4) Validation
Running the command /usr/sbin/cluster/sbin/cl_lsfreelvs will list out the logical volumes that are not yet
associated with a filesystem.

# /usr/sbin/cluster/sbin/cl_lsfreelvs
pddd714_bp_lv aplmd501,aplmd502

Create Shared Filesystems
Skip this step for DB2 HADR since there are no shared disks on HADR configuration.

16.1) PowerHA Menu


Fastpath       smitty cl_mkfs
PowerHA Menu   System Management (C-SPOC)
                → Storage
                → File Systems
                → Add a File System
A list will appear, select the following options:
Volume Group: Select the appropriate volume group
Filesystem Type: Select Enhanced Journaled File System
Logical Volume: Select the appropriate logical volume

16.2) Configuration Selections


 MOUNT POINT: This is your filesystem’s mount point.
 Inline Log: Select whether the filesystem will use inline logging, or an external logging device

16.3) Sample Configuration Selections


Resource Group rg_udb3
Node Names aplmd501,aplmd502
Logical Volume name pddd714_bp_lv
Volume Group udb3_vg
* MOUNT POINT [/backup/pddd714] /
PERMISSIONS read/write +
Mount OPTIONS [] +
Block Size (bytes) 4096 +
Inline Log? yes +
Inline Log size (MBytes) [] #
Logical Volume for Log +
Extended Attribute Format Version 1 +
Enable Quota Management? no +

16.4) Validation
Running the command /usr/sbin/cluster/sbin/cl_lsfs will list out the known filesystems.

# /usr/sbin/cluster/sbin/cl_lsfs
Node: Name Nodename Mount Pt VFS Size Options Auto Accounting
aplmd501: /dev/pddd714_bkp_lv -- /backup/pddd714 jfs2 -- rw no no

Synchronize the cluster


Our network settings, as well as the volume groups and filesystems, are now configured. We will
synchronize the cluster resources before finalizing our configuration with any application servers.

# clmgr sync cluster

Start Cluster Services
We need to start the services “now” (not at boot), with the clinfo daemon, automatically managing
resource groups and without broadcasting a message to all users:

# clmgr on cluster when=now clinfo=yes manage=auto broadcast=no

For HADR it’s almost the same except that we don’t want to automatically manage resource groups:

# clmgr on cluster when=now clinfo=yes manage=manual broadcast=no

Note: Do not set PowerHA to run upon system restart.

18.1) Validation
Your command will complete but that does not mean the PowerHA daemons are yet operational.
Continue to monitor the status using the command: /usr/es/sbin/cluster/sbin/cl_nodecmd "lssrc -ls
clstrmgrES | grep state" while looking for the state of “ST_STABLE”.

# /usr/es/sbin/cluster/sbin/cl_nodecmd "lssrc -ls clstrmgrES | grep state"


aplmd501: Current state: ST_INIT
aplmd502: Current state: ST_INIT

# /usr/es/sbin/cluster/sbin/cl_nodecmd "lssrc -ls clstrmgrES | grep state"
aplmd501: Current state: ST_STABLE
aplmd502: Current state: ST_STABLE

Once the cluster daemons are up, run the command /usr/es/sbin/cluster/utilities/clRGinfo and validate that
each resource group has come online on the expected node.
Note: For HADR clusters the groups should be OFFLINE at this point.

# /usr/es/sbin/cluster/utilities/clRGinfo
-----------------------------------------------------------------------------
Group Name Group State Node
-----------------------------------------------------------------------------
rg_udb3 ONLINE aplmd501
OFFLINE aplmd502

rg_udb4 ONLINE aplmd502


OFFLINE aplmd501

Application Install
At this point we have validated that our shared storage has been properly configured and has come online
on our primary node. You can now install any application data to the shared storage according to the
application documentation. You can also change the mount point permissions as needed. Even though
the cluster is active you can manually unmount your filesystems without impacting PowerHA to change
mount points. When changing mount point permissions ensure that all nodes have the same permissions.
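
As an illustration, to change ownership on the node that currently has the filesystem mounted and then match it on the standby node (the owner and group are hypothetical):

On the active node:
# chown pddd714:staff /backup/pddd714    # hypothetical owner and group
On the standby node (filesystem not mounted there):
# chown pddd714:staff /backup/pddd714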

Create an Application Server


We will now configure the start and stop scripts for each application. When configuring a standard DB2
cluster be sure to reference Appendix B for the appropriate scripts and settings.

For DB2 HADR clusters refer to Appendix C for the appropriate scripts and settings.
HADR requires 2 Application Servers. Create them using the following naming convention:
as_<instance>
as_s_<instance>

An application server requires three parameters:


 Server Name: Choose the name in the format as_<application name>
 Start Script: This should be a script on local storage (not failed over) and without spaces.
 Stop Script: This should be a script on local storage (not failed over) and without spaces.

20.1) Sample Configuration command (it’s a single line)


# clmgr add application_controller as_pddd670 \
STARTSCRIPT="/opt/PowerHAscripts/db2.ha.start-pddd670.ksh" \
STOPSCRIPT="/opt/PowerHAscripts/db2.ha.stop-pddd670.ksh"

20.2) Validation
Run the command “clmgr -v query application” to validate the application settings.

# clmgr -v query application


NAME="as_pddd013"
MONITORS=" "
STARTSCRIPT="/opt/PowerHAscripts/db2.hadr.start-pddd013-PASDF01D-primary.ksh"

STOPSCRIPT="/opt/PowerHAscripts/db2.hadr.stop-pddd013-PASDF01D-primary.ksh"

Configure Application Monitoring


We will now configure monitoring for the application. When configuring a standard DB2 cluster be sure
to reference Appendix B for the appropriate scripts and settings.

For DB2 HADR clusters refer to Appendix C for the appropriate scripts and settings.
HADR requires 2 Application Monitors. Create them using the following naming convention:
asm_<instance>
asm_s_<instance>

An application monitor requires many parameters:


 Monitor Name: Choose the name in the format asm_<application name>
 Application Server(s) to Monitor: The app server this monitor is associated with
 Monitor Mode: We use mode “Both”.
 Monitor Method: This should be a script on local storage (not failed over) without spaces.
 Monitor Interval: Frequency in seconds to run the monitor script
 Hung Monitor Signal: signal sent to the monitor script when it's considered hung.
 Stabilization Interval: Time in seconds to wait for the application to start before monitoring begins
 Restart Count: How many restart attempts to make before considering a failover to a secondary
machine
 Action on Application Failure: Use "fallover".
 Cleanup Method: This should be a script on local storage (not failed over) without spaces that is run
when a failure is detected
 Restart Method: This should be a script on local storage (not failed over) without spaces that is run
after the cleanup method when a failure is detected

21.1) Sample Configuration Commands (single line commands)


Active/passive
# clmgr add application_monitor asm_pddd670 TYPE="custom" \
APPLICATIONS="as_pddd670" MODE="both" \
MONITORMETHOD="/opt/PowerHAscripts/db2.ha.monitor-pddd670-PRA2301D.ksh" \
MONITORINTERVAL="120" \
HUNGSIGNAL="9" \
RESTARTINTERVAL="924" \
STABILIZATION="300" \
RESTARTCOUNT="2" \
FAILUREACTION="fallover" \
CLEANUPMETHOD="/opt/PowerHAscripts/db2.ha.stop-pddd670.ksh" \
RESTARTMETHOD="/opt/PowerHAscripts/db2.ha.start-pddd670.ksh"

HADR Primary
# clmgr add application_monitor asm_pddd013 TYPE="custom" \
APPLICATIONS="as_pddd013" MODE="both" \

MONITORMETHOD="/opt/PowerHAscripts/db2.hadr.monitor-pddd013-PASDF01D-primary.ksh"
\
MONITORINTERVAL="45" \
HUNGSIGNAL="9" \
RESTARTINTERVAL="0" \
STABILIZATION="180" \
RESTARTCOUNT="0" \
FAILUREACTION="fallover" \

CLEANUPMETHOD="/opt/PowerHAscripts/db2.hadr.stop-pddd013-PASDF01D-primary.ksh"
\
RESTARTMETHOD="/opt/PowerHAscripts/db2.hadr.start-pddd013-PASDF01D-primary.ksh"

HADR Standby
# clmgr add application_monitor asm_s_pddd013 TYPE="custom" \
APPLICATIONS="as_s_pddd013" MODE="both" \

MONITORMETHOD="/opt/PowerHAscripts/db2.hadr.monitor-pddd013-PASDF01D-standby.ksh"
\
MONITORINTERVAL="120" \
HUNGSIGNAL="9" \
RESTARTINTERVAL="594" \
STABILIZATION="60" \
RESTARTCOUNT="3" \
FAILUREACTION="fallover" \

CLEANUPMETHOD="/opt/PowerHAscripts/db2.hadr.stop-pddd013-PASDF01D-standby.ksh" \
RESTARTMETHOD="/opt/PowerHAscripts/db2.hadr.start-pddd013-PASDF01D-standby.ksh"

21.2) Validation
Run the command “clmgr -v query monitor <monitor>” to validate the application settings.

# clmgr -v query monitor asm_pddd013


NAME="asm_pddd013"
APPLICATIONS="as_pddd013"
TYPE="user"
MODE="both"
MONITORMETHOD="/opt/PowerHAscripts/db2.hadr.monitor-pddd013-PASDF01D-primary.ksh"
MONITORINTERVAL="45"
HUNGSIGNAL="9"
STABILIZATION="180"
RESTARTCOUNT="0"
RESTARTINTERVAL="0"
FAILUREACTION="fallover"
NOTIFYMETHOD=""
CLEANUPMETHOD="/opt/PowerHAscripts/db2.hadr.stop-pddd013-PASDF01D-primary.ksh"
RESTARTMETHOD="/opt/PowerHAscripts/db2.hadr.start-pddd013-PASDF01D-primary.ksh"

Finalize Resource Group


Now we need to add the application server and the VIP to a resource group.

 Service IP Labels/Addresses: the VIP for the application


 Application Servers: the application server

22.1) Sample Configuration command (single line command)


# clmgr modify resource_group rg_pddd670 APPLICATIONS="as_pddd670" SERVICE_LABEL="pddd670"

22.2) Validation
Run the command "clmgr -v query rg <resource group>"

# clmgr -v q rg rg_pddd013
NAME="rg_pddd013"

CURRENT_NODE="avlmd510"
NODES="avlmd510 avlmd511"
STATE="ONLINE"
TYPE="non-concurrent"
APPLICATIONS="as_pddd013"
STARTUP="OHN"
FALLOVER="FNPN"
FALLBACK="NFB"

SERVICE_LABEL="pddd013"

Synchronize HACMP Resources

# clmgr sync cluster

Failover Resource Groups


For HADR clusters skip this step and refer to Appendix D for startup and failover instructions.

We will now validate that each resource group fails over properly to all secondary nodes.
Resource Group: your resource group
Destination Node: the fail over server

24.1) Failover command


# clmgr move rg <resource group> node=<destination node>

24.2) Sample command


# clmgr move rg rg_pddd013 node=avlmd511

24.3) Validation
After waiting for a period of time greater than the stabilization interval that you defined above, run the
command /usr/es/sbin/cluster/utilities/clRGinfo <resource group>

# /usr/es/sbin/cluster/utilities/clRGinfo rg_pddd013
-----------------------------------------------------------------------------
Group Name Group State Node
-----------------------------------------------------------------------------
rg_pddd013 OFFLINE avlmd510
ONLINE avlmd511
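
Once the failover has been verified, the resource group can be moved back to its home node using the same command syntax, for example:

# clmgr move rg rg_pddd013 node=avlmd510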

Create a Cluster Snapshot
At this point we have a functional cluster that is ready for deployment. We will take a cluster snapshot in
case we ever need to restore to this point.

25.1) Create snapshot command


# clmgr add snapshot <snapshot_name> DESCRIPTION="<snapshot description>"

25.2) Configuration Selections


 Cluster Snapshot Name: Original_Cluster_Configuration
 Cluster Snapshot Description: Cluster definition as created during original build out

25.3) Sample command


# clmgr add snapshot Original_Cluster_Configuration DESCRIPTION="Cluster definition as created
during original build out"

25.4) Validation
Run the command “clmgr query snapshot” and make sure the snapshot is listed.

# clmgr query snapshot


Original_Cluster_Configuration
active.0
active.1
active.2

Appendix A – Converting an Existing Volume Group
Convert an Existing Volume Group
You will now convert an existing standard volume group to a shared volume group for each application
that needs to fail over.

6.1) Prerequisite
Before you convert your volume group you must first configure your disks appropriately to ensure they
are ready to be shared between your systems. First you must identify the same disk/lun across your
systems. When using EMC devices, the command “powermt display dev=all” can be used to match the
“Logical device ID”. For other disk types, the following commands can be used to ensure you have
paired the right devices:
lsattr -El <disk> -a lun_id
lscfg -vl <disk> | grep "Serial Number"
odmget -q "attribute=unique_id and name=<disk>" CuAt

Once you have identified the devices that will be used in the shared volume group, run the following
commands to prepare your disks for PowerHA. If these settings are not already in place on node 1 you
will need to bring down all active filesystems in the volume group.
On node 1:
chdev -l <disk> -a reserve_policy=no_reserve
On node 2:
cfgmgr
chdev -l <disk> -a reserve_policy=no_reserve

Next we must ensure the major number for each volume group will not conflict with the secondary
server’s settings. As a standard numbering convention we will use numbers starting at 100. The
following command provides a list of available major number ranges.

/usr/es/sbin/cluster/sbin/cl_nodecmd /usr/sbin/lvlstmajor

Example:
# /usr/es/sbin/cluster/sbin/cl_nodecmd /usr/sbin/lvlstmajor
aplmd501: 35..99,101...
aplmd502: 35..99,101...

In the above example, the numbers 35-99 are available and everything else starting at 101. Since 100 is
already taken we would select 101.

6.2) Smit Menu


Fastpath       smitty chvg
PowerHA Menu   N/A

6.3) Configuration Selections


 VOLUME GROUP name: Press F4 and select the intended volume group
 Activate volume group AUTOMATICALLY: Select no
 Convert this VG to Concurrent Capable: Select “enhanced concurrent”

6.4) Sample Configuration Selections
* VOLUME GROUP name pddd777_vg
* Activate volume group AUTOMATICALLY no +
at system restart?
* A QUORUM of disks required to keep the volume yes +
group on-line ?
Convert this VG to Concurrent Capable? enhanced concurrent +
Change to big VG format? no +
Change to scalable VG format? no +
LTG Size in kbytes 1024 +
Set hotspare characteristics n +
Set synchronization characteristics of stale n +
partitions
Max PPs per VG in units of 1024 32 +
Max Logical Volumes 256 +
Mirror Pool Strictness +

6.5) Prepare Volume Group on Primary Node


At this point our volume group has been converted but we need to make the secondary node aware of this
change.

1. Stop any application that is using any of the filesystems on that volume group. The command fsuser
<filesystem> can be used to determine the pid of any processes that are using that filesystem.
2. Unmount all of the filesystems associated with that volume group.
unmount <filesystem>
3. Turn any automount options off for each filesystem:
chfs -A no <filesystem>
4. Vary off the volume group:
varyoffvg <volume group>
5. Export the volume group:
exportvg <volume group>
6. Import the volume group using the new major number:
importvg -y <volume group> -V <major number> <disk for vg>

6.6) Import Volume Group on Secondary Node


1. Import the volume group using the selected major number:
importvg -y <volume group> -V <major number> <disk for vg>

6.7) PowerHA Menu


Add the converted VG to a resource group.

6.8) Configuration command


 Volume Groups: Press F4 and choose the appropriate volume groups
# clmgr modify resource_group <resource group> VOLUME_GROUP="<volume group>"

6.9) Sample Configuration command


# clmgr modify resource_group rg_udb3 VOLUME_GROUP="pddd777_vg"

6.10) Validation
Running the command /usr/es/sbin/cluster/cspoc/cl_ls_shared_vgs will list out all shared volume groups.

# /usr/es/sbin/cluster/cspoc/cl_ls_shared_vgs
#Volume Group Resource Group Node List
heartbeat_vg <None> aplmd501,aplmd502
pddd777_vg rg_udb3 aplmd501,aplmd502

Appendix B – DB2 Monitoring Configuration Requirements
The settings required for a PowerHA DB2 configuration are defined in the AMEX owned ABB
document. The settings below are not expected to change, but this reference should not be considered the
source document for this information. The values listed below were pulled from
ABB_IBM_PowerHA_7.x_v1.doc. The location for the scripts/binaries is at the bottom of this appendix.
NOTE: The original ABB has a slightly different naming than the one used in this doc. This is due to a
change that was made during testing. We expect the ABB to be updated soon.

DB2 requires customized PowerHA scripts developed by the IBM and TIE teams to start, stop,
clean, and monitor the DB2 instances. There are also two binary programs: one runs as a
daemon and connects to the DB2 instance, and the other is a client of the DB2 connection daemon
that can detect whether the database is online. These scripts/binaries will be provided by IBM.

Since PowerHA 7 doesn’t support scripts with parameters, we created a wrapper that can be
symlinked and uses the symlink name to figure out the parameters for the actual script.

PowerHA custom DB2 monitoring scripts and functions:


db2.pwha.wrapper.ksh
    Wrapper script to overcome the PowerHA 7 limitation regarding scripts' parameters.
    Not to be called directly; symlinks to this script must be used.

db2.ha.start.ksh
    Starts db2 instances. The script will clean up db2 processes, shared memory and semaphores
    before starting a db2 instance. Not to be called directly; the wrapper will do it.

db2.ha.stop.ksh
    Stops the database instance. When stopping an instance the script will detect if db2stop has
    hung, and will do a hard kill (clean) on the db2 instance. Not to be called directly; the
    wrapper will do it.

db2.ha.monitor.ksh
    When run, checks to see if the database is online and if so returns 0; otherwise returns 1 for
    offline. Starts the HDDB2Serv daemon if it is not running and runs HDDB2Client, which connects
    to the HDDB2Serv daemon and then exits. As a last resort, checks whether the control log file
    is being updated if all other indications are that the DB is offline. Can provide customized
    per-instance settings, such as the time to wait for the db2stop command for the db2.ha.stop.ksh
    script. Not to be called directly; the wrapper will do it.

HDDB2Serv
    Runs in daemon mode and connects to the database in a constant fashion.

HDDB2Client
    When started, connects to the HDDB2Serv daemon and checks if the database is running.

db2.ha.start-{instance}-{database}.ksh
    Symlink to the wrapper script. The wrapper will use the symlink name to call the real script
    using the right parameters.

db2.ha.stop-{instance}-{database}.ksh
    Symlink to the wrapper script. The wrapper will use the symlink name to call the real script
    using the right parameters.

db2.ha.monitor-{instance}-{database}.ksh
    Symlink to the wrapper script. The wrapper will use the symlink name to call the real script
    using the right parameters.

function.LogMsg.ksh
    Helper script.

function.StopHDDB2Serv.ksh
    Helper script.

fs_monitor.pl
    Detects file systems that are unmounted for the resource group.

fs_remount.pl
    Remounts filesystems for a resource group.

Common DB2 PowerHA configuration settings (a clmgr sketch follows this list):

 Common DB2 Application Server configuration:


o Start Script = /opt/PowerHAscripts/db2.ha.start-{Instance}.ksh
o Stop Script = /opt/PowerHAscripts/db2.ha.stop-{Instance}.ksh

 Common DB2 Application Monitor configuration:
o Monitor Mode = Both
o Monitor Interval = 120
o Hung Monitor Signal = 9
o Stabilization Interval = 300
o Restart Count = 2
o Restart Interval = 924
o Action on Application Failure = fallover
o Cleanup Method = /opt/PowerHAscripts/db2.ha.stop-{Instance}.ksh
o Restart Method = /opt/PowerHAscripts/db2.ha.start-{Instance}.ksh

 Common Resources and attributes for a Custom Resource Group:


o Startup Policy = Online On Home Node Only
o Fallover Policy = Fallover To Next Priority Node In The List
o Fallback Policy = Never Fallback

 Other settings
o For clusters with multiple DB2 instances, balance the node priorities so that in
general an equal number of DB2 instances are running on each side of the cluster.
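
As a sketch only (not part of the ABB), the settings above could be applied with clmgr for a hypothetical
instance db2inst1, application controller (Application Server) as_db2inst1, monitor asm_db2inst1, and
resource group rg_db2inst1. The attribute names below are assumptions and should be verified against the
clmgr built-in help for the installed PowerHA 7.2 level before use:

# clmgr add application_controller as_db2inst1 \
    STARTSCRIPT="/opt/PowerHAscripts/db2.ha.start-db2inst1.ksh" \
    STOPSCRIPT="/opt/PowerHAscripts/db2.ha.stop-db2inst1.ksh"
# clmgr add application_monitor asm_db2inst1 TYPE=Custom APPLICATIONS=as_db2inst1 \
    MODE=both MONITORMETHOD="/opt/PowerHAscripts/db2.ha.monitor-db2inst1.ksh" \
    MONITORINTERVAL=120 HUNGSIGNAL=9 STABILIZATION=300 RESTARTCOUNT=2 \
    RESTARTINTERVAL=924 FAILUREACTION=fallover \
    CLEANUPMETHOD="/opt/PowerHAscripts/db2.ha.stop-db2inst1.ksh" \
    RESTARTMETHOD="/opt/PowerHAscripts/db2.ha.start-db2inst1.ksh"
# clmgr add resource_group rg_db2inst1 NODES="<node1>,<node2>" \
    STARTUP=OHN FALLOVER=FNPN FALLBACK=NFB APPLICATIONS=as_db2inst1

The monitor values mirror the Common DB2 Application Monitor settings listed above; node names and
script paths are placeholders.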

Script locations:
All of the required scripts and binaries listed above in the “PowerHA custom DB2 monitoring scripts and
functions” section should be copied to the /opt/PowerHAscripts directory. Mount the respective NFS
repositories, copy the files and then chmod 755 the files. The files must be created on all nodes in the
cluster.

The filesystem monitor scripts (fs_monitor.pl and fs_remount.pl) are owned and maintained by the OS
Engineering team and can be found at the following NFS repository:
appii501.ipc.us.aexp.com:/export/software/powerha/scripts
The remaining scripts are all owned and maintained by the DB team. The location for those scripts is
maintained by that team, but at the time this document was published they were available at the following
NFS repository:
sppiu527.ipc.us.aexp.com:/software/PowerHa/Aix
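
For example, using a hypothetical temporary mount point /mnt/pwha (repeat on every node in the cluster,
and copy the DB-team scripts from the second repository in the same way):

# mkdir -p /mnt/pwha
# mount appii501.ipc.us.aexp.com:/export/software/powerha/scripts /mnt/pwha
# cp /mnt/pwha/fs_monitor.pl /mnt/pwha/fs_remount.pl /opt/PowerHAscripts/
# chmod 755 /opt/PowerHAscripts/fs_monitor.pl /opt/PowerHAscripts/fs_remount.pl
# unmount /mnt/pwha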

Appendix C – DB2 HADR Configuration Requirements
The settings required for a PowerHA DB2 HADR configuration are defined in the AMEX-owned ABB document.
The settings below are not expected to change, but this reference should not be considered the source
document for this information. The values listed below were pulled from ABB_IBM_PowerHA_7.2_v1.doc. The
location for the scripts/binaries is at the bottom of this appendix.
NOTE: The original ABB uses slightly different naming than this document, due to a change made during
testing. We expect the ABB to be updated soon.

DB2 HADR requires customized PowerHA scripts to start, stop, clean, and monitor the DB2
instances. These scripts/binaries will be provided by IBM.

PowerHA custom DB2 HADR scripts and functions:


Script name and functionality:

db2.pwha.wrapper.ksh
    Wrapper script to overcome the PowerHA 7 limitation regarding script parameters.
    Not to be called directly; symlinks to this script must be used.

db2.hadr.start.ksh
    Starts DB2 HADR on the specified DB and role only if it is safe and the operation would not cause
    data loss. It also executes HADR takeover during a failover operation.
    Not to be called directly.

db2.hadr.stop.ksh
    Placeholder for future enhancements. Does not stop the instance or execute any action; the real
    actions are done by the start script.
    Not to be called directly.

db2.hadr.monitor.ksh
    When run, checks whether the database is online, HADR is active, the role matches the one being
    monitored, and the HADR status is "peer". If so, it returns 0. If the database is online and the
    role matches the monitor but the HADR status is not "peer", it also returns 0 but creates an IMR
    ticket in the DBA queue for investigation. If the DB is down, it returns 1 for offline.
    Not to be called directly.

db2.hadr.start-{instance}-{database}-{role}.ksh
    Symlink to the wrapper script. The wrapper uses the symlink name to call the real script with the
    right parameters.

db2.hadr.stop-{instance}-{database}-{role}.ksh
    Symlink to the wrapper script. The wrapper uses the symlink name to call the real script with the
    right parameters.

db2.hadr.monitor-{instance}-{database}-{role}.ksh
    Symlink to the wrapper script. The wrapper uses the symlink name to call the real script with the
    right parameters.

fs_remount.hadr.pl
    Remounts filesystems.
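
For illustration only, with a hypothetical instance db2inst1 and database SAMPLE, the role-specific
symlinks might be created like this (actual names come from the DB team's configuration):

# cd /opt/PowerHAscripts
# ln -s db2.pwha.wrapper.ksh db2.hadr.start-db2inst1-SAMPLE-primary.ksh
# ln -s db2.pwha.wrapper.ksh db2.hadr.stop-db2inst1-SAMPLE-primary.ksh
# ln -s db2.pwha.wrapper.ksh db2.hadr.monitor-db2inst1-SAMPLE-primary.ksh
# ln -s db2.pwha.wrapper.ksh db2.hadr.start-db2inst1-SAMPLE-standby.ksh
# ln -s db2.pwha.wrapper.ksh db2.hadr.stop-db2inst1-SAMPLE-standby.ksh
# ln -s db2.pwha.wrapper.ksh db2.hadr.monitor-db2inst1-SAMPLE-standby.ksh

The wrapper derives the instance, database, and role from the symlink name.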

Script locations:
All of the required scripts and binaries listed above should be copied to the /opt/PowerHAscripts
directory. Mount the respective NFS repositories, copy the files and then chmod 755 the files. The files
must be created on all nodes in the cluster.

The filesystem monitor scripts (fs_monitor.pl and fs_remount.pl) are owned and maintained by the OS
Engineering team and can be found at the following NFS repository:
appii501.ipc.us.aexp.com:/export/software/powerha/scripts
The remaining scripts are all owned and maintained by the DB team. The location for those scripts is
maintained by that team, but at the time this document was published they were available at the following
NFS repository:
sppiu527.ipc.us.aexp.com:/software/PowerHA/Aix

DB2 HADR PowerHA configuration settings (a clmgr sketch for the resource groups follows this list):

 Primary DB2 HADR Application Server configuration (as_<instance>):

o Start Script = /opt/PowerHAscripts/db2.hadr.start-{Instance}-{db}-primary.ksh
o Stop Script = /opt/PowerHAscripts/db2.hadr.stop-{Instance}-{db}-primary.ksh

 Standby DB2 HADR Application Server configuration (as_s_<instance>):


o Start Script = /opt/PowerHAscripts/db2.hadr.start-{Instance}-{db}-standby.ksh
o Stop Script = /opt/PowerHAscripts/db2.hadr.stop-{Instance}-{db}-standby.ksh

 Primary DB2 HADR Application Monitor configuration (asm_<instance>):


o Monitor Mode = Both
o Monitor Method =
/opt/PowerHAscripts/db2.hadr.monitor-{instance}-{db}-primary.ksh
o Monitor Interval = 45
o Hung Monitor Signal = 9
o Stabilization Interval = 120
o Restart Count = 0
o Restart Interval = 0
o Action on Application Failure = fallover
o Cleanup Method = /opt/PowerHAscripts/db2.hadr.stop-{Instance}-{db}-primary.ksh
o Restart Method = /opt/PowerHAscripts/db2.hadr.start-{Instance}-{db}-primary.ksh

 Standby DB2 HADR Application Monitor configuration (asm_s_<instance>):


o Monitor Mode = Both
o Monitor Method =
/opt/PowerHAscripts/db2.hadr.monitor-{Instance}-{db}-standby.ksh
o Monitor Interval = 120
o Hung Monitor Signal = 9
o Stabilization Interval = 60
o Restart Count = 3
o Restart Interval = 594
o Action on Application Failure = fallover
o Cleanup Method = /opt/PowerHAscripts/db2.hadr.stop-{Instance}-{db}-standby.ksh
o Restart Method = /opt/PowerHAscripts/db2.hadr.start-{Instance}-{db}-standby.ksh

 Common Resources and attributes for a Custom Resource Group:


o Startup Policy = Online On Home Node Only
o Fallover Policy = Fallover To Next Priority Node In The List
o Fallback Policy = Never Fallback

 Other settings
o Configure the node priorities for both resource groups so that they have the same nodes in
reverse order.
o Only the Primary resource group will have an IP label assigned (instance VIP).
o No VG will be assigned to any of the resource groups.

o Filesystems should be auto-mounted at boot time.
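
As a sketch only (not part of the ABB), the two resource groups for a hypothetical instance db2inst1
could be created with clmgr as follows. The attribute names and the service IP label are assumptions to
be verified against the installed PowerHA level and the actual ABB:

# clmgr add resource_group rg_db2inst1 NODES="<node1>,<node2>" \
    STARTUP=OHN FALLOVER=FNPN FALLBACK=NFB \
    SERVICE_LABEL="<instance_vip>" APPLICATIONS=as_db2inst1
# clmgr add resource_group rg_s_db2inst1 NODES="<node2>,<node1>" \
    STARTUP=OHN FALLOVER=FNPN FALLBACK=NFB APPLICATIONS=as_s_db2inst1

Note the reversed node order between the two groups, the service IP label on the Primary group only, and
the absence of any VOLUME_GROUP on either group.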

Appendix D – DB2 HADR startup and failover procedures

 Cluster startup (after boot or after stopping PowerHA)


o Start services on both nodes with the “manage resource groups” option set to
“Manually”

# clmgr on cluster when=now clinfo=yes manage=manual broadcast=no

o Contact DBA team to find out on which node each group should be started.
o Start the Standby (rg_s_<instance>) group first.

# clmgr online rg rg_s_<instance> node=<node2>

o Once the Standby group is "ONLINE" (state can be checked as shown after this list), proceed to
start the Primary group (rg_<instance>) on the other node.

# clmgr online rg rg_<instance> node=<node1>

 Manual Failover
o Stop Standby group first (rg_s_<instance>).
# clmgr offline rg rg_s_<instance> node=<node2>
o Failover the Primary group to the other node (rg_<instance>).
# clmgr move rg rg_<instance> node=<node2>
o Start Standby group on the other node.
# clmgr online rg rg_s_<instance> node=<node1>
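
The state of the resource groups can be verified at each step with the standard PowerHA utilities, for
example:

# /usr/es/sbin/cluster/utilities/clRGinfo
# clmgr query resource_group rg_<instance>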

