Administration Guide
The following is an example of a user-defined event: Perhaps you want
to shut down DB2 database partitions on an AIX physical node when paging space
reaches a certain percentage of fullness, and to log this occurrence.
A sample that corrects a paging space shortage by shutting down a database
partition and forcing a transaction abort to free paging space is
provided. The examples are found in the /SAMPLES
directory. Another common example is process death: You may want
to restart a DB2 database partition, or you may want failover to occur if a
process dies on a given node.
With HACMP ES there is a rules file,
/usr/sbin/cluster/events/rules.hacmprd, that contains HACMP
events.
Each event in the file is made up of nine lines which are:
- Event name. Each event name must be unique.
- State. This is the qualifier for the event. The event name
and state are the rule triggers. HACMP ES Cluster Manager initiates
recovery only if it finds a rule with a trigger corresponding to the event
name and state.
- Resource Program Path. This is a full-path specification of the
xxx.rp file containing the recovery program.
- Recovery Type. This is reserved for future use.
- Recovery Level. This is reserved for future use.
- Resource Variable Name. This is used for Event Manager
events.
- Instance Vector. This is used for Event Manager events.
Within Event Management, this is a set of elements, where each element is a
name and value pair of the form "name=value". The values uniquely
identify the copy of the resource in the system and, by extension, the copy of
the resource variable.
- Predicate. This is used for Event Manager events. Within
Event Management, this is a relational expression between a resource
variable and other elements. When the expression is true, the Event
Management subsystem generates an event to notify Cluster Manager and the
appropriate application.
- Rearm Predicate. This is used for Event Manager events.
Within Event Management, this is a predicate used to generate an event that
alternates the status of the primary predicate. This predicate is
typically the inverse of the primary predicate. It can also be used
with the event predicate to establish an upper and a lower boundary for a
condition of interest.
Each object requires one line in the event definition, even if the line is
not used. If any of these lines is removed, HACMP ES Cluster Manager cannot
parse the event definition properly, which may cause the system to
hang. Any line beginning with "#" is treated as a comment
and is not part of the event definition.
Note: The rules file requires exactly nine lines for each event definition, not
counting any comment lines. When adding a user-defined event at the
bottom of the rules file, remove any unnecessary empty line
at the end of the file, or the node will hang.
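The nine-line rule can be illustrated with a short Python sketch. It is only one reading of the description above, not the way Cluster Manager actually parses the file: it drops comment lines, groups what remains into sets of nine, and rejects input that leaves a partial definition (for example, a stray empty line at the end). The field names in FIELDS are labels chosen for this sketch.

```python
# Field order follows the nine lines listed above; the names are illustrative.
FIELDS = (
    "event_name", "state", "recovery_program_path", "recovery_type",
    "recovery_level", "resource_variable", "instance_vector",
    "predicate", "rearm_predicate",
)

def parse_rules(text):
    """Split rules-file text into a list of nine-field event dictionaries."""
    # Lines beginning with "#" are comments and belong to no definition.
    lines = [ln for ln in text.split("\n") if not ln.startswith("#")]
    # A final newline leaves one trailing "" after split(); drop only that one.
    if lines and lines[-1] == "":
        lines.pop()
    if len(lines) % len(FIELDS) != 0:
        raise ValueError("each event definition needs exactly nine lines")
    return [dict(zip(FIELDS, lines[i:i + len(FIELDS)]))
            for i in range(0, len(lines), len(FIELDS))]

# A definition with five values and four empty (unused) lines.
sample = "\n".join([
    "##### Beginning of the Event Definition: node_up",
    "#",
    "TE_JOIN_NODE",
    "0",
    "/usr/sbin/cluster/events/node_up.rp",
    "2",
    "0",
    "# 6) Resource variable - only used for event management events",
    "",
    "# 7) Instance vector - only used for event management events",
    "",
    "# 8) Predicate - only used for event management events",
    "",
    "# 9) Rearm predicate - only used for event management events",
    "",
    "###### End of the Event Definition: node_up",
]) + "\n"

events = parse_rules(sample)
print(events[0]["event_name"], events[0]["recovery_program_path"])
```

Appending one more empty line to the sample makes parse_rules raise ValueError, which mirrors the warning in the note about an extra empty line at the end of the file.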
An example of an event definition for node_up follows:
##### Beginning of the Event Definition: node_up
#
TE_JOIN_NODE
0
/usr/sbin/cluster/events/node_up.rp
2
0
# 6) Resource variable - only used for event management events

# 7) Instance vector - only used for event management events

# 8) Predicate - only used for event management events

# 9) Rearm predicate - only used for event management events

###### End of the Event Definition: node_up
This is an example of just one of the event definitions that are found in
the rules.hacmprd file.
In this example, when the node_up event occurs, the recovery
program /usr/sbin/cluster/events/node_up.rp is
executed. According to the rules, the proper values are specified in
the state, recovery type, and recovery level lines in the definition.
There are four (4) empty lines for: resource variable, instance
vector, predicate, and rearm predicate.
Users can add their own events to react to non-standard HACMP ES
events. For example, to define an event that occurs when the /tmp file
system is more than 90 percent full, the rules.hacmprd file must
be modified.
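As a rough illustration of what such an addition might look like, the following Python sketch renders a nine-line event definition with a comment line before each field. Everything specific to the /tmp example is an assumption made for illustration and should be checked against the PSSP and HACMP ES documentation: the event name UE_TMP_FULL, the recovery program path, the resource variable name, the instance vector, and the predicate strings are not taken from this guide. The recovery type and level values are simply copied from the node_up example.

```python
LABELS = (
    "Event name", "State", "Resource program path", "Recovery type",
    "Recovery level", "Resource variable", "Instance vector",
    "Predicate", "Rearm predicate",
)

def render_event(title, values):
    """Return rules-file text for one event; `values` holds the nine fields."""
    if len(values) != len(LABELS):
        raise ValueError("an event definition needs exactly nine values")
    out = ["##### Beginning of the Event Definition: " + title, "#"]
    for number, (label, value) in enumerate(zip(LABELS, values), start=1):
        out.append("# %d) %s" % (number, label))  # comment line, not counted
        out.append(value)                          # the field itself, may be empty
    out.append("###### End of the Event Definition: " + title)
    # End with a single newline and no extra empty line.
    return "\n".join(out) + "\n"

# All values below are hypothetical placeholders for a "/tmp over 90% full" event.
tmp_full = render_event("tmp_full", [
    "UE_TMP_FULL",                            # assumed event name
    "0",
    "/usr/local/cluster/events/tmp_full.rp",  # assumed recovery program path
    "2",
    "0",
    "IBM.PSSP.aixos.FS.%totused",             # assumed resource variable
    "NodeNum=*;VG=rootvg;LV=hd3",             # assumed instance vector
    "X>90",                                   # assumed predicate syntax
    "X<80",                                   # assumed rearm predicate
])
print(tmp_full)
```

Generating the text from a fixed list of nine values keeps the definition from accidentally gaining or losing a line.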
Many events are predefined in the IBM Parallel System Support Program
(PSSP). These events can be exploited when used within user-defined
events. To make this happen, do the following:
- Stop the cluster.
- Edit the rules.hacmprd file. Back up the file
before modifying it. Add the predefined PSSP event manually. If
you need synchronizing points across all nodes in the cluster, use the
barrier command in the recovery program. (Read more about
the barrier command and synchronization of recovery programs in the HACMP
Concepts, Installation, and Administration Guides.)
- Restart the cluster. The rules.hacmprd file is
stored in memory when Cluster Manager is started. To accurately
implement the changes, restart all the clusters. There should not be
any inconsistent rules in a cluster.
- Cluster Manager uses all events in the rules.hacmprd
file.
HACMP ES uses PSSP event detection to handle user-defined events. The
PSSP Event Management subsystem provides comprehensive event detection by
monitoring various hardware and software resources.
Resource states are represented by resource variables. Resource
conditions are represented as expressions called predicates.
Event Management receives resource variables from the Resource Monitor,
which observes the state of specific system resources and transforms this
state into several resource variables. These variables are periodically
passed to Event Management. Event Management applies predicates that
are specified by the HACMP ES Cluster Manager in
rules.hacmprd to each resource variable. When a
predicate evaluates to true, an event is generated and sent to the
Cluster Manager. Cluster Manager initiates the voting protocol and the
recovery program file (xxx.rp) is executed on a set of nodes
specified by "node sets" in the recovery program and according to event
priority.
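The interplay between a predicate and its rearm predicate can be pictured with a small Python sketch. It is a simplified model under stated assumptions, not the Event Management implementation: it assumes that once the predicate has generated an event, no further event is generated until the rearm predicate becomes true. The thresholds (above 90 to trigger, below 80 to rearm) are illustrative values only.

```python
class Monitor:
    """Simplified model of a predicate paired with a rearm predicate."""

    def __init__(self, predicate, rearm):
        self.predicate = predicate
        self.rearm = rearm
        self.armed = True  # True: watching the predicate; False: watching rearm

    def observe(self, value):
        """Return "event", "rearm", or None for one resource variable value."""
        if self.armed and self.predicate(value):
            self.armed = False
            return "event"
        if not self.armed and self.rearm(value):
            self.armed = True
            return "rearm"
        return None

# Illustrative thresholds: trigger above 90, rearm below 80.
monitor = Monitor(lambda x: x > 90, lambda x: x < 80)
results = [monitor.observe(v) for v in (85, 92, 95, 85, 75, 91)]
print(results)
```

With these two boundaries, a value that hovers around 90 produces one event rather than a new event on every observation.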
The recovery program file (xxx.rp) is made up of one or
more recovery program lines. Each line is declared in the following
format:
relationship command_to_run expected_status NULL
There must be at least one space between each value in the format.
"Relationship" is a value used to decide which program should run on
which kind of node. Three types of relationship are supported:
- All. The specified command or program is executed on all nodes of
the current HACMP cluster.
- Event. The specified command or program is executed only on the
nodes where the event occurred.
- Other. The specified command or program is executed on all nodes
where the event did not occur.
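The three relationships amount to a simple selection over the cluster's nodes. The Python sketch below expresses that selection; the function name and the use of lowercase keywords are assumptions made for illustration.

```python
def target_nodes(relationship, cluster_nodes, event_nodes):
    """Return the nodes on which a recovery command would run."""
    rel = relationship.lower()
    if rel == "all":
        return sorted(cluster_nodes)
    if rel == "event":
        return sorted(n for n in cluster_nodes if n in event_nodes)
    if rel == "other":
        return sorted(n for n in cluster_nodes if n not in event_nodes)
    raise ValueError("unknown relationship: " + relationship)

cluster = {1, 2, 3, 4}
failed = {3}  # node where the event occurred
print(target_nodes("all", cluster, failed))
print(target_nodes("event", cluster, failed))
print(target_nodes("other", cluster, failed))
```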
"Command_to_run" is a quote-delimited string with or without a
full-path definition to an executable program. Only HACMP-delivered
event scripts can use a relative-path definition. With other scripts or
programs, the full-path definition must be used (even if these programs are
located in the same directory as the HACMP event scripts).
"Expected_status" is the expected return code of the specified command or
program. It is an integer value or an "x". If "x" is
used, Cluster Manager does not care about the return code. For any
other value, the actual return code must equal the expected return code. If it
does not, Cluster Manager detects the event failure. The handling of this
event "hangs" until the problem is resolved through manual
intervention. Without manual intervention, the node does not
hit the barrier to synchronize with the other nodes. Synchronization
across all nodes is a requirement for the Cluster Manager to control all the
nodes.
"NULL" is a field reserved for future use. The word
"NULL" must appear at the end of each line except the barrier
line.
If you specify multiple recovery commands between two barrier
commands, or before the first one, the recovery commands are executed in
parallel on the node itself and between the nodes.
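The line format described above can be checked mechanically. The following Python sketch parses one recovery program line and compares an actual return code with the expected status. It is a sketch under assumptions: it treats a line containing only the word barrier as the barrier line, accepts relationship keywords in lowercase, and the command path in the usage example is illustrative rather than taken from a delivered .rp file.

```python
import shlex

RELATIONSHIPS = {"all", "event", "other"}

def parse_rp_line(line):
    """Parse 'relationship "command" expected_status NULL' or a barrier line."""
    tokens = shlex.split(line)  # honors the quotes around the command
    if tokens == ["barrier"]:
        return {"barrier": True}
    if len(tokens) != 4 or tokens[3] != "NULL":
        raise ValueError("expected four fields ending in NULL: " + line)
    relationship, command, expected, _ = tokens
    if relationship.lower() not in RELATIONSHIPS:
        raise ValueError("unknown relationship: " + relationship)
    if expected != "x":
        int(expected)  # raises ValueError unless the status is an integer
    return {"barrier": False, "relationship": relationship.lower(),
            "command": command, "expected_status": expected}

def status_ok(expected, actual):
    """An 'x' accepts any return code; otherwise the codes must match."""
    return expected == "x" or int(expected) == actual

entry = parse_rp_line('event "/usr/bin/db2_proc_restart" 0 NULL')
print(entry["command"], status_ok(entry["expected_status"], 0))
```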
The barrier command is used to synchronize all the commands across all the
cluster nodes. When a node hits the barrier statement in the recovery
program, Cluster Manager initiates the barrier protocol on this node.
The barrier protocol is a two-phase protocol: when all nodes have met
the barrier in the recovery program and "voted" to approve the protocol,
all nodes are notified that both phases have completed.
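One way to picture the effect of barrier lines is to split a recovery program into groups: commands in the same group may run in parallel, and every node finishes a group before any node starts the next. The Python sketch below performs only that grouping, assuming a barrier line consists of the single word barrier. It does not model the voting protocol, and the script paths in the example are hypothetical.

```python
def split_phases(lines):
    """Group recovery program lines into barrier-delimited phases."""
    phases, current = [], []
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue
        if stripped == "barrier":
            phases.append(current)  # close the phase at the synchronization point
            current = []
        else:
            current.append(stripped)
    if current:
        phases.append(current)      # commands after the last barrier
    return phases

# Hypothetical recovery program; the script paths are placeholders.
program = [
    'event "/usr/local/bin/stop_app" 0 NULL',
    'other "/usr/local/bin/notify" x NULL',
    "barrier",
    'all "/usr/local/bin/cleanup" 0 NULL',
]
for number, phase in enumerate(split_phases(program), start=1):
    print(number, phase)
```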
In summary, the following actions make up the process:
- Either Group Services/ES for predefined events, or Event Management for
user-defined events, notifies Cluster Manager of the event.
- HACMP ES Cluster Manager reads the rules.hacmprd file
and determines the recovery program mapped to the event.
- HACMP ES Cluster Manager runs the recovery program which consists of a
sequence of recovery commands.
- The recovery program executes the recovery commands which may be shell
scripts or binary commands.
Note: The recovery commands are the same as the HACMP event scripts in HACMP for
AIX.
- HACMP ES Cluster Manager receives the return status from the recovery
commands. An unexpected status "hangs" the cluster until manual
intervention using smit cm_rec_aids or the
/usr/sbin/cluster/utilities/clruncmd command is carried out.
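The sequence summarized above can be condensed into a short Python sketch. It is a simplified, single-node model under stated assumptions: rules are given as dictionaries with name, state, and path keys, each recovery program is a list of (command, expected status) pairs, and the run callback stands in for executing a command. None of these names come from HACMP ES.

```python
def handle_event(rules, programs, event_name, state, run):
    """Map an event to its recovery program and run the recovery commands."""
    # The event name and state together form the rule trigger.
    path = next((r["path"] for r in rules
                 if r["name"] == event_name and r["state"] == state), None)
    if path is None:
        return "no matching rule"
    for command, expected in programs[path]:
        actual = run(command)
        # An "x" accepts any return code; otherwise the codes must match.
        if expected != "x" and int(expected) != actual:
            return "manual intervention required"
    return "completed"

rules = [{"name": "TE_JOIN_NODE", "state": "0",
          "path": "/usr/sbin/cluster/events/node_up.rp"}]
# Hypothetical program contents; the command names are placeholders.
programs = {"/usr/sbin/cluster/events/node_up.rp": [("first_step", "0"),
                                                    ("second_step", "x")]}

print(handle_event(rules, programs, "TE_JOIN_NODE", "0", lambda cmd: 0))
print(handle_event(rules, programs, "TE_JOIN_NODE", "0", lambda cmd: 1))
```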
Included with DB2 UDB EEE are sample scripts for failover/recovery and for
user-defined events. The scripts will work "as is" or you can
customize or change the recovery action.
- DB2 database partition recovery script rc.db2pe.
This is the script file used to start and stop the HACMP configuration on a
database partition. It also works as an HACMP start and stop script for
an NFS server of the DB2 instance owner.
- DB2-specific user-defined events for HACMP ES. Six default events
are included: one for process recovery, two for paging space, and three
for NFS and automounter recovery.
- DB2 instance NFS fileserver failover. This script provides for
failover recovery of the server of the filesystem for a DB2 instance to a
backup.
- Network failover. The scripts network_up_complete,
network_back and network_down_complete,
network_down allow SP DB2 database partitions to fail over if their
SP Switch adapter fails.
- Scripts to define monitoring events for the SP GUI Perspectives are
included. Monitoring of failover and user-defined recovery is possible
through the Event and Hardware Perspectives. Read the documentation for
PSSP Administration to find out more about Perspectives.
- Installation scripts to install and remove core scripts and events on the
HACMP ES nodes.
- Script files to create and remove the SP Perspectives problem management
(pman) resources for monitoring the HACMP and DB2 configuration.
The script files are located in the DB2 UDB EEE
$INSTNAME/sqllib/samples/hacmp/es directory.
The recovery scripts need to be installed on each node that will run
recovery. The script files can be centrally installed from the SP
control workstation or other designated SP node. To install, complete
the following tasks:
- Copy the scripts from the $INSTNAME/sqllib/samples/hacmp/es
directory to one of either the SP control workstation or another SP node that
can run the pcp and pexec commands. (The
pcp and pexec commands are required for the install so
ensure that you have the ability to run them.)
- Customize the reg.parms.SAMPLE and
failover.parms.SAMPLE files for your environment by
setting key parameters such as BUFFPAGE for failover configurations.
Typically, for mutual takeover configurations, your failover settings are
reduced to one-half the size of your regular settings, or less.
Also, you will use a copy of these files renamed with your own name (instead
of "SAMPLE").
- Customize as necessary the five (5) parameters NFS_RETRIES, START_RETRIES,
MOUNT_NFS, STOP_RETRIES, and FAILOVER in the rc.db2pe
file. The three retries and the single failover settings should be
adequate for almost all implementations. The MOUNT_NFS setting should
be configured depending on whether you will be using the package for NFS
server availability. You should specify this setting if you wish
rc.db2pe to mount and verify the NFS home directory of the
DB2 instance owner for you. Setting the FAILOVER parameter to
"YES" causes db2_proc_restart to run and attempt
to restart a DB2 database partition. If this attempt is unsuccessful,
HACMP is shut down with a failover.
- Customize db2_paging_action, db2_proc_recovery, and
nfs_auto_recovery in the event file. Also, edit
pwq to change this to the DB2 instance owner. Customize the
db2_paging_action to indicate the action to take if paging space
becomes more than ninety percent full. (If this occurs, the DB2
database partition is stopped.) Modify the script if additional
recovery actions are required.
- Use db2_inst_ha to install the scripts and events on the nodes
you specify.
Note: HACMP ES must be pre-installed on these nodes before you begin.
The syntax of db2_inst_ha is:
db2_inst_ha $INSTNAME/sqllib/samples/hacmp/es <nodelist> <DATABASENAME>
where
$INSTNAME/sqllib/samples/hacmp/es is the directory where the scripts and events are located
<nodelist> is the list of nodes in pcp or pexec style; for example, 1-16 or 1,2,3,4
<DATABASENAME> is the name of the database for regular and failover
parameter files.
The reg.parms.SAMPLE and
failover.parms.SAMPLE files will be copied to each node and
renamed reg.parms.DATABASENAME and failover.parms.DATABASENAME.
db2_inst_ha will copy files to each node in /usr/bin and
update the HACMP event files:
/usr/sbin/cluster/events/rules.hacmprd,
/usr/sbin/cluster/events/network_up_complete, and
/usr/sbin/cluster/events/network_down_complete.
- Configure your system and scripts with HACMP.
- Use the create_db2_events command to install the monitoring
events for problem management resources (pman) and the SP GUI
Perspectives. Additional configuration and customization in
Perspectives is needed. For more information on Perspectives, read the
PSSP Administration Guide.
- Use the ha_db2stop command to shut down the database partitions
without HACMP ES failover recovery taking place. To use this command,
copy the file to the database user's home directory and make sure
permissions and ownership are set for that user. To stop the database
without failover recovery, type the following as that user:
ha_db2stop
Note: You must wait for the command to return. Exiting by using a ctrl-C
interrupt, or by killing the process, may re-enable failover recovery
prematurely. As a result, not all database partitions would be
stopped.
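The two <nodelist> forms shown for db2_inst_ha above, a range such as 1-16 and a comma-separated list such as 1,2,3,4, can be expanded with a few lines of Python. This sketch handles only those two forms (and, as an assumption, a mix of them); the real pcp and pexec commands may accept other notations.

```python
def expand_nodelist(spec):
    """Expand a node list such as "1-16" or "1,2,3,4" into node numbers."""
    nodes = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            low, high = part.split("-", 1)
            nodes.extend(range(int(low), int(high) + 1))  # inclusive range
        else:
            nodes.append(int(part))
    return nodes

print(expand_nodelist("1-16"))
print(expand_nodelist("1,2,3,4"))
```

Expanding the list before running the installation is a simple way to confirm which nodes a given specification covers.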
HACMP ES invokes the DB2 recovery scripts in the following way:
- node_up_local (starting a node)
- HACMP will run the node_up sequence, acquiring volume groups,
logical volumes, filesystems, and IP addresses specified in resource groups
owned (via cascading) or assigned (via rotating) to this node.
- When node_up_local_complete is run, the application server
definition which contains rc.db2pe is initiated to start the
database partition specified in the application server definitions on this
physical node.
Note: rc.db2pe, when running in start mode, adjusts the DB2
parameters specified in reg.parms.DATABASE for each
DATABASE in the database directory that matches a parameter (parms)
file.
If you have multiple HACMP clusters and start them in parallel, multiple nodes are brought
up at once.
- node_down_remote (failover)
- HACMP will acquire volume groups, logical volumes, filesystems, and IP
addresses specified in the resource group on the designated takeover
node.
- When node_down_remote_complete is run, HACMP will run
rc.db2pe as the application server start script specified in
the resource group for this database partition.
Note: rc.db2pe, when running in a takeover mode (mutual
takeover), will stop the DB2 database partition running on it, adjust the DB2
parameters specified in failover.parms.DATABASE for
each DATABASE in the database directory that matches a parameter (parms) file,
and then start both database partitions on the physical takeover node.
- node_up_remote (reintegration of a failed node - cascading
mutual takeover resource group)
- When node_up_remote is run on the old takeover node, the
application server definition causes rc.db2pe to be run in
stop mode.
Note: rc.db2pe, when running in a reintegration mode (mutual
takeover), will stop both of the database partitions running on it, adjust the
DB2 parameters specified in reg.parms.DATABASE for
each DATABASE in the database directory that matches a parameter (parms) file,
and then start just the database partition to be kept on this physical
takeover node.
- The old takeover node releases volume groups, logical volumes,
filesystems, and IP addresses specified in resource groups to be owned by the
reintegrating node.
- HACMP will re-acquire volume groups, logical volumes, filesystems, and IP
addresses specified in the resource group now owned by the reintegrating
node.
- When node_up_local_complete is run, the application server
definition which contains rc.db2pe is initiated to start the
DB2 database partition specified in the application server definition on this
reintegrating physical node.
Note: rc.db2pe, when running in start mode, will adjust the DB2
parameters specified in reg.parms.DATABASE for each
DATABASE in the database directory that matches a parameter (parms)
file.
- node_down_local (node stop or stop with takeover)
- When node_down_local is run on the stopping node, the
application server definition causes rc.db2pe to be run in
stop mode.
Note: rc.db2pe, when running in stop mode, will adjust the DB2
parameters specified in failover.parms.DATABASE for
each DATABASE in the database directory that matches a parameter (parms) file,
and then stop the DB2 database partition (this is for takeover).
- HACMP releases volume groups, logical volumes, filesystems, and IP
addresses specified in resource groups now owned by the node.
- db2_proc_recovery (db2 process death)
- All nodes run the db2_proc_restart script. The node
which had the failure restarts the correct DB2 database partition.
- db2_paging_recovery (paging space recovery)
- All nodes run the db2_paging_action script. If a node
has more than seventy (70) percent of paging space filled, a wall command is
issued. If a node has more than ninety (90) percent of paging space
filled, then DB2 database partitions on this physical node are stopped and
restarted.
- nfs_auto_recovery (nfs or automount process failure)
- All nodes run the rc.db2pe script in NFS mode. If
an NFS process stops running, it is restarted. In a similar way, if
the automount process stops running, it is restarted.
- network_down_complete (network failure - SP switch)
- The net_down script is called. This verifies the network
as the SP switch network and verifies it is down. If so, it waits a
user-defined time interval. The default time interval is one hundred
(100) seconds.
- If the SP switch network comes back, as indicated by the
network_up_complete event, then no recovery is effected.
- If the time limit is reached, then HACMP is stopped with failover.
Note: All events can be monitored through SP problem management and the SP
Perspectives GUI.
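The two paging space thresholds described for db2_paging_recovery above reduce to a small decision. The Python sketch below restates that decision only. It is not the db2_paging_action script, and the function name and returned labels are invented for illustration.

```python
WARN_PERCENT = 70  # above this, a wall message is issued
STOP_PERCENT = 90  # above this, partitions on the node are stopped and restarted

def paging_action(percent_used):
    """Return the action described for a given paging space usage."""
    if percent_used > STOP_PERCENT:
        return "stop and restart partitions"
    if percent_used > WARN_PERCENT:
        return "issue wall warning"
    return "no action"

for usage in (50, 75, 95):
    print(usage, paging_action(usage))
```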
There are other script utilities available for your use which
include: