Administration Guide

Chapter 22. High Availability Cluster Multi-Processing, Enhanced Scalability (HACMP ES) for AIX

Enhanced Scalability is a feature of HACMP for AIX Version 4.2.2 that currently runs only on RS/6000 SP nodes.

This feature provides the same failover recovery as HACMP and has the same event structure as previous HACMP versions. Several differences from this event structure are documented in the HACMP for AIX, V4.2.2, Enhanced Scalability Installation and Administration Guide. Beyond these standard items, the Enhanced Scalability feature provides:

Larger HACMP clusters with scalability up to 16 nodes per cluster.
Additional error coverage through "user-defined events". Conditions in monitored areas, which can be as diverse as the death of a process or paging space nearing capacity, trigger user-defined events when they are detected.
Such events can include pre- and post-events that are added to the failover recovery process, if needed. Extra functions that are specific to a particular implementation can be placed within the HACMP pre- and post-event streams.
A rules file (/usr/sbin/cluster/events/rules.hacmprd) contains the HACMP events. User-defined events are added to this file, and each event's definition includes the script files to be run when the event occurs. The rules file is described in more detail later.
HACMP client utilities for monitoring and detecting status changes in one or more clusters from AIX physical nodes outside the HACMP cluster.
Although not an enhancement, the discussion of HACMP ES concludes with an overview of the installation and migration planning required for this feature.
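As a rough illustration of the user-defined event flow described above (a monitored condition triggers an event, and the event's script runs bracketed by optional pre- and post-event scripts), the following Python sketch models it in memory. The Rule class, the dispatch function, the event and script names, and the 90% paging threshold are all hypothetical; the sketch does not reproduce the syntax of /usr/sbin/cluster/events/rules.hacmprd or any HACMP ES interface.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Rule:
    """Hypothetical stand-in for one event definition in a rules file."""
    event_name: str
    condition: Callable[[Dict[str, float]], bool]  # evaluated against monitored values
    pre_events: List[str] = field(default_factory=list)
    post_events: List[str] = field(default_factory=list)


def dispatch(rules: List[Rule], metrics: Dict[str, float]) -> List[str]:
    """Return, in order, the scripts that would run for the given monitored values."""
    run_order: List[str] = []
    for rule in rules:
        if rule.condition(metrics):
            # Pre-events run first, then the event itself, then post-events.
            run_order.extend(rule.pre_events)
            run_order.append(rule.event_name)
            run_order.extend(rule.post_events)
    return run_order


rules = [
    Rule(
        event_name="paging_space_low",
        condition=lambda m: m["paging_used_pct"] >= 90.0,  # hypothetical threshold
        pre_events=["notify_admin"],
        post_events=["log_paging_state"],
    ),
    Rule(
        event_name="process_died",
        condition=lambda m: m["db2_process_count"] == 0,  # hypothetical process check
    ),
]

if __name__ == "__main__":
    print(dispatch(rules, {"paging_used_pct": 95.0, "db2_process_count": 4}))
```

With paging space at 95% and the monitored process still running, only the paging rule fires, so its pre-event, event, and post-event scripts are listed in that order.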

Note:

Do not use "kill -9" against the db2start process in a high availability environment. This action is not recommended in any environment, but in a high availability environment in particular it may invalidate failover recovery.

The nodes in HACMP ES clusters exchange messages called "heartbeats" or "keepalive" packets, which inform the other nodes of each node's availability. When a node stops responding, the remaining nodes in the cluster invoke recovery. This recovery process is called a "node_down event" and may also be referred to as "failover". After recovery completes, work is done on the failed node to re-integrate it into the cluster. This is called a "node_up event".
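The heartbeat mechanism can be sketched as simple timeout-based failure detection. In the minimal Python model below, which is an assumption and not HACMP ES code, each node's last heartbeat time is recorded, a node silent for longer than a timeout produces a node_down event, and a down node that starts sending heartbeats again produces a node_up event. The HeartbeatMonitor class, the node names, and the 3-second timeout are hypothetical, and treating a resumed heartbeat as the node_up trigger is a simplification of the real re-integration work.

```python
from typing import Dict, List, Set, Tuple


class HeartbeatMonitor:
    """Hypothetical keepalive-based failure detector (not HACMP ES code)."""

    def __init__(self, nodes: List[str], timeout: float) -> None:
        self.timeout = timeout
        self.last_seen: Dict[str, float] = {n: 0.0 for n in nodes}
        self.down: Set[str] = set()

    def heartbeat(self, node: str, now: float) -> List[Tuple[str, str]]:
        """Record a heartbeat; a node that was down and reports again yields node_up."""
        self.last_seen[node] = now
        if node in self.down:
            self.down.remove(node)
            return [("node_up", node)]
        return []

    def check(self, now: float) -> List[Tuple[str, str]]:
        """Yield node_down for every node silent for longer than the timeout."""
        events: List[Tuple[str, str]] = []
        for node, seen in sorted(self.last_seen.items()):
            if node not in self.down and now - seen > self.timeout:
                self.down.add(node)
                events.append(("node_down", node))
        return events


if __name__ == "__main__":
    mon = HeartbeatMonitor(["nodeA", "nodeB"], timeout=3.0)
    mon.heartbeat("nodeA", 1.0)
    mon.heartbeat("nodeB", 1.0)
    mon.heartbeat("nodeA", 4.0)       # nodeB sends nothing after t=1.0
    print(mon.check(5.0))             # nodeB has been silent for 4.0 > 3.0
    print(mon.heartbeat("nodeB", 6.0))
```

Explicit timestamps are passed in rather than read from a clock so the behavior is deterministic; a real monitor would run check() periodically against the current time.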

There are two types of events: standard events, which are anticipated within the normal operation of HACMP ES, and user-defined events, which are associated with monitoring parameters in hardware and software components.

One of the standard events is the node_down event. When planning what should be done as part of the recovery process, you can choose between two HACMP failover options: Hot (or idle) Standby, and Mutual Takeover.
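To contrast the two failover options, the following Python sketch models workloads as named groups, each with a current owner node and a designated takeover node; when a node goes down, its groups move to their takeover nodes. The fail_over function, the group names, and the node names are hypothetical illustrations, not HACMP configuration. In Hot Standby, node2 runs nothing until node1 fails; in Mutual Takeover, each node runs its own workload and backs up the other.

```python
from typing import Dict


def fail_over(owners: Dict[str, str], takeover: Dict[str, str], failed: str) -> Dict[str, str]:
    """Return the new owner of each workload group after node `failed` goes down.

    owners maps group -> current node; takeover maps group -> backup node.
    Groups on surviving nodes keep their current owner.
    """
    return {
        group: (takeover[group] if node == failed else node)
        for group, node in owners.items()
    }


# Hot (idle) Standby: node1 runs the database; node2 idles until needed.
hot_standby = fail_over({"db2_group": "node1"}, {"db2_group": "node2"}, failed="node1")

# Mutual Takeover: each node runs its own workload and backs up the other.
mutual = fail_over(
    {"db2_group_a": "node1", "db2_group_b": "node2"},
    {"db2_group_a": "node2", "db2_group_b": "node1"},
    failed="node1",
)

if __name__ == "__main__":
    print(hot_standby)
    print(mutual)
```

The trade-off is visible in the result: Hot Standby leaves a node idle during normal operation, while Mutual Takeover uses both nodes but leaves the survivor carrying both workloads after a failure.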
