Two modes of failover support are available in a DB2 system: hot standby and mutual takeover. A brief description of each mode and its application to DB2 follows. For each mode, the simple scenario of a two-server cluster is described.
Either of these configurations can be used to fail over one or more partitions of a partitioned database.
You can use the hot standby capability to set up failover for one or more partitions of a partitioned database configuration. If one server fails, another server in the cluster takes over for it by automatically transferring the failed server's database partitions. For this to work, both the database instance and the database itself must be accessible to the primary server and to the failover server. This requires that the following installation and configuration tasks be performed:
Figure 86 shows how partitions fail over in a hot standby configuration. System A is running one or more partitions of the overall configuration, and System B is used as the failover system. When System A fails, its partitions are restarted on the second system. The failover updates the db2nodes.cfg file to point each partition to System B's hostname and netname, and then restarts the partition on the new system. When the failover is complete, all other partitions forward requests targeted for the transferred partitions to System B.
Figure 86. Hot Standby Configuration
The following is a portion of the db2nodes.cfg file before and after the failover. In this example, node numbers 20, 22, and 24 are running on the cluster system named MachineA, which has the netname MachineA-scid0. After the failover, node numbers 20, 22, and 24 are running on the cluster system named MachineB and have the netname MachineB-scid0.
Before:

20 MachineA 0 MachineA-scid0   <= Sun Cluster 2.1
22 MachineA 1 MachineA-scid0   <= Sun Cluster 2.1
24 MachineA 2 MachineA-scid0   <= Sun Cluster 2.1

db2start nodenum 20 restart hostname MachineB port 0 netname MachineB-scid0
db2start nodenum 22 restart hostname MachineB port 1 netname MachineB-scid0
db2start nodenum 24 restart hostname MachineB port 2 netname MachineB-scid0

After:

20 MachineB 0 MachineB-scid0   <= Sun Cluster 2.1
22 MachineB 1 MachineB-scid0   <= Sun Cluster 2.1
24 MachineB 2 MachineB-scid0   <= Sun Cluster 2.1
Mutual failover of partitions in a partitioned database environment requires that a partition fail over to the other server as a logical node: if two partitions of a partitioned database system run on separate servers of a cluster configured for mutual takeover, each partition must fail over to its partner's server as an additional logical node on that server.
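As a minimal sketch of what a logical node looks like in db2nodes.cfg (the node numbers and hostnames here are hypothetical, not taken from the figures), a second partition hosted on the same server is distinguished by a nonzero logical port number in the third column:

1 hostA 0 hostA-scid0
2 hostA 1 hostA-scid0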
Figure 87 shows an example of a mutual takeover configuration.
Figure 87. Mutual Takeover Configuration
Another important consideration when configuring a system for mutual partition takeover is the database path of the local partition. When a database is created in a partitioned database environment, it is created on a root path that is local to each database partition server and is not shared across them. For example, consider the following statement:
CREATE DATABASE db_a1 ON /dbpath
This statement is executed under instance db2inst and creates the database db_a1 on the path /dbpath. Each database partition server creates its database partition on its own local /dbpath file system, under /dbpath/db2inst/NODExxxx, where xxxx is the node number. After a failover, a database partition starts up on another system, whose local /dbpath is a different directory. The only file systems that move with the logical host during a failover are the logical host file systems. This means that a symbolic link must be created at the appropriate /dbpath/db2inst/NODExxxx path, pointing to the corresponding directory on the logical host file system.
For example:

cd /dbpath/db2inst
ln -s /log0/disks/db2inst/NODE0001 NODE0001
The hadb2eee_addinst script sets up symbolic links from INSTHOME/INSTANCE to the logical host file system that corresponds to each database partition (where INSTHOME is the instance owner's home directory, INSTANCE is the instance name, and log0 is the logical host that is bound to database partition 1 through the hadb2-eee.cfg file). You must create these links manually for any other database directories.
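For instance, a link for an additional database path can be created by hand in the same way as above; the paths /dbpath2 and /log0/disks2 below are hypothetical and stand in for the additional database path and the corresponding directory on the logical host file system:

cd /dbpath2/db2inst
ln -s /log0/disks2/db2inst/NODE0001 NODE0001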
The following example shows a portion of the db2nodes.cfg file before and after the failover. In this example, node numbers 20, 22, and 24 are running on System A, which has the hostname MachineA and the netname MachineA-scid0. Node numbers 30, 32, and 34 are running on System B, which has the hostname MachineB and the netname MachineB-scid0. System A is hosting a logical host that is responsible for database partitions 20, 22, and 24. System B is listed as a backup for this logical host and will host it if System A goes down.
Before:

20 MachineA 0 MachineA-scid0   <= Sun Cluster 2.1
22 MachineA 1 MachineA-scid0   <= Sun Cluster 2.1
24 MachineA 2 MachineA-scid0   <= Sun Cluster 2.1
30 MachineB 0 MachineB-scid0   <= Sun Cluster 2.1
32 MachineB 1 MachineB-scid0   <= Sun Cluster 2.1
34 MachineB 2 MachineB-scid0   <= Sun Cluster 2.1

db2start nodenum 20 restart hostname MachineB port 3 netname MachineB-scid0
db2start nodenum 22 restart hostname MachineB port 4 netname MachineB-scid0
db2start nodenum 24 restart hostname MachineB port 5 netname MachineB-scid0

After:

20 MachineB 3 MachineB-scid0   <= Sun Cluster 2.1
22 MachineB 4 MachineB-scid0   <= Sun Cluster 2.1
24 MachineB 5 MachineB-scid0   <= Sun Cluster 2.1
30 MachineB 0 MachineB-scid0   <= Sun Cluster 2.1
32 MachineB 1 MachineB-scid0   <= Sun Cluster 2.1
34 MachineB 2 MachineB-scid0   <= Sun Cluster 2.1
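Once System A is available again, the logical host can be moved back, and the same db2start restart syntax applies in the other direction. The following is a sketch, assuming the partitions return to their original logical ports on MachineA:

db2start nodenum 20 restart hostname MachineA port 0 netname MachineA-scid0
db2start nodenum 22 restart hostname MachineA port 1 netname MachineA-scid0
db2start nodenum 24 restart hostname MachineA port 2 netname MachineA-scid0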
If you decide to use a mutual takeover environment for the coordinator node, you may want to adjust the following database manager configuration parameters:
Reducing the values of these parameters reduces the failover time for the coordinator node, but increases the risk of an FCM connection timeout. Tune these parameters to meet your requirements.
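As an illustration only: if the parameters in question include the FCM connection parameters CONN_ELAPSE and MAX_CONNRETRIES (an assumption, chosen because they govern the FCM connection timeout behavior just described), they can be lowered from the command line as follows:

db2 update dbm cfg using CONN_ELAPSE 3 MAX_CONNRETRIES 3

The new values take effect the next time the instance is started.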