Recovering from error situations is a normal task associated with application programming, system administration, database administration and system operation. Distributing databases over several remote servers increases the potential for error situations resulting from network or communication failures. To ensure data integrity, the database manager provides the two-phase commit process which is illustrated in "Understanding the Two-Phase Commit Process" as points (10), (11) and (12). The following explain how the database manager handles errors during this two-phase commit process:
If a database responds that it failed to prepare to commit the unit of work, the database client will roll back the unit of work during the second phase of the commit process. A prepare message will not be sent to the transaction manager database in this case.
During the second phase, the client sends a rollback message to all participating databases that successfully prepared to commit in the first phase. Each database then writes an "ABORT" record to their log file and releases the locks held for this unit of work.
Error handling at this stage is dependent on whether the second phase will commit or roll back the transaction. The second phase will only roll back the transaction if the first phase encountered an error.
If one of the participating databases fails to commit the unit of work (possibly due to a communications failure), the transaction manager database will retry the commit on the failed database. The database manager configuration parameter resync_interval ("Transaction Resync Interval (resync_interval)") is used to determine how long the transaction manager database will wait between attempts to commit the unit of work.
If the transaction manager database fails, it will resynchronize the unit of work when it is restarted. The resynchronization process will attempt to complete all indoubt transactions, that is, those transactions that have finished the first phase and have not completed the second phase of the commit. The database manager where the transaction manager database resides will perform the resynchronization by:
If one of the participating databases fails and is restarted, the database manager for this database will query the status of the transaction manager database for the status of this transaction to determine whether the transaction should be rolled back. If the transaction is not found in the log, the database manager assumes the transaction was rolled back and will roll back the indoubt transaction for this database. Otherwise, the database will wait for a commit request from the transaction manager database.
If the transaction was coordinated by a Transaction Processing Monitor (XA-compliant transaction manager), then the database will always depend on the Transaction Processing Monitor to initiate the resynchronization.
If, for some reason, you cannot wait for the transaction manager to automatically resolve the indoubt transaction, there are some actions you can take to manually resolve the states of indoubt transactions. This manual process is sometimes referred to as "making a heuristic decision".
The LIST INDOUBT TRANSACTIONS command (and a related set of APIs) allows you to query, commit, and roll back indoubt transactions. In addition, it also allows you to "forget" transactions that have been heuristically committed or rolled back by removing the log records and releasing the log space. For information about the command and APIs, see the Command Reference and the API Reference manuals.
You should use this command (or APIs) with extreme caution and as a last resort. The best solution is to wait for the transaction manager to drive the resynchronization process. You could experience data integrity problems if you manually commit or roll back a transaction in one of the participating databases, and the opposite action is taken for another of the databases. Recovering from data integrity problems requires you to understand the application logic, the data changed or rolled back, and then to perform a point-in-time recovery of the database, or manually undo/redo the database changes.
If you cannot wait for the transaction manager to initiate the resynchronization process and you must release the resources tied up by an indoubt transaction, then heuristic operations are necessary. This situation could occur if the transaction manager will not be available for an extended period of time to do the resync, and the indoubt transaction is tying up resources that are urgently needed. An indoubt transaction ties up the resources that were associated with this transaction before the transaction manager or participating database became unavailable. These resources include things such as the locks on tables and indexes, log space, and storage taken up by the transaction. Each indoubt transaction also decreases (by one) the maximum number of concurrent transactions that can be handled by the database.
There are no foolproof ways to perform heuristic operations. You can use the following steps as a guideline:
Note: | The LIST INDOUBT TRANSACTIONS command includes "type" information to
show you the role of the database in each indoubt transaction:
|
For configuration considerations in the DB2 Universal Database two-phase commit environment, refer to "Other Configuration Considerations in Any Environment".
In particular, if the autorestart database configuration parameter is OFF and there are indoubt transactions in either the TM or RM databases, the RESTART DATABASE command is required in order to start up the resynchronization process. If issuing the RESTART DATABASE command from the command line processor, use different sessions. If you restart a different database from the same session, the connection established by the previous restart database command will be dropped. The database will need to be restarted again if the last connection to it is dropped. Issue DB2 TERMINATE to drop the connection after there are no indoubt transactions listed by the DB2 LIST INDOUBT TRANSACTIONS command.