Administration Guide

Reducing the Impact of Media Failure

To reduce the likelihood of having to recover from a media failure, and to simplify recovery if one does occur, you should:

Mirror or duplicate the disks that hold the data and logs for important databases.
In a partitioned database environment, set up a more rigorous procedure for handling the data and logs on the catalog node. Because this node is critical to maintaining the database, you should put it on a more reliable disk, duplicate it, and back it up more frequently. Also avoid putting user data on it.

Note:

When an I/O error occurs on a table space, the database will "crash". Following a restart of the database, the table space with the I/O error is disabled while the rest of the database remains accessible.

Protecting Against Disk Failure

If you are concerned about damage to data or active logs from a disk crash, consider using some form of disk fault tolerance. Generally, this is accomplished with a disk array. A disk array is a collection of disk drives that appears to an application as a single large disk drive.

Disk arrays involve disk striping (the distribution of a file across multiple disks), disk mirroring, and data parity checks. With a disk array, the data and logs are protected from disk faults, so you do not lose the transactions that might otherwise be lost if disk fault tolerance were not implemented.
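As an illustration of disk striping, the following is a minimal Python sketch, not how any particular disk array is implemented. The function names stripe and unstripe, the round-robin layout, and the stripe_unit parameter are assumptions made for this example; each "disk" is simply an in-memory list of chunks.

```python
from typing import List


def stripe(data: bytes, num_disks: int, stripe_unit: int) -> List[List[bytes]]:
    """Split data into stripe_unit-sized chunks and deal them out round-robin."""
    disks: List[List[bytes]] = [[] for _ in range(num_disks)]
    chunks = [data[i:i + stripe_unit] for i in range(0, len(data), stripe_unit)]
    for index, chunk in enumerate(chunks):
        # Chunk 0 goes to disk 0, chunk 1 to disk 1, and so on, wrapping around.
        disks[index % num_disks].append(chunk)
    return disks


def unstripe(disks: List[List[bytes]]) -> bytes:
    """Reassemble the file by reading chunks back in the same round-robin order."""
    pieces = []
    depth = max(len(disk) for disk in disks)
    for row in range(depth):
        for disk in disks:
            if row < len(disk):
                pieces.append(disk[row])
    return b"".join(pieces)


layout = stripe(b"ABCDEFGHIJ", num_disks=3, stripe_unit=2)
restored = unstripe(layout)
```

Striping on its own spreads I/O across drives but adds no redundancy; the protection described above comes from combining it with mirroring or parity.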

Disk arrays are sometimes referred to simply as RAID (Redundant Array of Inexpensive Disks). The specific term RAID generally applies only to hardware disk arrays. Disk arrays can also be provided through software at the operating system or application level. The distinction between hardware and software disk arrays lies in where the CPU processing of I/O requests is handled. For hardware disk arrays, disk controllers manage the I/O activity, whereas for software disk arrays this is done by the operating system or an application.

Hardware Disk Arrays (RAID)

With a RAID disk array, multiple disks are used and managed by a disk controller, complete with its own CPU. All of the logic required to manage the disks forming the array is contained on the disk controller and so this implementation is operating system independent.

There are five types of RAID architecture, RAID-1 through RAID-5, and each provides disk fault tolerance. Each of the five involves some trade-off between function and performance. By definition, RAID refers to a redundant array. RAID-0, which provides only data striping and no fault-tolerant redundancy, is purposely excluded from this discussion of protecting your data in the event of a disk failure. Although the RAID specification defines five architectures, only RAID-1 and RAID-5 are typically used today.

RAID-1 is also known as disk mirroring or duplexing. Disk mirroring duplicates data (the complete file) from one disk onto a second disk using a single disk controller. Disk duplexing is the same as mirroring, except that the disks are attached to a second disk controller (for example, two SCSI adapters). Data protection is good. Either disk can fail and the data is still accessible from the other disk. With duplexing, a disk controller can also fail and the data is still completely protected. Performance with RAID-1 is also good, but the trade-off is that the required disk capacity is twice the actual amount of data, since data is duplicated on pairs of drives.
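The mirroring behavior and its capacity cost can be sketched in a few lines of Python. This is a toy model under stated assumptions, not a real controller: the class MirroredPair, its method names, and the helper raw_capacity_for_mirror are hypothetical, and each disk is modeled as a dictionary from block number to data.

```python
class MirroredPair:
    """Toy model of a RAID-1 pair: every write goes to both disks."""

    def __init__(self):
        self.disks = [{}, {}]          # block number -> data, one dict per disk
        self.failed = [False, False]

    def write(self, block: int, data: bytes) -> None:
        # Duplicate the write on every disk that is still working.
        for i, disk in enumerate(self.disks):
            if not self.failed[i]:
                disk[block] = data

    def fail_disk(self, i: int) -> None:
        # Simulate a media failure: the disk's contents are gone.
        self.failed[i] = True
        self.disks[i] = {}

    def read(self, block: int) -> bytes:
        # Serve the read from the first surviving copy.
        for i, disk in enumerate(self.disks):
            if not self.failed[i]:
                return disk[block]
        raise IOError("both copies of the mirror have failed")


def raw_capacity_for_mirror(data_size: int) -> int:
    """Disk capacity needed to mirror data_size units of data."""
    return 2 * data_size


pair = MirroredPair()
pair.write(7, b"log record")
pair.fail_disk(0)
surviving_copy = pair.read(7)   # still readable from the second disk
```

In this model, losing either disk leaves the data readable, and storing 100 units of data requires 200 units of raw disk, which matches the doubling described above.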

RAID-5 involves data and parity striping by sectors. RAID-5 stripes data, one or more sectors at a time, across all disks. Parity is interleaved with the data rather than stored on a dedicated drive. Data protection is good. If any one disk fails, the data can still be accessed by using the information from the other disks along with the striped parity information. Read performance is good, though write performance is considerably worse than that of RAID-1 or a single non-arrayed disk. A RAID-5 configuration requires a minimum of three identical disks. The amount of extra disk space required for overhead varies with the number of disks in the array, because the equivalent of one disk's capacity is used for parity. In a RAID-5 configuration of 5 disks, for example, the space overhead is 20%.
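The recovery from a single failed disk, and the overhead figure, can be illustrated with a short Python sketch. It assumes the usual XOR parity scheme and models only one stripe with fixed-size blocks; it does not model how parity is rotated across disks. The function names xor_blocks and parity_overhead are hypothetical.

```python
from functools import reduce
from typing import List


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_overhead(num_disks: int) -> float:
    """Fraction of raw capacity used for parity in an array of num_disks disks."""
    if num_disks < 3:
        raise ValueError("RAID-5 requires at least three disks")
    return 1 / num_disks


# One stripe on a 5-disk array: four data blocks plus one parity block.
data_blocks = [b"\x01\x02", b"\x10\x20", b"\xff\x00", b"\x0f\xf0"]
parity = xor_blocks(data_blocks)

# Simulate losing the disk that held the third data block.
lost = data_blocks[2]
survivors = data_blocks[:2] + data_blocks[3:]

# XOR of the surviving data blocks and the parity block rebuilds the lost block.
rebuilt = xor_blocks(survivors + [parity])
```

Because XOR is its own inverse, any single missing block in the stripe can be recomputed from the rest. With two blocks missing there is not enough information left, which is why a simultaneous failure of two disks loses data.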

With a RAID disk array (other than RAID-0), a failed disk does not prevent users from accessing data on the array. When hot-pluggable or hot-swappable disks are used in the array, a replacement disk can be swapped in for the failed disk while the array is in use. For RAID-5, if two disks fail at the same time, all data is lost (but two disks failing at once is very rare).

You might consider using RAID-1 or software-mirrored disks (described in the next section) for your logs, since this provides recoverability to the point of failure and offers good write performance, which is important for logs. Where reliability is crucial (that is, where you cannot afford to lose time recovering data after a disk failure) and write performance is less critical, consider using RAID-5 disks. If write performance is crucial and you are willing to pay for it with additional disk space, consider RAID-1 for your data as well as your logs.

Software Disk Arrays

A software disk array accomplishes much the same as a hardware disk array, but the disk traffic is managed by either an operating system task or an application program running on the server. The key point is that, like all other programs, the software array must contend for CPU and system resources. It is therefore not a good option for a CPU-constrained system, and overall disk array performance depends on the server's CPU load and capacity.

A typical software disk array provides disk-mirroring, as with RAID-1. Although redundant disks are required, a software disk array is comparatively inexpensive to implement since costly RAID disk controllers are not required. One caution with software disk arrays is that having the operating system boot drive in the disk array will prevent your system from starting if that drive fails. If the drive fails before the disk array is running, the disk array cannot start to allow access to the drive. Generally, a boot drive separate from the disk array is also required.
