IBM Books

Administration Guide


Using the LOAD Utility

The LOAD utility is intended for the initial load, or an append, of a table where large amounts of data are moved. There are no restrictions on the data types that the LOAD utility can handle, including large objects (LOBs) and user-defined types (UDTs). The LOAD utility speeds up the task of loading large amounts of data into a database. LOAD is faster than IMPORT because LOAD writes formatted pages directly into the database, while IMPORT performs SQL INSERTs. The data being loaded must be local to the server (unlike IMPORT and EXPORT, where data can be passed from the client).

In addition to the overview above, here are some details about the LOAD utility that may interest you if you are concerned about recovering your database after a failure. See Chapter 7. "Recovering a Database" for more information on this subject. Logging is required for fully recoverable databases. The LOAD utility almost completely eliminates the logging associated with the loading of data. In place of logging, you have the option of making a copy of the loaded portion of the table. LOAD does not fire triggers, and does not perform referential or table constraint checking (other than validating the uniqueness of unique indexes). Tables with such options defined may be populated faster, or more simply, using IMPORT. If you have a database environment that allows for database recovery following a failure, you can do one of the following:

Note:You cannot LOAD data into typed tables.

The LOAD utility can take advantage of a hardware configuration in which multiple processors or multiple storage devices are used, such as in a symmetric multiprocessor (SMP) environment. There are several ways in which the LOAD utility can process large amounts of data in parallel. One way is through the use of multiple storage devices, which allows for I/O parallelism during the LOAD process. Another is the use of the multiple processors in an SMP environment, which allows for intra-partition parallelism. Both can be used together to load the data even faster.

The following topics provide more information:

Overview of the LOAD Process

The LOAD process has multiple phases: Load, where the data is written into the table; Build, where the indexes are created; and Delete, where the rows that caused a unique key violation are removed from the table. If tables are left in the check pending state after the load completes, you must run the SET CONSTRAINTS SQL statement to validate them for referential integrity and check constraints. The LOAD utility generates messages about the progress of each phase. If a failure occurs during the LOAD process, these messages will help you decide how to recover.

During the Load phase, data is loaded into the table; index keys and table statistics are collected if necessary. Save points, or points of consistency, are established at intervals specified by you with the SAVECOUNT parameter on the LOAD command. These points of consistency are not established at exactly the number of rows specified with the SAVECOUNT parameter; rather, the number of rows is converted to a page count, which is rounded up to an interval of the extent size. Messages let you know how many input rows were successfully loaded at the time of the save point. If a failure occurs, use the number of input rows at the last successful consistency point with the RESTARTCOUNT parameter during a restart. If the failure occurs near the beginning of the LOAD process and you were doing a REPLACE, consider restarting the load using the REPLACE option.
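The rounding described above can be sketched as follows. The rows-per-page figure and the extent size used here are illustrative assumptions, not values the utility reports; this is a model of the stated behavior, not the utility's actual code:

```python
import math

def pages_per_consistency_point(savecount_rows, rows_per_page, extent_size_pages):
    # Convert the SAVECOUNT row value to a page count, then round the
    # page count up to the next multiple of the extent size.
    pages = math.ceil(savecount_rows / rows_per_page)
    extents = math.ceil(pages / extent_size_pages)
    return extents * extent_size_pages

# Example: SAVECOUNT 100000, roughly 50 rows per 4 KB page, 32-page extents.
print(pages_per_consistency_point(100000, 50, 32))
```

Under these assumptions, a SAVECOUNT of 100 000 rows yields a consistency point roughly every 2 016 pages, rather than at exactly 100 000 rows.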

During the Build phase, indexes are created based on the index keys collected in the Load phase. The index keys are sorted during the Load phase and index statistics are collected. The statistics collected are similar to those collected during RUNSTATS. If a failure occurs, the Build phase restarts from the beginning.

Unique key violations are placed into the exception table, if one was specified, and messages about rejected rows are put into the message file. Following the completion of the LOAD process, review these messages, correct any problems, and insert the corrected rows into the table.
Note:The recording of warnings has a detrimental effect on LOAD performance. If performance is important and you anticipate a large number of warnings, consider using the NOROWWARNINGS filetype modifier, which suppresses these warnings.

During the Delete phase, all rows causing a unique key violation are deleted. If a failure occurs, restart this phase from the beginning. Information on the rows containing the invalid keys is stored in a temporary file. After you request a restart beginning at the Delete phase, the violating rows are deleted based on the information in that temporary file. You must not modify any data in any temporary files, and you must restart the LOAD command with the same parameters; otherwise, the Delete phase will fail. If the temporary file has been modified, or does not exist, restart the LOAD command at the Build phase. Once the index is rebuilt, any invalid keys are placed in the exception table, if it exists, and duplicate keys are deleted.
Note:The Delete phase removes from the table those records, inserted during the Load phase, that were discovered during the Build phase to violate a uniqueness condition (for example, duplicate records). However, each deletion is logged. If a large number of records violate the uniqueness condition, the log could fill up during the Delete phase.

Since regular logging is not performed, LOAD uses pending states to preserve the consistency of the database. The Load and Build phases of the LOAD process place any associated table spaces into a load pending state. The Delete phase places any associated table spaces into a delete pending state. If you complete the LOAD process without having either logretain or userexit "on", and without specifying either the COPY YES option or the NONRECOVERABLE option, any associated table spaces are placed in a backup pending state. These states can be checked using the LIST TABLESPACES command. (For more information on LIST TABLESPACES, see the Command Reference manual.) One last possible state associated with the LOAD process concerns referential and check constraints: dependent tables may be placed in a check pending state following the completion of the LOAD process.

LOAD can also be used with a non-recoverable option. This allows you to perform a non-recoverable LOAD without affecting the recoverability of the other tables in the database. When this type of LOAD is run, there is no requirement to use the COPY YES option or to take a backup.

If a LOAD fails, the table space(s) involved could be in an inconsistent state because there is no logging. For this reason, the table spaces are left in a load pending state. To remove the load pending state, you will have to restart the LOAD, perform a LOAD REPLACE on the same table on which the LOAD failed, or recover the table space(s) using a RESTORE with the most recent backup (either table space or database backup) and then carry out further recovery actions. (You could also drop the table space and then re-create it.)

For more information on how to recover, see Chapter 7. "Recovering a Database".

Details About LOAD

The LOAD utility inserts data into a table from an input file, from a device, or using a named pipe, any of which must reside on the node where the database resides. The table must exist. (Indexes on the table may or may not exist. LOAD only builds indexes that are already defined on the table.) If the table receiving the new data already contains data, you can replace or append to the existing data.
Note:If you are loading a table that already contains data and the database is non-recoverable, make sure that you have a backed-up copy of the database, or the table space for the table being loaded, before using LOAD so that you can recover from errors.

If the existing table is a parent table containing a primary or unique key referenced by a foreign key in a dependent table, replacing data in the parent table places the dependent table in a check pending state. The SET CONSTRAINTS statement must then be used to validate the referential and check constraints.

The table spaces in which the loaded table resides are quiesced in exclusive mode. For more information on QUIESCE, see the Command Reference manual.
Note:In the following command line processor example, and in the other examples in this chapter, relative path names are used. Be aware that relative path names are only allowed on calls from a client on the same node as the database. The use of fully-qualified path names is recommended.

The following is an example of the command line processor syntax for the LOAD command:

     db2 load from stafftab.ixf of ixf messages staff.msgs
        insert into userid.staff copy yes use adsm data buffer 20

This example assumes no indexes are involved, any warning or error messages are placed in staff.msgs, a copy of the changes made is stored in ADSTAR Distributed Storage Manager (ADSM), and 20 pages of buffer space are to be used during the load. See "ADSTAR Distributed Storage Manager" for more information on using ADSM.

The following is a different example of the command line processor syntax for the LOAD command:

     db2 load from stafftab.ixf of ixf messages staff.msgs
        remote file stafftmp replace into staff using . sort buffer 200

This example is similar to the previous one in its use of the MESSAGES parameter. It differs from the previous example in two ways: the data in the table is replaced instead of inserted; and, instead of taking a COPY using ADSM, the USING and SORT BUFFER parameters are used to assist with index creation. The REMOTE FILE parameter declares stafftmp as the base file name for temporary files such as "stafftmp.msg", "stafftmp.log", and "stafftmp.rid". The current directory of the user, indicated by ".", is used as the working directory to hold the temporary files needed for index creation. Two hundred (200) pages of buffer space are used to sort index keys. For more information on the LOAD command, see the Command Reference manual.

Since you will typically be loading large amounts of data using the LOAD command, a LOAD QUERY command can be used to check the progress of the LOAD process if the REMOTE FILE option has been specified. You require a connection to the same database and a separate CLP session to enter this command. This command can be used by local and remote users. For more information on the LOAD QUERY command, see the Command Reference manual.

The LOAD utility can also be invoked by the application programming interfaces (APIs) sqluload and sqlgload. For more information on the requirements when loading data to a table using these APIs, see the API Reference manual.

You should also review the following points. Each represents a task that you may need to perform and each is carried out as part of the LOAD command. For more information on any of the following tasks, see the API Reference and/or the Command Reference manuals.

There are restrictions and limitations with the LOAD utility:

The LOAD utility optionally updates table and index statistics as part of the load process if run in REPLACE mode. If data is appended to a table, statistics are not gathered for the table; run RUNSTATS following the completion of the load process to collect up-to-date statistics. If statistics are gathered on a table with a unique index and duplicate keys are deleted during the Delete phase, the statistics are not updated to account for the deleted records. If you expect to have duplicate records, do not collect statistics during the LOAD; instead, run RUNSTATS after the LOAD process.

To ensure that the loaded data does not cause referential integrity or check constraint violations, loading any table that is a parent table causes all of its dependent tables to be placed in a check pending state. To remove a table from this restricted state, you must run the SET CONSTRAINTS statement.

With LOAD, unequal code page situations are possible, involving the expansion or contraction of character data. Such situations can occur with Japanese or Traditional-Chinese Extended UNIX Code (EUC) and double-byte character sets (DBCS), which may have different length encodings for the same character. The NOCHECKLENGTHS option toggles between two behaviors:

  1. Comparison of input data length to target column length is performed before reading in any data. If the input length is larger than the target, NULLs are inserted for that column if it is nullable. Otherwise, the request is rejected. This is the default.
  2. No initial comparison is performed and, on a row-by-row basis, an attempt is made to load the data. If the data is too long after translation is complete, the row is rejected. Otherwise, the data is loaded. Specifying NOCHECKLENGTHS will enable this behavior.

LOAD Performance Considerations

The performance of LOAD depends on the nature and size of the data, the number of indexes, the options used, and whether the SET CONSTRAINTS statement is required. Use of SET CONSTRAINTS lengthens the total time needed to load the table and make it usable again. (For more information on the SET CONSTRAINTS statement, see the SQL Reference manual.)

The LOAD utility performs almost equally well in either INSERT mode or REPLACE mode.

Index creation reduces the performance of the load process, especially when data is added to a table that already contains data. If there are many indexes on a table that already holds a large amount of data, and only a small percentage of data is to be loaded, consider using the IMPORT utility instead of the LOAD utility. Unique indexes also reduce the performance of the load process if duplicates are encountered. In most cases, however, it is still more efficient to create the indexes during the LOAD than to complete the LOAD and then use the CREATE INDEX command for each of the indexes.

The LOAD utility automatically attempts to provide the best performance possible by determining the best values for DISK_PARALLELISM, CPU_PARALLELISM, DATA_BUFFER, and SORT_BUFFER if these parameters have not been specified by the user at the time the utility is run. Optimization of these values is based on the size of, and the free space available in, the utility heap. Consider allowing the LOAD utility to choose the values for these parameters first, and then tune them for your particular needs.

Performance of the LOAD can be improved by:

The COPY YES/NO option specifies whether to create a copy of the input data during the LOAD. If "YES" is chosen, performance is reduced because all the data being loaded is copied at the same time; however, this choice is faster than accepting a backup pending state and having to do a backup later before accessing the table. If "NO" is chosen and forward recovery is enabled, the table space is placed in a backup pending state.

LOAD Parameter Hints and Tips

The parameters associated with the LOAD command offer you many options and alternatives when loading data into tables. All of the options, however, can be overwhelming because of their number (for newer users) and the way that they can interact with each other. This section provides additional hints and tips on most LOAD parameters with a particular focus on those affecting load performance.

DATA BUFFER and SORT BUFFER
The database configuration parameter Utility Heap Size (util_heap_sz) is the source of the memory used by the following utilities within DB2:

The default for the Utility Heap Size configuration parameter is 5 000 pages (each 4 KB in size). Depending on the amount of storage available on your system, you should consider allocating more pages for use by the DB2 utilities.

Within the load utility, the memory allocated from the Utility Heap is divided among many areas, of which the DATA BUFFER and the SORT BUFFER are two of the most important. Because LOAD is only one of several utilities that use memory from the Utility Heap, for planning purposes you should assume that no more than fifty percent of the pages defined by the Utility Heap Size configuration parameter are available for LOAD. The remainder of the memory is available to the other utilities mentioned above.
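As a planning sketch of the fifty-percent guideline above (the heap size and page size defaults are those mentioned earlier; the even split itself is only a rule of thumb, not something the utility enforces):

```python
def load_heap_budget_kb(util_heap_sz_pages=5000, page_size_kb=4):
    # Assume at most half of the utility heap is available to LOAD,
    # per the planning guideline above; the rest serves other utilities.
    load_pages = util_heap_sz_pages // 2
    return load_pages * page_size_kb

print(load_heap_budget_kb())  # with the 5 000-page default
```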

The DATA BUFFER controls the size of the buffer used to load data into a table. The recommendation is that this buffer should be several extents in size. The extent size is defined within DB2 and can be one or many 4 KB pages in size. An extent is the unit of movement for data within DB2. This parameter is useful when working with large objects (LOBs), and it reduces I/O waiting time.

The SORT BUFFER controls the size of the buffer to sort index keys associated with the data being loaded into a table. This buffer is used during the index create phase of the LOAD process. It is only used if there are indexes on the table data being loaded. If used, this buffer is more important than the DATA BUFFER when considering the performance of the load operation. The recommendation is that the SORT BUFFER size be set as large as possible (depending on the overall storage available on your system and the other considerations mentioned above).

To set the configuration parameter for the Utility Heap Size, you should do the following from the command line:
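A sketch of the command sequence follows; the database name sample and the value 10 000 pages are illustrative assumptions:

```shell
# Update the utility heap size for the database (name and value are
# illustrative), then restart the instance so the change takes effect.
db2 update database configuration for sample using util_heap_sz 10000
db2stop
db2start
```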

Changes to the database configuration file become effective only after they are loaded into memory. All applications must disconnect from the database before this can occur. For a server configuration parameter, this occurs during the execution of DB2STOP followed by a DB2START. For a client configuration parameter, this occurs when the application is restarted. If the client is the command line processor, it is necessary to use TERMINATE. (TERMINATE breaks the application connection to a database and terminates the back-end process.)

CPU_PARALLELISM
Use CPU_PARALLELISM to exploit Symmetric Multi-Processor (SMP) parallelism (if this is part of your machine's capability). The parameter specifies the number of processes or threads used by the load utility to parse, convert, and format data records. The maximum number allowed is 30. Using this parameter whenever possible has a strong positive impact in reducing the time taken to complete the load operation.

Also, if this parameter is not specified, the load utility will select a reasonable default based on the number of CPUs on the system.

DISK_PARALLELISM
Use DISK_PARALLELISM to exploit the available number of containers that can be written to when loading data. The parameter specifies the number of processes or threads used by the load utility to write data records to disk. The maximum default number allowed is the greater of four times the CPU_PARALLELISM value or 50. Using this parameter whenever possible has a strong positive impact in reducing the time taken to complete the load operation.

By default, DISK_PARALLELISM is equal to the sum of the table space containers on all table spaces containing objects for the table being loaded, except where this default exceeds the maximum default number allowed (see above).
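The default described above can be sketched as follows; this is a simple model of the stated rule, not the utility's actual code:

```python
def disk_parallelism_default(total_containers, cpu_parallelism):
    # Default: the number of containers across all table spaces holding
    # objects for the table, capped at the maximum default value of
    # max(4 * CPU_PARALLELISM, 50).
    maximum_default = max(4 * cpu_parallelism, 50)
    return min(total_containers, maximum_default)

print(disk_parallelism_default(12, 4))   # fewer containers than the cap
print(disk_parallelism_default(80, 16))  # capped at max(64, 50) = 64
```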

SAVECOUNT
This parameter sets the interval at which points of consistency are established for the data being loaded. If SAVECOUNT must be used, ensure that its value is set sufficiently high: the synchronization required to establish a point of consistency takes time, and if done too frequently, there will be a noticeable slowing of the load performance. Depending on the estimated number of rows to be loaded, a value greater than one million is recommended.

The value shown by the LOAD QUERY command can be used if the LOAD fails and needs to be restarted. When loading, we recommend that the REMOTE FILE parameter be used. After the failure, LOAD QUERY using the remote file will give you the last consistency point. You can then rerun the load with RESTARTCOUNT n (where n is the last consistency point value) and with the RESTART parameter.

STATISTICS YES
If STATISTICS YES is requested during the load, the performance of the load will decrease. The most significant decrease in the performance of the load occurs when DETAILED INDEXES ALL is chosen. Even with this performance decrease, you should consider using this option. The load utility collects statistics faster this way than using the RUNSTATS command following the completion of the LOAD.

To perform their best, applications require the use of the best distribution and index statistics possible. To update the DB2 system statistics, you should use STATISTICS YES during the load operation; or, use the RUNSTATS command following the load. Once the statistics are updated, applications may need the new access paths to the table data based on the latest statistics. New access paths to a table can be created by rebinding the application packages using the BIND command.

The Statistics Heap Size (stat_heap_sz), allocated from agent private memory, is used when collecting statistics. LOAD is typically used to load data into large tables; therefore, the default for the stat_heap_sz configuration parameter is not recommended when using STATISTICS YES on the LOAD command, and a larger value is recommended.

NONRECOVERABLE, COPY NO, COPY YES
If you are working with a recoverable database, there are parameters related to this type of database that affect the performance of the load operation. A recoverable database is one where you have determined that the data is important and must not be lost. The ability to back up and restore databases or table spaces is one part of what it means to have a recoverable database. The other part is the ability to apply any changes made to the database between the times the backups are taken. This part is called roll-forward recovery and involves saving the activity against the database in logs. The information from these saved logs is applied to the restored database so that no information is lost as a result of a problem with the database.

You can decide the level of recoverability you wish for your database. You can decide to backup and restore a database or table space without needing the roll-forward capability. In such a case, you may not need the logs since you have another way to reconstruct the activity against the database between backups. For example, you may be collecting all transactions against the database and applying them in batch mode; and, you may also not be allowing any ad hoc user updates to the database. In this example, there is no need to log all changes to the database.

The load operation does not log each record added to the table as part of the load operation.

By choosing the NONRECOVERABLE parameter, there is no performance penalty, since no activity beyond the movement of the data into the table is required. In addition, the load operation completes without leaving the table spaces in a backup pending state. However, during a subsequent restore and roll-forward recovery in which this LOAD command is one of the operations being rolled forward, the table that is the object of the load is not updated and is marked "invalid". Further actions against this table are ignored. After the roll-forward is complete, the table cannot be used and can only be dropped. Depending on the table, this may be an acceptable course of action in your work environment.

If you choose COPY NO and roll-forward recovery is enabled (that is, either the logretain or the userexit parameter is on), the table space in which the table being loaded resides is placed in a backup pending state. The database or table space must be backed up before the data in the table can be accessed. The forced backup establishes a point of consistency for the table space or database. Recovery would then begin with the restoration of this backup. (Roll-forward could then follow.)

COPY NO is the default when using the LOAD command.

You can only choose COPY YES if roll-forward recovery is enabled. A copy of all changes to the table being loaded is saved. In a subsequent restore and roll-forward recovery in which this LOAD command is one of the operations being rolled forward, the changes are applied to the table, and the table is usable following the completion of the roll-forward activity. The copying of each change as the table is being loaded causes increased I/O activity, which may increase the load time on an I/O-bound system. Specifying multiple devices or directories (on different disks) can offset some of the performance penalty resulting from this operation.

The syntax when using the COPY YES option is:

   db2 load from <filename> of DEL replace into <tablename>
      copy yes <dir1>, <dir2>, <dir3>, <dir4>, ...

where

   <dirN> are devices or directories.

You might select this option if you are adding new data to an existing table already containing data. You have decided you do not want to take a backup of the table space or database involved. Recovery would begin with the restoration of a backup taken before the load operation. Roll-forward could then follow including going through the load operation using the changes saved using the COPY YES option.

WARNINGCOUNT
During the load operation, warnings and errors may be generated. The WARNINGCOUNT, FOR EXCEPTION, and MESSAGES parameters can be used to keep track of them. If you are expecting a significant number of warnings, set the WARNINGCOUNT parameter to zero (0) or omit this optional parameter; the load operation will then continue to completion regardless of the number of warnings issued.

If you are expecting only a few warnings or none, set the WARNINGCOUNT parameter to approximately the number you are expecting, or to twenty (20) if you are expecting none. The load operation will stop after the WARNINGCOUNT number is reached. This gives you the opportunity to correct the data (or drop and re-create the table being loaded) before attempting to complete the load operation. Although WARNINGCOUNT has no direct effect on the performance of the load operation, it can assist with the successful completion of your load operations. If you encounter unexpected warnings during load processing, a WARNINGCOUNT threshold prevents you from having to wait until the full completion of the load operation before discovering there was a problem.

USING
If indexes are defined on the table, the load utility sorts all of the index keys in order to rebuild the indexes. If the number of records in the table is large, then the sort may not fit into the available memory (SORT BUFFER) and sorted data may need to overflow to disk. You can decrease the I/O overhead of sorting by striping the sort overflow to a series of disks. This is done with the USING option as follows:
   db2 load from <filename> of IXF insert into <tablename>
      using <dir1>, <dir2>, <dir3>, <dir4>, ...

where

   <dirN> are directories.

Here is a suggested method to determine how many directories are required:

If D is the number of directories specified, L is the file size limit in GB (this limit varies by operating system), and I is the estimated size, in GB, of the sum of all the indexes, then the value of D is established using the following equation:

   D > (2 * I) / L

For example, assume the sum of all the indexes in your database is about 2.8 GB. If the file size limit for your system is 2 GB, then L = 2. Putting these two values into the equation results in:

   D > (2 * 2.8) / 2

Therefore, rounding D to the next highest integer, you would specify three (3) directories in the USING clause.
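The rule of thumb above, including the strict inequality and the rounding up, can be sketched as:

```python
import math

def directories_needed(index_size_gb, file_limit_gb):
    # D must be strictly greater than (2 * I) / L, so take the next
    # integer above the quotient.
    quotient = (2 * index_size_gb) / file_limit_gb
    return math.floor(quotient) + 1

print(directories_needed(2.8, 2))  # the worked example above: 3
```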

LOAD Temporary Files

You have control over the disk space used when working with LOAD. If you explicitly specify the location of the remote files used by the LOAD, you know where the temporary files will be created, and therefore which disk will temporarily have its space used.

The REMOTE FILE parameter specifies the directory path and the base name to be used when creating temporary files during a load operation. For example, we could declare the path and base name as:

   REMOTE FILE /u/bobdir/bob

During the load processing, LOAD may generate the following temporary files:

   /u/bobdir/bob.msg
   /u/bobdir/bob.rid
   /u/bobdir/bob.log

These files are temporary and are removed when load processing completes without difficulty. They are binary files and are not designed to be viewed during load processing.

If you do not use the REMOTE FILE parameter on the LOAD command, the load process uses "db2utmp" as the base name and the directory path is the current working directory (if the LOAD command is executed on the server). If the LOAD command is not executed on the server and you do not use the REMOTE FILE parameter with an explicit directory path and base name, it may be difficult to determine where the temporary files are located.
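The naming scheme above can be sketched as follows; the base name is the one from the example, and the extension list reflects the files shown above:

```python
def load_temp_files(remote_file_base="/u/bobdir/bob"):
    # LOAD derives its temporary file names by appending fixed
    # extensions to the REMOTE FILE base name.
    return [remote_file_base + ext for ext in (".msg", ".rid", ".log")]

print(load_temp_files())
```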

Using LOAD QUERY

When using the LOAD QUERY command, you simply specify the base name to see the message information for the load process.

The base name is declared using the LOAD command and the REMOTE FILE parameter. For example, we could declare the path and base name as:

   REMOTE FILE /u/bobdir/bob

To check the status of the load process, you could issue the following commands:

   db2 connect to database_name
   db2 load query /u/bobdir/bob

You could also declare a location to place the warning and error messages that occur during load processing. This is done by using the TO parameter on the LOAD QUERY command and declaring a fully-qualified path and a unique file name.

See the Command Reference for additional details on the LOAD QUERY command.

Running Concurrent LOAD Jobs

It is extremely important when running concurrent LOAD jobs that you know the following:

When running concurrent LOADs, the remote files must be unique. LOAD does not enforce this uniqueness; if the remote files are not unique (for example, if you do not use the REMOTE FILE parameter on any of your concurrent LOAD commands), you will receive indeterminate results.

Restarting LOAD and Database Recovery

A LOAD can be restarted following a failure using RESTART or RESTARTCOUNT. The REMOTE FILE information is important when using a LOAD RESTART command. A copy image of the loaded data can be created for use when recovering a database.

The following sections discuss these considerations in more detail:

REMOTE FILE Considerations

LOAD RESTART restarts the load process after a previous load was interrupted. As part of restarting, LOAD RESTART needs the information that was saved in the remote files.

The REMOTE FILE parameter specifies the directory path and the base name to be used when creating temporary files during a load operation. For example, we could declare the path and base name as:

   REMOTE FILE /u/bobdir/bob

During the load processing, LOAD may generate the following temporary files:

   /u/bobdir/bob.msg
   /u/bobdir/bob.rid
   /u/bobdir/bob.log

Each of these files is temporary and are all removed when the load processing completes without difficulty. The files are temporary in nature and are binary in format so that they are not designed to be viewed by you during load processing.

If you do not use the REMOTE FILE parameter on the LOAD command, the load process uses "db2utmp" as the base name and the directory path is the current working directory (if the LOAD command is executed on the server). If the LOAD command is not executed on the server and you do not use the REMOTE FILE parameter with an explicit directory path and base name, it may be difficult to determine where the temporary files are located.
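For example, a LOAD command that explicitly places its temporary files under /u/bobdir might look like the following, where staff.del and staff are illustrative file and table names (verify the full syntax in the Command Reference):

   db2 load from staff.del of del remote file /u/bobdir/bob insert into staff

Because the path is fully qualified, the location of the temporary files is unambiguous no matter where the command is later restarted.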

Restarting the LOAD Using RESTART

LOAD RESTART requires the information in the remote files. The specification of a fully-qualified path removes any possible ambiguity over the location of these files during the restart of the load process.
Note: It may still be possible to restart without a fully-qualified path for the temporary remote files, but only if the LOAD is restarted from the same directory where the original LOAD command was issued.
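Assuming the original load named its remote file /u/bobdir/bob, a restart of an interrupted load might be invoked as follows (the file and table names are illustrative; see the Command Reference for the exact RESTART syntax):

   db2 load from staff.del of del remote file /u/bobdir/bob restart into staff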

Restarting the LOAD Using RESTARTCOUNT

If a failure occurs while loading data, you can restart the load from the last save point or point of consistency; or reload the entire table by using the REPLACE option.

The remote file specified in the LOAD restart operation should be the one that was specified in the LOAD command that failed.

There are a number of options available should you decide to restart the load.

If you decide to restart using the RESTARTCOUNT number option, then you must use the number of rows at the last successful consistency point. To determine that value, use the LOAD QUERY command with either the name specified with the REMOTE FILE option or the default name db2utmp. By choosing RESTARTCOUNT number, the LOAD restarts from the row following the row identified by the number and attempts to finish the load.
Note: The RESTARTCOUNT number can only be used with the last successfully completed consistency point. If the last consistency point started but did not complete (that is, SQL3519W is not followed by SQL3520W), then you must carry out the action described in the help for message SQL3519W.

If you do not want to continue loading rows, or if the failure was during the Build phase, you can use the RESTARTCOUNT B option. The LOAD process brings the table to the state of the last save point or point of consistency and then restarts the Build phase. By choosing this option the LOAD restarts, does not attempt to load additional rows, and builds the indexes for the rows already loaded.

If the message file states that the Build phase completed, and the temporary files are unmodified since the LOAD left them, you can use the RESTARTCOUNT D option. The information about rows containing duplicate keys, stored in the temporary files, is used to delete those rows.
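For example, if LOAD QUERY reports that the last completed consistency point was at row 100000, the restart might look like the following sketch (the file and table names are illustrative, and the exact placement of RESTARTCOUNT should be verified in the Command Reference):

   db2 load from staff.del of del remote file /u/bobdir/bob restartcount 100000 restart into staff

Substituting restartcount B or restartcount D selects the Build-phase or duplicate-deletion restart behavior described above.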

The restarted LOAD command should continue until the completion of the LOAD process.
Note: For minor errors, such as nonexistent data files or invalid dcoldata, the LOAD cleans up and takes the table out of the load pending state. You must then run the LOAD again, in either REPLACE or INSERT mode, with the correct parameters.

Eliminating a Phantom Quiesce

A quiesce on table spaces is persistent across any system failure. This is important to know when working with LOAD since it uses a quiesce in exclusive mode while carrying out the load process.

A quiesce is owned by a connection. When you lose a connection, the quiesce remains but it has no owner. This is called a "phantom quiesce".

To remove a phantom quiesce, you must:
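As a sketch, assuming an illustrative table name of staff, resetting the quiesce from a new connection might look like the following (verify the exact procedure in the Command Reference):

   db2 connect to database_name
   db2 quiesce tablespaces for table staff reset
   db2 connect reset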

Creating a Copy Image of Loaded Data

If the table being loaded is part of a recoverable database, then logging is in effect. Since LOAD does not log the changes made to the table, you have the option of specifying COPY YES to create a copy of the data being loaded. This copy is used during roll-forward recovery to re-create the changes to the database done by LOAD.

When using this option you should also consider using multiple devices or directories to allow for the best possible I/O exploitation.
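For example, a LOAD that spreads the copy image across two directories might look like the following, where the file, table, and directory names are illustrative:

   db2 load from staff.del of del copy yes to /u/copy1,/u/copy2 insert into staff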

For more information on the load copy location file, see "Using the Load Copy Location File".

If forward recovery is enabled (logretain or userexit is "on") and the COPY option was not used, all table spaces in which the loaded table resides are left in a backup pending state. A backup of the database or of the affected table space(s) is required to remove this pending status, and must be taken before any other units of work can be started against the database or table space.
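For example, a table space backup that removes the backup pending status might look like the following, where sample and ts1 are illustrative database and table space names:

   db2 backup database sample tablespace (ts1)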

For more information on how to recover, see Chapter 7. "Recovering a Database".

During forward recovery, if the load copy is not available, then the table spaces (of the loaded table) which are being rolled forward will be set to the restore pending state. These table spaces must be restored from a backup image taken after the table load.

An error is returned if you specify NONRECOVERABLE and COPY YES. There is no need for a copy in such a case since it would not be needed during recovery.

LOAD Exception Table

The exception table is a user-created table which mimics the definition of the table being loaded. It is specified by the FOR EXCEPTION option on the LOAD command. The table is used to store copies of rows that violate unique index rules.
Note: Any rows rejected because of invalid data before the building of the index on the loaded table are not inserted into the exception table.

Rows are added to the existing information in the exception table. The existing information may include rows listing check constraint or foreign key violations; or invalid rows from a previous LOAD. If you want only the invalid rows from this LOAD, you will need to remove the existing rows before invoking LOAD.

The exception table used with the LOAD utility is identical to the exception table(s) used by the SET CONSTRAINTS statement. An exception table should be created when performing a load against a table that has a unique index and may contain duplicate records. If an exception table is not provided for the LOAD, and duplicate records are found, the LOAD continues; however, only a warning message is issued about the deleted duplicate records, and the deleted records themselves are not placed anywhere.
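For example, an exception table for an illustrative table named staff might be created as follows; the additional timestamp and message columns are optional, and all names here are illustrative:

   db2 "create table staffexc like staff"
   db2 "alter table staffexc add column load_ts timestamp add column load_msg clob(32k)"
   db2 load from staff.del of del insert into staff for exception staffexc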

After completing the load, information in the exception table can be used any way you wish. You may want to use the information to correct any data that was in error and insert the rows into the original table.

The exception table message column has the following structure:

Table 32. Exception Table Message Column Structure for LOAD

Field  Contents                           Size                 Comments
1      Number of violations               5 characters         Right justified, padded with '0'
2      Type of first violation            1 character          'I' - unique index violation (the only type used by LOAD)
3      Length of constraint/index token   5 characters         Right justified, padded with '0'
4      Constraint/index token             Length from field 3

Note: Only unique index violations are reported by LOAD. Check constraint and foreign key violations are reported by running the SET CONSTRAINTS statement with the IMMEDIATE CHECKED FOR EXCEPTION options. Only one unique index violation per row is reported. LONG or large object (LOB) data is not inserted into the exception table. The index token is the "IID" value from SYSCAT.INDEXES that identifies the index.

See the SQL Reference manual for more information on exception tables.

Checking For Constraint Violations

The loaded table may be in the check pending state if it has table check constraints or referential integrity constraints defined on it. The STATUS flag of the SYSCAT.TABLES in the row corresponding to the loaded table indicates the check pending state of the table. For the loaded table to be usable, the STATUS must have a value of "N" indicating the normal state of the table.

To remove a table from the check pending state, use the SET CONSTRAINTS statement. For more information on the SET CONSTRAINTS statement, see the SQL Reference manual. One or more tables may be submitted to be checked in a single invocation. For a dependent table to be checked, the parent table must not be in the check pending state. In the case of a referential integrity cycle, all the tables involved in the cycle must be included in a single invocation of the SET CONSTRAINTS statement.
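For example, a parent table and its dependent could be checked in a single invocation, with violating rows routed to exception tables, as in the following sketch (all table names here are illustrative):

   db2 "set constraints for dept, staff immediate checked for exception in dept use deptexc, in staff use staffexc"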

To manage the loading of several tables, consider the position of each within referential relationships along with the table size and time windows available to carry out the load. It may be convenient, for example, to check the parent table for check constraint violations while the dependent table is loaded. This can only occur if the two tables are not in the same table space.

Exception tables are convenient for a consolidated report of all the rows that have constraints violated. If the exception table option is not used, only the first violation is reported. This may be a cause for frustration when dealing with large tables having more than one constraint violation. The same exception table used for the LOAD utility may be used for checking constraint violations. As with the LOAD utility, there is no checking done when running the SET CONSTRAINTS statement to ensure that the exception table is empty. The extra timestamp column in the exception table may be used to distinguish newly-inserted rows from the old ones, if necessary.

The SET CONSTRAINTS statement does not activate any DELETE triggers as a result of deleting rows that violate constraints. Note, however, that once the table is removed from the check pending state, triggers are active. This means that if you correct the data and INSERT rows from the exception table into the loaded table, any INSERT trigger defined on the table will be activated. Consider the implications of this for your data and, if necessary, take suitable action. One option is to drop the INSERT trigger, INSERT rows from the exception table, and then re-create the INSERT trigger.

If the underlying table (or tables) of a summary table are put in the check pending state, then all summary tables with REFRESH IMMEDIATE will be put into the check pending state. To take the summary table out of the check pending state, run the REFRESH TABLE statement against the table. You can also load the summary table and issue the SET CONSTRAINTS IMMEDIATE UNCHECKED statement. Refer to the SQL Reference for more information about these statements.
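For example, assuming an illustrative summary table named sales_summary, either of the following takes it out of the check pending state, depending on whether you want the data recomputed or accepted as loaded:

   db2 refresh table sales_summary
   db2 set constraints for sales_summary immediate unchecked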

