IBM Books

Administration Guide


Using the AutoLoader Utility

In a partitioned database, large amounts of data are positioned across many partitions. Partitioning keys are used to determine the particular database partition where each portion of the data resides. Therefore, data must pass through a splitting phase before it can be loaded at the correct database partition.

The entire "split and load" process is accomplished by the AutoLoader utility. The AutoLoader utility uses a hashing algorithm to partition the data into as many output sockets as there are database partitions in the nodegroup in which the table was defined. It then loads from these output sockets concurrently across the set of database partitions in the nodegroup. A key feature of the AutoLoader utility is that it uses sockets for all data transfer required in the split and load process. It also allows the use of multiple database partitions for the splitting phase, thereby improving the performance significantly.

The authority required to use the AutoLoader utility is the same as that required by the LOAD command; that is, SYSADM or DBADM authority. Also, the location of the LOAD dumpfile, the remote message file, and the directory used for sorting all have to be write-accessible by the instance owner.
Note:The location of the input files to the load operation cannot be a tape device. Also, AutoLoader does not support the ROWCOUNT option on the LOAD command. If the ROWCOUNT option is used, an error is returned.

In all examples shown relating to AutoLoader, keep in mind that the UNIX form for directories and paths is used. The separator between directories and subdirectories may vary dependent on the platform where the AutoLoader is used.

Modes of Operation

In all of the modes presented below, the input file locations (one or more) must be readable by the AutoLoader wherever the AutoLoader might be run.

The AutoLoader utility may be run in one of the following modes:

SPLIT_AND_LOAD
In this mode, data is split and then loaded on the correct database partitions. Sockets are used to transfer data. When working in this mode, multiple input files are allowed.

SPLIT_ONLY
With this choice, the data is only split. A set of split data files is generated for the requested output database partitions. You must have sufficient storage for each of the split data files. The output from the split function writes the files in the location pointed to by the parameter SPLIT_FILE_LOCATION or in the AutoLoader current working directory. The directory location has to be "write-accessible". Data is split into separate files that are named using the convention filename.xxx where xxx is the partition number to which the split file belongs. If there are multiple input data files in the LOAD command, they will all be split. However, only one split-file is generated for each database partition. The filename of the split-file is the same as the filename of the first input data file.

LOAD_ONLY
Data is expected to be already split into separate files that are named using the following convention filename.xxx or filename.00xxx where xxx is the partition number to which the split file belongs. AutoLoader expects to find these files in the SPLIT_FILE_LOCATION or in the current AutoLoader working directory. The directory location has to allow read-access to the AutoLoader, wherever the AutoLoader might be run. These split files are loaded concurrently on their corresponding partitions. If there are multiple input data files in your LOAD command (such as infile1, infile2, and so on), then AutoLoader loads infile1.xxx if it exists. Otherwise, it loads infile1.00xxx if it exists. If neither exist, AutoLoader flags an error and returns. If both exist, AutoLoader loads infile1.xxx. Once the first infile1 of either file type (xxx or 00xxx) is loaded, checking then begins for infile2 checking in the same order to determine which file type is to be used. If infile2.xxx exists, it is loaded. Otherwise, infile2.00xxx is loaded if it exists. If neither exists, the load operation is done. If either of the two infile2 files is loaded, the same checking for infile3 is carried out and this process is repeated until all of the input files are loaded.

ANALYZE
This option generates a customized optimal partitioning map for a nodegroup. It is recommended that a data file with a large number of records be specified as input. The output for the analysis is written into the file specified by the MAP_FILE_OUTPUT parameter. The output from the ANALYZE mode can be used with the MAP_FILE_INPUT parameter when performing a future AutoLoader function. Note that the larger the number of records used, the better the representation to the actual data that can be analyzed, and the better the resulting new partitioning map. The map will produce a more even distribution of data across each of the database partitions in the nodegroup. When working in this mode, multiple input files are allowed.

Additional Options and Considerations

When working with the AutoLoader configuration file, there are many options available for your use beyond the declaration of the mode of operation you wish to carry out.

RELEASE Level
The release level of this configuration file. Do not delete or modify this line in the configuration file.

LOAD Command
The most important part of the configuration file is the LOAD command. The AutoLoader requires the presence of the LOAD command to direct the handling of the data even if the mode of operation selected does not suggest any loading is required. For example, the AutoLoader extracts useful information from the LOAD command even when performing only a SPLIT_ONLY mode operation. The LOAD command lets the AutoLoader "know" where the data is coming from, what type of data it is (ASCII or Delimited ASCII), how the data is to be loaded, and where the data is going (the table name).

Ensure that you specify a complete LOAD command that includes the schema name, file name, file type, and table name. Also, the AutoLoader utility requires that the LOAD command conforms to the format of the "db2 -f" file, except for the extra leading "db2" keyword. There is no need to use the special escape shell characters in the LOAD command. Finally, if the last character on a line is a backslash ("\") character, the next line is a continuation of the current line. In this case the backslash and the end-of-line character(s) are ignored.

The Command Reference provides the details on all the parameters available using the LOAD command.

DATABASE Parameter
The DATABASE parameter is used to identify the database where the data is loaded. If no name is given for the DATABASE parameter, "SAMPLE" is used as the default.

HOSTNAME Parameter
The HOSTNAME parameter provides the name of the remote machine where the data file resides. This remote file machine may be an MVS host or another workstation. If not specified, and FILE_TRANSFER_CMD is set, then the host name "nohost" is passed to the FILE_TRANSFER_CMD in the <hostname> argument. There is no default associated with this parameter.

FILE_TRANSFER_CMD Parameter
The FILE_TRANSFER_CMD parameter provides a fully-qualified path and executable, batch file, or script that carries out data transfer from the host. If you are going to receive input from a remote host, you need to know what the AutoLoader utility requires to do this. The previous AutoLoader utility supported a concept of host file transfer, whereby the AutoLoader utility could be configured to transfer data files from a remote host. This option has been replaced by the "FILE_TRANSFER_CMD" feature.

The FILE_TRANSFER_CMD parameter identifies a fully-qualified path to an executable, batch file, or script that is used to transfer data from a remote host. The fully-qualified path must be accessible by the AutoLoader. The full path, including the execution file name, must not exceed 254 characters.

Before executing the executable, batch file, or script shown in the FILE_TRANSFER_CMD, the AutoLoader establishes named-pipes in anticipation of the data being sent from the host.

The number of named-pipes to be created is equivalent to the number of files or devices listed following the FROM clause on the LOAD command. This same information from the LOAD command is also used to create the parameters to pass to the executable, batch file, or script used to transfer the data from the remote host.

Based on this information, AutoLoader creates the following command line:

<COMMAND> <logpath> <hostname> <basepipename>
<nummedia> <source media list>

where

and the remaining items are parameters that may be used by the command

An AIX sample called atldftp.drv is found under the autoloader sub-directory of the samples sub-directory of the sqllib sub-directory. The AIX sample shows how FTP can be used to move data from a remote host.

SPLIT_FILE_LOCATION Parameter
The SPLIT_FILE_LOCATION parameter is used in two instances:

If not specified, and in SPLIT_ONLY mode, the split files are placed in the current working directory. If not specified, and in LOAD_ONLY mode, the AutoLoader utility looks for the split files in the current working directory.

OUTPUT_NODES Parameter
The database partitions on which the load is to be performed are identified by the OUTPUT_NODES parameter. The supplied partition numbers must be a subset of database partitions on which the table is defined. The default for this parameter is that all database partitions where the table is defined will have data loaded into them.

SPLIT_NODES Parameter
The list of database partitions participating in the splitting process is shown in the SPLIT_NODES parameter. Splitting database partitions may be the same or different from the database partitions being loaded. If not defined, the AutoLoader automatically determines how many partitions are needed for splitting, and which partitions are used for splitting, in order to achieve optimal performance. How to determine the number of partitions follows these rules:

RUN_STAT_NODE Parameter
In conjunction with the "STATISTICS YES" specification on the LOAD command, you can specify the node (database partition) on which you would like to gather statistics. If left blank or -1, the default is the first database partition in the output partition list.

MODE Parameter
The MODE parameter specifies the mode to run AutoLoader in. The mode can be one of: SPLIT_ONLY, LOAD_ONLY, ANALYZE, or SPLIT_AND_LOAD (the default). In addition to the complete split and load operation, other valid values are:

LOGFILE Parameter
The LOGFILE parameter is used to provide the base name of the temporary and permanent files used by the AutoLoader utility. This name is used as a base name to create the following files.
<logfile>.split.cfg ...
configuration file for all splitters
<logfile>.split.<3-digit-node-number>.log ...
log file for each splitter
<logfile>.pmap.<pid> ...
internal temporary file, where <pid> is the process id
of this AutoLoader job
<logfile>.load.<3-digit-node-number> ...
the message file for each loading process if there is no
message file specified in the LOAD command

You may include a path in the LOGFILE parameter, however you must ensure the existence and accessibility of the path. The default is "./autoloader.log".
Note:In the case where there are multiple concurrent AutoLoader sessions, you must ensure either the base name of LOGFILE or the path name of LOGFILE is unique.

AUTHENTICATION and PASSWORD Parameters
If a password is required for remote invocation of the splitter program, or client/server database connections when loading, the AUTHENTICATION parameter is needed. This parameter is used in conjunction with the PASSWORD parameter which defines the password to be used for remote splitter invocation or client/server database connections when loading. The default for AUTHENTICATION is NO (no password checking) and the PASSWORD parameter is ignored (if it is provided). Authentication can be achieved several ways:

MAX_NUM_SPLITTERS Parameter
You can establish the maximum number of splitter processes used in an AutoLoader job using the MAX_NUM_SPLITTERS parameter. The default is 25 splitters.

FORCE Parameter
The FORCE parameter forces the AutoLoader job to continue even if the AutoLoader detects (at startup time) that some target partitions or table spaces are offline. If "NO", and some partitions are unavailable, then no data will be processed. If "YES", database partitions which are available will be loaded, and all others will be ignored. The default for this parameter is "NO".

STATUS_INTERVAL Parameter
The number of megabytes to be loaded before generating a progress message can be entered using the STATUS_INTERVAL parameter. The unit of measurement is megabytes (MB). The default is 100 MB. Valid values are whole numbers in the range of 1 to 4000.

PORTS Parameter
The range of TCP ports used to create sockets for internal communications in AutoLoader is controlled using the PORTS parameter. The default range is 6000 up to 6063. If defined at the time of the AutoLoader invocation, the value of the DB2ATLD_PORTS DB2 registry variable replaces the value of the PORTS configuration parameter.

CHECK_LEVEL Parameter
The CHECK_LEVEL parameter is used to determine whether there should be checks for the truncation of records during input or output. The default is NOCHECK (no check for truncation at input or output). CHECK is the only other value allowed.

MAP_FILE_INPUT Parameter
When using a partitioning map, the MAP_FILE_INPUT parameter is used to give the input file name pointing to to the file containing the customized partitioning map. When using ANALYZE mode, the output file name for the partitioning map is identified using the MAP_FILE_OUTPUT parameter. If the partitioning map is customized rather than a default map, this parameter must be specified. You can get a customized partitioning map by either using the db2gpmap program to extract the map from the database system catalog table; or you can run the ANALYZE mode of db2atld to generate an optimal map. The map generated by the ANALYZE mode must be moved to each database partition in your database before actual loading can proceed.

MAP_FILE_OUTPUT Parameter
This is the name for the partitioning map when used with the db2atld program executed in ANALYZE mode. An optimal partitioning map with even distribution across all database partitions is generated. If it is not specified and the running mode is ANALYZE, an error is returned.

TRACE Parameter
The number of records to trace when you need to review a dump of all of the data conversion process and output of hashing values is determined by the TRACE parameter. The default is 0 (no tracing).

NEWLINE Parameter
The parameter NEWLINE defines the character used to delimit each record in the data file. This parameter is only meaningful if the input data file is a fixed-length ASC file with each record delimited by a new line character, and the RECLEN option of the MODIFIED BY clause in the LOAD command is specified. If YES, AutoLoader always checks if the record is terminated by a new line character or not. It also checks if the record length is the same as the expected RECLEN or not. The default for this parameter is NO.

Planning to Use the AutoLoader Utility

The AutoLoader utility is invoked using db2atld. A sample configuration file, autoloader.cfg, is found under the autoloader sub-directory of the samples sub-directory of the sqllib sub-directory. The db2atld uses a configuration file to carry out the AutoLoader actions. It is recommended that you copy, rename, and modify the sample configuration file to establish the configuation you wish to use to split and load your data.

Before using the AutoLoader utility, you should:

  1. Create a temporary working directory and copy your configuration file into it. This directory must be accessible from all the participating split database partitions. It is from this directory that you will use the AutoLoader utility.

  2. Ensure the SVCENAME parameter of your database manager configuration, and the DB2COMM profile registry variable, are set up correctly. This is important since the AutoLoader utility makes remote database connections from the working partition (where you run the utility) to the database partitions where the table is defined.

  3. Modify your configuration file according to the directions included in the sample file.

Running the AutoLoader Utility

The AutoLoader utility is executed by typing the following command:

    db2atld [-c config_file]
    

with one or more of the following options:

-c
Uses the config_file specified as the configuration file for the AutoLoader utility. If not specifically mentioned, the default is 'autoloader.cfg'.

The AutoLoader utility creates several log and messages files. See "Additional Options and Considerations" and the LOGFILE parameter for details on the generated files.

AutoLoader Hints and Tips

There are some items you should consider before using the AutoLoader utility:

AutoLoader Sample Job

Using a small amount of data, we can see the output produced from the AutoLoader utility.

In this example, the following configuration file was used:

###############
# release level
###############
RELEASE=V5.01
 
##################
# CLP load command
##################
db2 load from /home/user/atld_work/test.dat of del replace into user.test
###############
# database name
###############
database=wsdb
 
#################
# split partition list
#################
SPLIT_NODES=(0,2)
##############
# running mode
##############
mode=split_and_load
 
################
# log file token
################
logfile=mylog
######################################
# frequency of progressive information
#
# print out progressive info every 10
# mega-bytes of data
######################################
STATUS_INTERVAL=10

The following output was generated:

/home/user/atld_work/ $ db2atld -c sample.atld.cfg
Utility program: "db2atld". Version: "05010".
Start reading autoloader configuration file: sample.atld.cfg
Finish reading autoloader configuration file: sample.atld.cfg
Start initializing autoloader process.
Finish initializing autoloader process.
The Autoloader is now issuing all LOAD requests.
The LOAD operation has begun on partition "0".
The LOAD operation has begun on partition "1".
The LOAD operation has begun on partition "2".
The LOAD operation has begun on partition "3".
The Autoloader is now issuing all split requests.
Start db2split on node "0" in background.
Start db2split on node "2" in background.
The utility has read "10" megabytes from the source data.
The utility has read "20" megabytes from the source data.
The utility has read "30" megabytes from the source data.
The utility has read "40" megabytes from the source data.
The utility has read "50" megabytes from the source data.
The utility has read "60" megabytes from the source data.
The utility has read "70" megabytes from the source data.
The utility has read "80" megabytes from the source data.
The utility has read "90" megabytes from the source data.
The utility has read "100" megabytes from the source data.
The utility has read "110" megabytes from the source data.
The utility has read "120" megabytes from the source data.
The utility has read "130" megabytes from the source data.
The utility has completed reading "130" megabytes from the user data.
The Autoloader is waiting for all splitters to complete.
The Autoloader is waiting for all LOAD operations to complete.
The remote execution of the splitter utility on partition "2"
finished with remote execution code "0".
The remote execution of the splitter utility on partition "0"
finished with remote execution code "0".
  Operation      Node     SQL Code     Result
___________________________________________________________
  LOAD           000      +00000000    Success.
___________________________________________________________
  LOAD           001      +00000000    Success.
___________________________________________________________
  LOAD           002      +00000000    Success.
___________________________________________________________
  LOAD           003      +00000000    Success.
___________________________________________________________
  SPLIT          000      +00000000    Success.
___________________________________________________________
  SPLIT          002      +00000000    Success.
___________________________________________________________
  PSPLIT         000      +00000000    Success.
___________________________________________________________
  RESULTS:       4 of 4 LOADs completed successfully.
___________________________________________________________
Rows Read      1310848
Rows Skipped   0
Rows Loaded    1310848
Rows Rejected  0
Rows Deleted   0
Rows Committed 1310848

The first line of the example shows the path and the temporary working directory accessible from each of the participating database partitions. The name of the configuration file to be used is also shown: in this case sample.atld.cfg.

The main body of the messages generated by the AutoLoader process show the initialization of the participating database partitions. The progress of the data through the split and load process is shown. Next, the termination of the AutoLoader processes is recorded.

A summary table of the operations, the partitions being used, any SQL codes that are returned, and some statement regarding the results of the operation, is presented as one of the outputs from the utility. If the SQL Code column records anything other than a zero (0) return code, a review of the message file will show the specific warnings or errors that were recorded.

Finally, a summary list of the records that were part of the AutoLoader job is shown.

Loading into Multiple Database Partitions

If you are loading data into a table in a multiple database partition nodegroup, the LOAD utility requires that the files that are to be loaded were split and contain the correct header information. The LOAD utility verifies the header information that the split operation of AutoLoader writes to each data file to ensure that the data goes to the correct location.

If you are loading data into a table in a single database partition nodegroup, the files do not have to be split, even if the table is defined to have a partitioning key. In this situation, you would specify the NOHEADER option of the LOAD utility.

The LOAD utility checks that the partitioning map used by the split operation of AutoLoader is the same one specified when the table is being loaded. If not, an error is returned. It also checks that the file partition is loaded at the correct database partition, and that the data types of the partitioning key columns specified during splitting match the current definition in the catalog. The nodegroup to which the table is loaded cannot be redistributed between the time that the data file is partitioned and the time that the parts are loaded into the corresponding database table. If redistribution has been done, the utility cannot load the partitioned data.

LOAD supports the following flat file formats:

However, AutoLoader can only be used to split DEL and ASC files.
Note:IXF files cannot be split, but can be loaded into a single database partition nodegroup using the 'NOHEADER' option in the LOAD command.

When the ROWCOUNT clause is used on the LOAD command in the AutoLoader process, ROWCOUNT is not supported and an error is returned. The use of the ROWCOUNT clause is only valid in a non-partitioned database environment.

There is a special consideration when using AutoLoader with a LOAD command that uses a DUMPFILE specification on the MODIFIED BY clause. If a column that is part of the partitioning key is invalid or rejected, all of the data associated with that row is not loaded. The row is not placed in the DUMPFILE location as you might expect; however, a record of the row causing the problem is placed in the splitter log file. You should check the splitter log file after the completion of the AutoLoader process since the DUMPFILE location may not have all rejected records that were not loaded.

See "AutoLoader Troubleshooting" for more information on potential problems.

AutoLoader Troubleshooting

  1. If it appears that db2atld is hanging, you can:
  2. If db2atld still fails you can try the following to diagnose the problem:

When working with the db2atld utility, you may issue:

   db2atld -h

which will show a -d option. This option is reserved for future use.

Migration and Backward Compatibility

There are a few migration and backward compatibility issues associated with this latest version of the AutoLoader utility.


[ Top of Page | Previous Page | Next Page | Table of Contents | Index ]

[ DB2 List of Books | Search the DB2 Books ]