Administration Guide

User Update-Capable Catalog Statistics

The ability to update selected system catalog statistics allows you to:

Model query performance on a development system using production system statistics
Perform "what if" query performance analysis.

You should not update statistics on a production system because you may hinder the optimizer from finding the best access plan for your query.

To update the values of these statistical columns, use the SQL UPDATE statement against the views defined in the SYSSTAT schema. You can update statistics for:

Tables for which you hold explicit CONTROL privilege. You can also update statistics for columns and indexes for these tables.
User-defined functions (UDFs) that you own (see "Updating Statistics for User-Defined Functions" for guidance).

You can also update these statistics if your user ID has explicit DBADM authority for the database; that is, your user ID is recorded as having DBADM authority in the SYSCAT.DBAUTH table. Belonging to a DBADM group does not explicitly provide this authority.

Using these views, a DBADM can see statistics rows for all users. A user without DBADM authority can only see those rows which contain statistics for objects over which they have CONTROL privilege.

The following shows an example of updating the table statistics for the EMPLOYEE table:

   UPDATE SYSSTAT.TABLES
   SET  CARD   = 10000,
        NPAGES = 1000,
        FPAGES = 1000,
        OVERFLOW = 2
   WHERE TABSCHEMA = 'userid'
     AND TABNAME   = 'EMPLOYEE'

You must be careful when updating catalog statistics. Arbitrary updates can have a serious impact on the performance of subsequent queries. You may wish to use any of the following methods to replace any updates you applied to these tables:

ROLLBACK the unit of work in which the changes have been made (assuming the unit of work has not been committed).
Using the RUNSTATS utility you can recalculate and refresh the catalog statistics.
Update the catalog statistics to indicate that statistics have not been gathered. (For example, setting column NPAGES to -1 indicates that the number-of-pages statistic has not been collected.)
Replace the catalog statistics with the data they contained prior to your update. This method would only be possible if you used the db2look tool, as described in "Modelling Production Databases", to capture the statistics before you made any changes.

In a some cases, the optimizer may determine that some particular statistical value or combination of values are not valid, it will use default values and issue a warning. Such circumstances are rare, however, since most of the validation is done when updating the statistics.

Additional Information: For information about updating catalog statistics, see:

"Rules for Updating Catalog Statistics"
"Rules for Updating Table Statistics"
"Rules for Updating Column Statistics"
"Rules for Updating Distribution Statistics for Columns"
"Rules for Updating Index Statistics"
"Updating Statistics for User-Defined Functions"
"Modelling Production Databases"

Rules for Updating Catalog Statistics

When you update catalog statistics, the most important general rule is to ensure that valid values, ranges, and formats of the various statistics are stored in the statistic views. It is also important to preserve the consistency of relationships between various statistics.

For example, COLCARD in SYSSTAT.COLUMNS must be less than CARD in SYSSTAT.TABLES (the number of distinct values in a column can't be greater than the number of rows). Assume that you want to reduce COLCARD from 100 to 25, and CARD from 200 to 50. If you update SYSCAT.TABLES first, you should get an error (since CARD would be less than COLCARD). The correct order is to update COLCARD in SYSCAT.COLUMNS first, then update CARD in SYSSTAT.TABLES. The situation occurs in reverse if you want to increase COLCARD to 250 from 100, and CARD to 300 from 200. In this case, you must update CARD first, then COLCARD.

When a conflict is detected between an updated statistic and another statistic, an error is issued. However, errors may not always be issued when conflicts arise. In some situations, the conflict is difficult to detect and report in an error, especially if the two related statistics are in different catalogs. For this reason, you should be careful to avoid causing such conflicts.

The most common checks you should make, before updating a catalog statistic, are:

Numeric statistics must be -1 or greater than or equal to zero.
Numeric statistics representing percentages (for example, CLUSTERRATIO in SYSSTAT.INDEXES) must be between 0 and 100.

Note: For row types, the table level statistics NPAGES, FPAGES, and OVERFLOW are not updatable for a sub-table.

Rules for Updating Table Statistics

There are only four statistic values that you can update in sysstat.tables: CARD, FPAGES, NPAGES, and OVERFLOW. Keep in mind that:

CARD must be greater than all COLCARD values in SYSSTAT.COLUMNS that correspond to that table.
CARD must be greater than NPAGES.
FPAGES must be greater than NPAGES.
NPAGES must be less than or equal to any "Fetch" value in the PAGE_FETCH_PAIRS column of any index (assuming this statistic is relevant for the index).
CARD must not be less than or equal to any "Fetch" value in the PAGE_FETCH_PAIRS column of any index (assuming this statistic is relevant for the index).

Rules for Updating Column Statistics

When you are updating statistics in SYSSTAT.COLUMNS, follow the guidelines below. For details on updating column distribution statistics, see "Rules for Updating Distribution Statistics for Columns".

HIGH2KEY and LOW2KEY (in SYSSTAT.COLUMNS) must adhere to the following rules:
- The datatype of any HIGH2KEY, LOW2KEY value must correspond to the datatype of the user column for which the statistic is attributed. Because HIGH2KEY is a VARCHAR column, you must enclose the value in quotation marks. For example, to set HIGH2KEY to 25 for an INTEGER user column, your update statement would include SET HIGH2KEY = '25'.
- The length of HIGH2KEY, LOW2KEY values must be the smaller of 33 or the maximum length of the target column's datatype
- HIGH2KEY must be greater than LOW2KEY whenever there are 3 or more distinct values in the corresponding column. In the case of less than 3 distinct values in the column, HIGH2KEY can be equal to LOW2KEY.
The cardinality of a column (COLCARD statistic in SYSSTAT.COLUMNS) cannot be greater than the cardinality of its corresponding table (CARD statistic in SYSSTAT.TABLES).
No statistics are supported for columns with datatypes: LONG VARCHAR, LONG VARGRAPHIC, BLOB, CLOB, DBCLOB.

Rules for Updating Distribution Statistics for Columns

"User Update-Capable Catalog Statistics" provides general information about how to update catalog statistics. You may wish to refer to that section before attempting to update column distribution statistics.

In order for all the statistics in the catalog to be consistent, you must exercise care when updating the distribution statistics. Specifically, for each column, the catalog entries for the frequent data statistics and quantiles must satisfy the following constraints:

Frequent value statistics (in the SYSSTAT.COLDIST catalog)
- The values in column VALCOUNT must be non-increasing for increasing values of SEQNO
- The number of values in column COLVALUE must be less than or equal to the number of distinct values in the column, which is stored in column COLCARD in catalog view SYSSTAT.COLUMNS.
- The sum of the values in column VALCOUNT must be less than or equal to the number of rows in the column, which is stored in column CARD in catalog view SYSSTAT.TABLES
- In most cases, the values in the column COLVALUE should lie between the second-highest and second-lowest data values for the column, which are stored in columns HIGH2KEY and LOW2KEY, respectively, in catalog view SYSSTAT.COLUMNS. There may be one frequent value greater than HIGH2KEY and one frequent value less than LOW2KEY.
Quantiles (in the SYSSTAT.COLDIST catalog)
- The values in column COLVALUE must be non-decreasing for increasing values of SEQNO
- The values in column VALCOUNT must be strictly increasing for increasing values of SEQNO
- The largest value in column COLVALUE must have a corresponding entry in column VALCOUNT equal to the number of rows in the column
- In most cases, the values in the column COLVALUE should lie between the second-highest and second-lowest data values for the column, which are stored in columns HIGH2KEY and LOW2KEY, respectively, in catalog view SYSSTAT.COLUMNS.

Suppose that distribution statistics are available for a column C1 with "R" rows and you wish to modify the statistics to correspond to a column with the same relative proportions of data values, but with "(F x R)" rows. To scale up the frequent-value statistics by a factor of F, each entry in column VALCOUNT must be multiplied by F. Similarly, to scale up the quantiles by a factor of F, each entry in column VALCOUNT must be multiplied by F. If these rules are not followed, the optimizer may use the wrong filter factor causing unpredictable performance when you run the query.

Rules for Updating Index Statistics

When you update the statistics in SYSSTAT.INDEXES, follow the rules described below:

PAGE_FETCH_PAIRS (in SYSSTAT. INDEXES) must adhere to the following rules:
- Individual values in the PAGE_FETCH_PAIRS statistic must be separated by a series of blank delimiters.
- Individual values in the PAGE_FETCH_PAIRS statistic must not be longer than 10 digits and must be less than the maximum integer value (MAXINT = 2147483647).
- There must always be a valid PAGE_FETCH_PAIRS value if the CLUSTERFACTOR is greater than zero.
- There must be exactly 11 pairs in a single PAGE_FETCH_PAIR statistic.
- Buffer size entries of PAGE_FETCH_PAIRS must be ascending in value.
- If the buffer size value is the same as that in the previous pair, the page fetch value must be the same as that in the previous pair.
- Any buffer size value in a PAGE_FETCH_PAIRS entry cannot be greater than MIN(NPAGES, 524287) where NPAGES is the number of pages in the corresponding table (in SYSSTAT.TABLES).
- "Fetches" entries of PAGE_FETCH_PAIRS must be descending in value, with no individual "Fetches" entry being less than NPAGES. "Fetch" size values in a PAGE_FETCH_PAIRS entry cannot be greater than the CARD (cardinality) statistic of the corresponding table.
- If buffer size value is the same in two consecutive pairs, then page fetch value must also be the same in both the pairs (in SYSSTAT.TABLES).
A valid PAGE_FETCH_UPDATE is:
```
    PAGE_FETCH_PAIRS =
     '100 380 120 360 140 340 160 330 180 320 200 310 220 305 240 300
      260 300 280 300 300 300'
```
where
```
    NPAGES = 300
    CARD   = 10000
    CLUSTERRATIO  =  -1
    CLUSTERFACTOR = 0.9
```
CLUSTERRATIO and CLUSTERFACTOR (in SYSSTAT.INDEXES) must adhere to the following rules:
- Valid values for CLUSTERRATIO are -1 or between 0 and 100.
- Valid values for CLUSTERFACTOR are -1 or between 0 and 1.
- At least one of the CLUSTERRATIO and CLUSTERFACTOR values must be -1 at all times.
- If CLUSTERFACTOR is a positive value, it must be accompanied by a valid PAGE_FETCH_PAIR statistic.
The following rules apply to FIRSTKEYCARD, FIRST2KEYCARD, FIRST3KEYCARD, FIRST4KEYCARD, and FULLKEYCARD:
- FIRSTKEYCARD must be equal to FULLKEYCARD for a single-column index.
- FIRSTKEYCARD must be equal to COLCARD for the corresponding column.
- If any of these index statistics are not relevant, you should set them to -1. For example, if you have an index with only 3 columns, set FIRST4KEYCARD to -1.
- For multiple column indexes, if all the statistics are relevant, the relationship between them must be:
```
FIRSTKEYCARD <= FIRST2KEYCARD <= FIRST3KEYCARD <= FIRST4KEYCARD
   <= FULLKEYCARD <= CARD
```
The following rules apply to SEQUENTIAL_PAGES and DENSITY:
- Valid values for SEQUENTIAL_PAGES are -1 or between 0 and NLEAF.
- Valid values for DENSITY are -1 or between 0 and 100.

Updating Statistics for User-Defined Functions

Using the SYSSTAT.FUNCTIONS catalog view, you may update statistics for user-defined functions (UDFs). If these statistics are available, the optimizer will use them when estimating costs for various access plans. If statistics are not available the statistic column values will be -1 and the optimizer will use default values that assume a simple UDF.

The following table provides information about the statistic columns that you may update for UDFs:

Table 43. Function Statistics (SYSCAT.FUNCTIONS and SYSSTAT.FUNCTIONS)

Statistic Description
IOS_PER_INVOC Estimated number of read/write requests executed each time a function is executed.
INSTS_PER_INVOC Estimated number of machine instructions executed each time a function is executed.
IOS_PER_ARGBYTE Estimated number of read/write requests executed per input argument byte.
INSTS_PER_ARGBYTES Estimated number of machine instructions executed per input argument byte.
PERCENT_ARGBYTES Estimated average percent of input argument bytes that the function will actually process.
INITIAL_IOS Estimated number of read/write requests executed only the first/last time the function is invoked.
INITIAL_INSTS Estimated number of machine instructions executed only the first/last time the function is invoked.
CARDINALITY Estimated number of rows generated by a table function.

Statistic	Description
IOS_PER_INVOC	Estimated number of read/write requests executed each time a function is executed.
INSTS_PER_INVOC	Estimated number of machine instructions executed each time a function is executed.
IOS_PER_ARGBYTE	Estimated number of read/write requests executed per input argument byte.
INSTS_PER_ARGBYTES	Estimated number of machine instructions executed per input argument byte.
PERCENT_ARGBYTES	Estimated average percent of input argument bytes that the function will actually process.
INITIAL_IOS	Estimated number of read/write requests executed only the first/last time the function is invoked.
INITIAL_INSTS	Estimated number of machine instructions executed only the first/last time the function is invoked.
CARDINALITY	Estimated number of rows generated by a table function.

For example, consider a UDF (EU_SHOE) that converts an American shoe size to the equivalent European shoe size. (These two shoe sizes could be UDTs.) For this UDF, you should set the statistic columns as follows:

INSTS_PER_INVOC should be set to the estimated number of machine instructions required to:
- Invoke EU_SHOE
- Initialize the output string
- Return the result.
INSTS_PER_ARGBYTE should be set to the estimated number of machine instructions required to convert the input string into a European shoe size.
PERCENT_ARGBYTES would be set to 100 indicating that the entire input string is to be converted
INITIAL_INSTS, IOS_PER_INVOC, IOS_PER_ARGBYTE, and INITIAL_IOS should all be set to 0, since this UDF only performs computations.

PERCENT_ARGBYTES would be used by a function that does not always process the entire input string. For example, consider a UDF (LOCATE) that accepts two arguments as input and returns the starting position of the first occurrence of the first argument within the second argument. Assume that the length of the first argument is small enough to be insignificant relative to the second argument and, on average, 75 percent of the second argument is searched. Based on this information, PERCENT_ARGBYTES should be set to 75. The above estimate of the average of 75 percent is based on the following additional assumptions:

Half the time the first argument will not be found resulting in the entire second argument being searched
The first argument is equally likely to appear anywhere within the second argument, resulting in half of the second argument being searched (on average) when the first argument is found.

INITIAL_INSTS or INITIAL_IOS can be used to record the estimated number of machine instructions or read/write requests performed only the first or last time the function is invoked. This could be used, for example, to record the cost of setting up a scratchpad area.

To obtain information about I/Os and instructions used by a user-defined function, you can use output provided by your programming language compiler or by monitoring tools available for your operating system.

Modelling Production Databases

Sometimes you may wish to have your test system contain a subset of your production system's data. However, access plans selected for such a test system are not necessarily the same as those that would be selected on the production system, unless the catalog statistics and the configuration parameters for the test system are updated to match those of the production system.

A productivity tool, db2look, is provided that can be run against the production database to generate the update statements required to make the catalog statistics of the test database match those in production. These update statements can be generated by using db2look in mimic mode (-m option). In this case, db2look will generate a command processor script containing all the statements required to mimic the catalog statistics of the production database.

You can recreate database data objects, including tables and indexes, by extracting DDL statements with db2look -e. You can run the command processor script created from this command against another database to recreate the database. You can use the -e option with the -m option.

After running the update statements produced by db2look, against the test system, the test system can be used to validate the access plans to be generated in production. Since the optimizer uses the type and configuration of the table spaces to estimate I/O costs, the test system must have the same table space geometry or layout. That is, the same number of containers of the same type: either SMS or DMS.

For more information on how to use this productivity tool, type the following on a command line:

    db2look -h

You can also see the Command Reference manual.

[ Top of Page | Previous Page | Next Page | Table of Contents | Index ]

[ DB2 List of Books | Search the DB2 Books ]