The ability to update selected system catalog statistics allows you to:
You should not update statistics on a production system because you may hinder the optimizer from finding the best access plan for your query.
To update the values of these statistical columns, use the SQL UPDATE statement against the views defined in the SYSSTAT schema. You can update statistics for:
You can also update these statistics if your user ID has explicit DBADM authority for the database; that is, your user ID is recorded as having DBADM authority in the SYSCAT.DBAUTH table. Belonging to a DBADM group does not explicitly provide this authority.
Using these views, a DBADM can see statistics rows for all users. A user without DBADM authority can only see those rows which contain statistics for objects over which they have CONTROL privilege.
The following shows an example of updating the table statistics for the EMPLOYEE table:
UPDATE SYSSTAT.TABLES SET CARD = 10000, NPAGES = 1000, FPAGES = 1000, OVERFLOW = 2 WHERE TABSCHEMA = 'userid' AND TABNAME = 'EMPLOYEE'
You must be careful when updating catalog statistics. Arbitrary updates can have a serious impact on the performance of subsequent queries. You may wish to use any of the following methods to replace any updates you applied to these tables:
In a some cases, the optimizer may determine that some particular statistical value or combination of values are not valid, it will use default values and issue a warning. Such circumstances are rare, however, since most of the validation is done when updating the statistics.
Additional Information: For information about updating catalog statistics, see:
When you update catalog statistics, the most important general rule is to ensure that valid values, ranges, and formats of the various statistics are stored in the statistic views. It is also important to preserve the consistency of relationships between various statistics.
For example, COLCARD in SYSSTAT.COLUMNS must be less than CARD in SYSSTAT.TABLES (the number of distinct values in a column can't be greater than the number of rows). Assume that you want to reduce COLCARD from 100 to 25, and CARD from 200 to 50. If you update SYSCAT.TABLES first, you should get an error (since CARD would be less than COLCARD). The correct order is to update COLCARD in SYSCAT.COLUMNS first, then update CARD in SYSSTAT.TABLES. The situation occurs in reverse if you want to increase COLCARD to 250 from 100, and CARD to 300 from 200. In this case, you must update CARD first, then COLCARD.
When a conflict is detected between an updated statistic and another statistic, an error is issued. However, errors may not always be issued when conflicts arise. In some situations, the conflict is difficult to detect and report in an error, especially if the two related statistics are in different catalogs. For this reason, you should be careful to avoid causing such conflicts.
The most common checks you should make, before updating a catalog statistic, are:
Note: | For row types, the table level statistics NPAGES, FPAGES, and OVERFLOW are not updatable for a sub-table. |
There are only four statistic values that you can update in sysstat.tables: CARD, FPAGES, NPAGES, and OVERFLOW. Keep in mind that:
When you are updating statistics in SYSSTAT.COLUMNS, follow the guidelines below. For details on updating column distribution statistics, see "Rules for Updating Distribution Statistics for Columns".
"User Update-Capable Catalog Statistics" provides general information about how to update catalog statistics. You may wish to refer to that section before attempting to update column distribution statistics.
In order for all the statistics in the catalog to be consistent, you must exercise care when updating the distribution statistics. Specifically, for each column, the catalog entries for the frequent data statistics and quantiles must satisfy the following constraints:
Suppose that distribution statistics are available for a column C1 with "R" rows and you wish to modify the statistics to correspond to a column with the same relative proportions of data values, but with "(F x R)" rows. To scale up the frequent-value statistics by a factor of F, each entry in column VALCOUNT must be multiplied by F. Similarly, to scale up the quantiles by a factor of F, each entry in column VALCOUNT must be multiplied by F. If these rules are not followed, the optimizer may use the wrong filter factor causing unpredictable performance when you run the query.
When you update the statistics in SYSSTAT.INDEXES, follow the rules described below:
A valid PAGE_FETCH_UPDATE is:
PAGE_FETCH_PAIRS = '100 380 120 360 140 340 160 330 180 320 200 310 220 305 240 300 260 300 280 300 300 300'
where
NPAGES = 300 CARD = 10000 CLUSTERRATIO = -1 CLUSTERFACTOR = 0.9
FIRSTKEYCARD <= FIRST2KEYCARD <= FIRST3KEYCARD <= FIRST4KEYCARD <= FULLKEYCARD <= CARD
Using the SYSSTAT.FUNCTIONS catalog view, you may update statistics for user-defined functions (UDFs). If these statistics are available, the optimizer will use them when estimating costs for various access plans. If statistics are not available the statistic column values will be -1 and the optimizer will use default values that assume a simple UDF.
The following table provides information about the statistic columns that
you may update for UDFs:
Table 43. Function Statistics (SYSCAT.FUNCTIONS and SYSSTAT.FUNCTIONS)
Statistic | Description |
---|---|
IOS_PER_INVOC | Estimated number of read/write requests executed each time a function is executed. |
INSTS_PER_INVOC | Estimated number of machine instructions executed each time a function is executed. |
IOS_PER_ARGBYTE | Estimated number of read/write requests executed per input argument byte. |
INSTS_PER_ARGBYTES | Estimated number of machine instructions executed per input argument byte. |
PERCENT_ARGBYTES | Estimated average percent of input argument bytes that the function will actually process. |
INITIAL_IOS | Estimated number of read/write requests executed only the first/last time the function is invoked. |
INITIAL_INSTS | Estimated number of machine instructions executed only the first/last time the function is invoked. |
CARDINALITY | Estimated number of rows generated by a table function. |
For example, consider a UDF (EU_SHOE) that converts an American shoe size to the equivalent European shoe size. (These two shoe sizes could be UDTs.) For this UDF, you should set the statistic columns as follows:
PERCENT_ARGBYTES would be used by a function that does not always process the entire input string. For example, consider a UDF (LOCATE) that accepts two arguments as input and returns the starting position of the first occurrence of the first argument within the second argument. Assume that the length of the first argument is small enough to be insignificant relative to the second argument and, on average, 75 percent of the second argument is searched. Based on this information, PERCENT_ARGBYTES should be set to 75. The above estimate of the average of 75 percent is based on the following additional assumptions:
INITIAL_INSTS or INITIAL_IOS can be used to record the estimated number of machine instructions or read/write requests performed only the first or last time the function is invoked. This could be used, for example, to record the cost of setting up a scratchpad area.
To obtain information about I/Os and instructions used by a user-defined function, you can use output provided by your programming language compiler or by monitoring tools available for your operating system.
Sometimes you may wish to have your test system contain a subset of your production system's data. However, access plans selected for such a test system are not necessarily the same as those that would be selected on the production system, unless the catalog statistics and the configuration parameters for the test system are updated to match those of the production system.
A productivity tool, db2look, is provided that can be run against the production database to generate the update statements required to make the catalog statistics of the test database match those in production. These update statements can be generated by using db2look in mimic mode (-m option). In this case, db2look will generate a command processor script containing all the statements required to mimic the catalog statistics of the production database.
You can recreate database data objects, including tables and indexes, by extracting DDL statements with db2look -e. You can run the command processor script created from this command against another database to recreate the database. You can use the -e option with the -m option.
After running the update statements produced by db2look, against the test system, the test system can be used to validate the access plans to be generated in production. Since the optimizer uses the type and configuration of the table spaces to estimate I/O costs, the test system must have the same table space geometry or layout. That is, the same number of containers of the same type: either SMS or DMS.
For more information on how to use this productivity tool, type the following on a command line:
db2look -h
You can also see the Command Reference manual.