IBM Books

Administration Guide


Collecting Statistics Using the RUNSTATS Utility

The RUNSTATS utility updates statistics in the system catalog tables to help with the query optimization process. Without these statistics, the database manager could make a decision that would adversely affect the performance of an SQL statement. The RUNSTATS utility allows you to collect statistics on the data contained in the tables, indexes, or both tables and indexes.

Use the RUNSTATS utility to collect statistics based on both the table and the index data to provide accurate information to the access plan selection process in the following situations:

When you are working in a partitioned database, collect the statistics related to a table and its indexes by executing the RUNSTATS operation at a single node. (The node at which the utility executes is determined by whether the node at which you issue the command contains table data or not. See "The Database Partition Where RUNSTATS is Executed" for details.) Because the statistics stored in the catalogs are supposed to represent table-level information, the node-level statistics collected by the database manager are multiplied where appropriate by the number of nodes across which the table is partitioned. This provides an approximation of the actual statistics that would be collected by executing RUNSTATS at every node and aggregating these statistics.
Note:The DB2 query optimizer assumes that attribute values (data) are placed equally and evenly across the database partitions of the system. If the placement of data is not equal, you should run this command on a database partition that you think has a representative table distribution.

The Database Partition Where RUNSTATS is Executed

When you invoke RUNSTATS on a table, you must be connected to the database in which the table is stored, but the database partition from which you issue the command does not have to contain a partition for this table:

Analyzing Statistics

Analyzing the statistics can indicate when reorganization is necessary. Some of these indications are:

RUNSTATS can also help you determine how performance is related to changes in your database. The statistics show the data distribution within a table. When used routinely, RUNSTATS provides data about tables and indexes over a period of time, thereby allowing performance trends to be identified for your data model as it evolves over time.

Ideally, you should rebind application programs after running statistics, because the query optimizer may choose a different access plan given the new statistics.

If you do not have enough time available to collect all of the statistics at one time, you may choose to periodically run RUNSTATS to update only a portion of the statistics that could be gathered. If inconsistencies are found as a result of activity on the table between the periods where you run RUNSTATS with a selective partial update, then a warning message (SQL0437W, reason code 6) is issued. For example, you first use RUNSTATS to gather table distribution statistics. Subsequently, you use RUNSTATS to gather index statistics. If inconsistencies are detected as a result of activity on the table, then the table distribution statistics are dropped and the warning message is issued. It is recommended that you run RUNSTATS to gather table distribution statistics when this happens.

You should periodically use RUNSTATS to gather both table and index statistics at once, to ensure that the index statistics are synchronized with the table statistics. Index statistics retain most of the table and column statistics collected from the last run of RUNSTATS. If the table has been modified extensively since the last time its table statistics were gathered, gathering only the index statistics for that table will leave the two sets of statistics out of synchronization.

You may wish to collect statistics based only on index data in the following situations:

The RUNSTATS utility allows you to collect varying levels of statistics. For tables, you can collect basic level statistics or you can also collect distribution statistics for the column values within a table (see "Collecting and Using Distribution Statistics"). For indexes, you can collect basic level statistics or you can also collect detailed statistics which can help the optimizer better estimate the I/O cost of an index scan. (See "Clustered Indexes" for information about these "detailed" statistics).
Note:Statistics are not collected for LONG or large object (LOB) columns. For row types, the table level statistics NPAGES, FPAGES, and OVERFLOW are not collected for a sub-table.

The following tables show the catalog statistics that are updated by the RUNSTATS utility:

Table 39. Table Statistics (SYSCAT.TABLES and SYSSTAT.TABLES)
Statistic Description RUNSTATS Option
Table Indexes
FPAGES number of pages being used by a table Yes Yes
NPAGES number of pages containing rows Yes Yes
OVERFLOW number of rows that overflow Yes No
CARD number of rows in table (cardinality) Yes Yes (Note 2)
Note:
  1. For a partitioned database, the values for each statistic are estimated from the value of the count at the database partition multiplied by the number of database partitions.
  2. If the table has no indices defined and you request statistics for indexes, no new CARD statistics are updated. The previous CARD statistics are retained.

Table 40. Column Statistics (SYSCAT.COLUMNS and SYSSTAT.COLUMNS)
Statistic Description RUNSTATS Option
Table Indexes
COLCARD column cardinality Yes (Note 1) Yes (Note 2)
AVGCOLLEN average length of column Yes Yes (Note 2)
HIGH2KEY second highest value in column Yes Yes (Note 2)
LOW2KEY second lowest value in column Yes Yes (Note 2)
Note:
  1. COLCARD is estimated for all columns in the table. In a partitioned database, if the column is the single-column partitioning key for the table, the value of the count is estimated as the count at the database partition multiplied by the number of database partitions.
  2. Column statistics are gathered for the first column in the index key.

Table 41. Index Statistics (SYSCAT.INDEXES and SYSSTAT.INDEXES)
Statistic Description RUNSTATS Option
Table Indexes
NLEAF number of index leaf pages No Yes (Note 3)
NLEVELS number of index levels No Yes
CLUSTERRATIO degree of clustering of table data No Yes (Note 2)
CLUSTERFACTOR finer degree of clustering No Detailed (Notes 1,2)
DENSITY Ratio (percentage) of SEQUENTIAL_PAGES to number of pages in the range of pages occupied by the index (Note 4) No Yes
FIRSTKEYCARD number of distinct values in first column of the index No Yes (Note 3)
FIRST2KEYCARD number of distinct values in first two columns of the index No Yes (Note 3)
FIRST3KEYCARD number of distinct values in first three columns of the index No Yes (Note 3)
FIRST4KEYCARD number of distinct values in first four columns of the index No Yes (Note 3)
FULLKEYCARD number of distinct values in all columns of the index No Yes (Note 3)
PAGE_FETCH_PAIRS page fetch estimates for different buffer sizes No Detailed (Notes 1,2)
SEQUENTIAL_PAGES number of leaf pages located on disk in index key order, with few or no large gaps between them No Yes
Note:
  1. Detailed index statistics are gathered by specifying the DETAILED clause on the RUNSTATS command, or by specifying A, Y or X for the statsopt parameter when calling the RUNSTATS API.
  2. CLUSTER_FACTOR and PAGE_FETCH_PAIRS are not collected with the DETAILED clause unless the table is of a respectable size. If the table is greater than about 25 pages, then CLUSTERFACTOR and PAGE_FETCH_PAIRS statistics are collected. In this case, CLUSTERRATIO is -1 (not collected). If the table is a relatively small table, only CLUSTERRATIO is filled in by RUNSTATS while CLUSTERFACTOR and PAGE_FETCH_PAIRS are not. If the DETAILED clause is not specified, only the CLUSTERRATIO statistic is collected.
  3. For a partitioned database, the value is estimated from the value of the count at the database partition multiplied by the number of database partitions.
  4. This statistic measures the percentage of pages in the file containing the index that belongs to that table. For a table having only one index defined on it, DENSITY should normally be 100. DENSITY is used by the optimizer to estimate how many irrelevant pages from other indexes might be read, on average, if the index pages were prefetched.

Table 42. Column Distribution Statistics (SYSCAT.COLDIST and SYSSTAT.COLDIST)
Statistic Description RUNSTATS Option
Table Indexes
DISTCOUNT If TYPE is Q, the number of distinct values that are less than or equal to COLVALUE statistics Distribution (Note 2) No
TYPE Indicator of whether row provides frequent-value or quantile statistics Distribution No
SEQNO Frequency ranking of a sequence number to help uniquely identify the row in the table Distribution No
COLVALUE Data value for which frequency or quantile statistic is collected Distribution No
VALCOUNT Frequency with which the data value occurs in column, or for quantiles, the number of values less than or equal to the data value (COLVALUE) Distribution No
Note:
  1. Column distribution statistics are gathered by specifying the WITH DISTRIBUTION clause on the RUNSTATS command, or by specifying A, D or Y for the statsopt parameter when calling the RUNSTATS API. Note that distribution statistics may not be gathered unless there is a sufficient lack of uniformity in the column values.
  2. DISTCOUNT is collected only for columns that are the first key column in an index.
  3. In a partitioned database, VALCOUNT is the estimated value of the count at the database partition multiplied by the number of database partitions. The exception to this is where the TYPE is 'F' and the column is the single-column partitioning key of the table, in which case VALCOUNT is simply the count at the database partition.

For more information about column distribution statistics, see "Collecting and Using Distribution Statistics".

Statistics for user-defined functions are not collected by the RUNSTATS utility. You must manually update the statistics for these functions. See "User Update-Capable Catalog Statistics" and "Updating Statistics for User-Defined Functions".


[ Top of Page | Previous Page | Next Page | Table of Contents | Index ]

[ DB2 List of Books | Search the DB2 Books ]