Administration Guide

Indexing Impact on Query Optimization

It is important to remember that you do not decide when an index is used; the database manager makes that decision based on the available table and index information. However, you play an important role in the process by creating the indexes that can improve performance. You should also collect statistics about the indexes (using the RUNSTATS utility) after you create an index or change the prefetch quantity (as mentioned above), and regularly thereafter to keep the statistics up to date. To fill this role, you must understand the kinds of indexes that you can create and the ways to create them.

Indexing versus No Indexing

For each table referenced in a database query, if no index exists on the table, a table scan must be performed on that table. The larger the table, the longer a table scan takes. A table scan occurs when the database manager sequentially accesses every row of a table. By contrast, an index scan occurs when the database manager accesses data using an index. (See "Index Scan Concepts".)

An index will be selected for use if the optimizer estimates that an index scan will be faster than a table scan. Index files generally are smaller and require less time to read than an entire table, particularly as tables grow larger. In addition, the entire index may not need to be scanned. The predicates applied to the index reduce the number of rows to be read from the data pages.

Each index entry consists of a search-key value and a pointer to the row containing that value. The values are arranged in ascending or descending order of the search-key value, which makes it possible to bracket the search, given the right predicates. An index can also be used to obtain rows in an ordered sequence, eliminating the need for the database manager to sort the rows after they are read from the table.

A unique index may contain include columns in addition to the search-key value and row pointer.
Note: You cannot control whether an index is used by the database manager. For example, the result of a query cannot be guaranteed to be produced in an ordered sequence simply by the existence of an index on the table being queried. The database manager may use this index during the processing of the query but is not required to. Only the existence of an ORDER BY clause can "guarantee" the order of a result set.
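The bracketing and ordering points above can be sketched as follows. This is a minimal illustration that assumes the EMPLOYEE table used in the examples in this section, plus an EMPNO column and a hypothetical index named EMPX_DEPT_ASC. The range predicate lets the database manager read only part of the index, but only the second query is guaranteed to return rows in WORKDEPT order.

```
   -- Hypothetical index: entries are kept in ascending WORKDEPT order.
   CREATE INDEX empx_dept_asc ON employee (workdept ASC);

   -- The predicate brackets the search: only entries from 'C00' through
   -- 'E21' need to be read. The order of the result is NOT guaranteed,
   -- even if the optimizer chooses EMPX_DEPT_ASC.
   SELECT empno, workdept
     FROM employee
     WHERE workdept BETWEEN 'C00' AND 'E21';

   -- Only ORDER BY guarantees the order of the result set. If the
   -- optimizer uses EMPX_DEPT_ASC, it may be able to skip a separate sort.
   SELECT empno, workdept
     FROM employee
     WHERE workdept BETWEEN 'C00' AND 'E21'
     ORDER BY workdept;
```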

Indexes can reduce access time significantly; however, indexes can also have adverse effects on performance. Before creating indexes, consider the effects of multiple indexes on disk space and processing time:

Each index takes up a certain amount of storage or disk space. The exact amount depends on the size of the table and on the size and number of columns included in the index.
Each INSERT or DELETE operation performed on a table requires additional updating of each index on that table. This is also true for each UPDATE operation that changes an index key.
The LOAD utility rebuilds any existing indexes.
The indexfreespace file type modifier (specified in the MODIFIED BY clause of the LOAD command) can be used to override the index PCTFREE that was specified when the index was created.
Each index potentially adds an alternative access path for a query, which the optimizer will consider, and therefore increases the query compilation time.
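As a sketch of the LOAD behavior described above, the following command loads a delimited file and overrides the index free space while the indexes are rebuilt. The file name EMP.DEL and the schema MYSCHEMA are hypothetical.

```
   -- Hypothetical input file and schema.
   -- indexfreespace=20 leaves 20 percent of each index page free when
   -- LOAD builds the indexes, overriding the PCTFREE value that was
   -- given when each index was created.
   LOAD FROM emp.del OF DEL
     MODIFIED BY indexfreespace=20
     INSERT INTO myschema.employee;
```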

Indexes should be carefully chosen to address the needs of the application program.

To determine whether an index is used in a specific package, you can use the SQL Explain facility, described in Chapter 14, "SQL Explain Facility".

Guidelines for Indexing

Which indexes should be created depends on the data and its intended uses. The following guidelines can help you determine which indexes would be most useful:

Define primary keys and unique keys, wherever they apply, by using the CREATE UNIQUE INDEX statement. (Refer to the SQL Reference for more information.) Unique indexes can help the optimizer avoid performing certain operations such as sorts.
Define unique indexes with include columns to improve the performance of data retrieval. Columns are good candidates for INCLUDE columns of unique indexes if they:
- Are accessed frequently and therefore would benefit from index-only access
- Are not required to limit the range of index scans
- Do not affect the ordering or uniqueness of the index key.
See "Creating an Index" for more information on INCLUDE columns.
Use indexes to optimize frequent queries to tables with more than a few data pages, as can be determined by the NPAGES column in the SYSCAT.TABLES catalog view:
- Create an index on any column you will use when joining tables.
- Create an index on any column from which you will be searching for particular values on a regular basis.
Avoid creating an index whose key columns are a leading subset of another index's key columns. For example, if there is an index on columns a, b, and c, then a second index on columns a and b is not generally useful.
Use indexes on foreign keys to improve performance of delete and update operations on the parent table.
Use indexes on columns that will frequently be used to sort the data.
In creating a multiple-column index, if you have more than one choice for the first key column, choose the column most often specified with the "=" predicate, or the column with the greatest number of distinct values.
Creating indexes arbitrarily on all columns not only consumes a lot of disk space but also lengthens statement preparation (compilation) time. This is particularly true for complex queries compiled with an optimization class that uses dynamic programming join enumeration. (See "Adjusting the Optimization Class".)
The following rule of thumb, based on the primary use of your database, suggests the typical number of indexes to define for a table:
- For online transaction processing (OLTP) environments, you should only have one or two indexes
- For query (read-only) environments, you could have more than five indexes
- For mixed query/OLTP environments, you could have between two and five indexes.
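Several of the guidelines above can be sketched in DB2 SQL. This is a minimal illustration, not a recommended design: the schema MYSCHEMA, the EMPNO column, the PROJECT table with its DEPTNO foreign key column, and all index names are assumptions.

```
   -- Find tables with more than a few data pages. NPAGES is set by
   -- RUNSTATS; a value of -1 means statistics have not been collected.
   SELECT tabschema, tabname, npages
     FROM syscat.tables
     WHERE tabschema = 'MYSCHEMA'
     ORDER BY npages DESC;

   -- Primary key enforced with a unique index (EMPNO is assumed).
   CREATE UNIQUE INDEX empx_pk ON employee (empno);

   -- Index on a foreign key column (hypothetical PROJECT.DEPTNO): helps
   -- joins to the parent table and deletes/updates of parent rows.
   CREATE INDEX projx_dept ON project (deptno);

   -- Multiple-column index: WORKDEPT, assumed to be the column most
   -- often used with "=", comes first, e.g. for
   --   WHERE workdept = 'A01' AND salary > 30000
   CREATE INDEX empx_dept_sal ON employee (workdept, salary);
```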

Consider defining a clustering index to help keep newly inserted rows clustered according to that index. A clustering index should significantly reduce the need for reorganizing the table.

Note: When a clustering index is defined, the table should be loaded with free space reserved on each data page to allow inserts to take place on those pages. (Free space is reserved by using the PCTFREE keyword on the ALTER TABLE statement, or the pagefreespace file type modifier in the MODIFIED BY clause of the LOAD command.)

Consider using the PCTFREE keyword when creating indexes. PCTFREE reserves space on index pages for future updates to the index. This may reduce the frequency of page splits and increase performance.
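A minimal sketch of the clustering index and free-space recommendations above, assuming a hypothetical schema MYSCHEMA and index name EMPX_CLUST. The percentages are illustrative only, not tuned values.

```
   -- Reserve 15 percent of each data page for future clustered inserts.
   -- The table PCTFREE value is applied at the next LOAD or REORG.
   ALTER TABLE myschema.employee PCTFREE 15;

   -- Clustering index: where space allows, new rows are placed near
   -- rows with similar WORKDEPT values. PCTFREE 20 leaves 20 percent
   -- of each index page free when the index is built.
   CREATE INDEX myschema.empx_clust
     ON myschema.employee (workdept)
     CLUSTER
     PCTFREE 20;
```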

The following are typical circumstances in which creating an index can improve performance:

An index can be created on columns that are used in WHERE clauses of the queries and transactions that are most frequently processed.
The WHERE clause:
```
   WHERE WORKDEPT='A01' OR WORKDEPT='E21'
```
will generally benefit from an index on WORKDEPT, unless those values occur in a large proportion of the table's rows.
An index can be created on a column or columns to order the rows in collating sequence. Ordering is required not only in the ORDER BY clause, but also by other features, such as the DISTINCT and GROUP BY clauses.
The following example uses the DISTINCT clause:
```
   SELECT DISTINCT WORKDEPT
     FROM EMPLOYEE
```
The database manager can use an index defined for ascending or descending order on WORKDEPT to eliminate duplicate values. This same index could also be used to group values in the following example with a GROUP BY clause:
```
   SELECT WORKDEPT, AVG(SALARY)
     FROM EMPLOYEE
   GROUP BY WORKDEPT
```
An index can be created that includes every column referenced in a statement. Such an index allows index-only access: data can be retrieved more efficiently because the table itself does not need to be accessed.
For example, assume the following SQL statement is issued:
```
   SELECT LASTNAME
     FROM EMPLOYEE
     WHERE WORKDEPT IN ('A00','D11','D21')
```
If an index is defined for the WORKDEPT and LASTNAME columns of the EMPLOYEE table, the statement might be processed more efficiently by scanning the index than by scanning the entire table. Note that since the predicate is on WORKDEPT, this column should be the first column of the index.
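Include columns are another way to improve the use of indexes on tables. Using the previous example, and assuming that WORKDEPT values are unique (INCLUDE columns can be specified only on a unique index), you could define a unique index as follows: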
```
   CREATE UNIQUE INDEX x ON employee (workdept) INCLUDE (lastname)
```
Specifying lastname as an include column rather than as part of the index key means that lastname is stored only on the leaf pages of the index.
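The non-unique, two-column index described above for the LASTNAME query can be sketched as follows; the index name is hypothetical. Because WORKDEPT is the leading key column, the same index may also serve the earlier WHERE, DISTINCT, and GROUP BY examples on WORKDEPT, so under the guideline about partial keys a separate single-column WORKDEPT index could be redundant. Unlike a unique index with an INCLUDE column, this index does not require WORKDEPT values to be unique.

```
   -- WORKDEPT comes first because the predicate is on WORKDEPT.
   -- LASTNAME is part of the key, so
   --   SELECT lastname FROM employee
   --     WHERE workdept IN ('A00','D11','D21')
   -- can be answered from the index alone, without reading data pages.
   CREATE INDEX empx_dept_name ON employee (workdept, lastname);
```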

Performance Tips for Administering Indexes

The following tips describe how properly using and managing indexes can affect performance:

Index Creation
When creating indexes on large tables on a symmetric multiprocessor (SMP) machine, consider setting the intra_parallel database manager configuration parameter to YES (1) or SYSTEM (-1) to take advantage of parallel performance improvements.
Multiple processors can be used to scan and sort data. The indexsort database configuration parameter, whose default is YES, controls whether index keys are sorted during index creation. The only case in which multiple processors provide no advantage during index creation is when indexsort is set to NO.
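The configuration described above might be set from the DB2 command line processor as sketched below. The database name SAMPLE is an assumption; check the values against your own configuration before changing them.

```
   -- Enable intra-partition parallelism for the instance. INTRA_PARALLEL
   -- is a database manager parameter; the new value takes effect after
   -- the instance is stopped and restarted (db2stop, then db2start).
   UPDATE DATABASE MANAGER CONFIGURATION USING INTRA_PARALLEL YES;

   -- Keep index key sorting enabled (YES is the default) so that the
   -- sort phase of index creation can benefit from multiple processors.
   UPDATE DATABASE CONFIGURATION FOR sample USING INDEXSORT YES;
```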
Index Table Space
Indexes can be stored in a different table space from the one used to store other table data. This can allow more efficient use of DASD by reducing the movement of read/write heads. You can also create your index table spaces so that they are stored on faster physical devices.
An index table space can also be assigned its own buffer pool, which can protect index pages from being pushed out of the buffer pool by large numbers of data pages.
When indexes are not placed in separate table spaces, data and index pages use the same extent size and prefetch quantity. If you use a different table space for indexes, you can select different values for all of the table space characteristics. Because indexes are typically smaller than tables and are spread over fewer containers, smaller extent sizes, such as 8 or 16 pages, are common. For more information, see "Index Page Prefetch". The SQL optimizer takes the use of faster devices for a table space into account, as described in "Table Space Impact on Query Optimization". For more information about table spaces, see "Designing and Choosing Table Spaces".
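A minimal sketch of a separate index table space with its own buffer pool and a small extent size. All names, the container path, and the sizes are hypothetical, and the existing data table space TS_DATA is assumed. Placing indexes in a separate table space with INDEX IN requires both table spaces to be database managed (DMS).

```
   -- Hypothetical 4 KB buffer pool reserved for index pages.
   CREATE BUFFERPOOL bp_index SIZE 2000 PAGESIZE 4K;

   -- DMS table space for indexes: small extent size, own buffer pool.
   -- The container path and size (in pages) are placeholders.
   CREATE TABLESPACE ts_index
     MANAGED BY DATABASE
     USING (FILE '/db2/containers/ts_index.dat' 10000)
     EXTENTSIZE 8
     PREFETCHSIZE 16
     BUFFERPOOL bp_index;

   -- Rows go to TS_DATA (assumed to be an existing DMS table space);
   -- all indexes on the table, including the primary key index, go to
   -- TS_INDEX.
   CREATE TABLE myschema.orders (
     order_id INTEGER NOT NULL,
     cust_id  INTEGER NOT NULL,
     PRIMARY KEY (order_id)
   ) IN ts_data INDEX IN ts_index;
```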
Degree of Clustering
If your SQL statement requires ordering (for example, ORDER BY, GROUP BY, or DISTINCT) and an appropriate index exists to satisfy the ordering, the database manager might still not choose that index. This could happen when:
- Index clustering is poor (see the CLUSTERRATIO and CLUSTERFACTOR columns of SYSCAT.INDEXES)
- The table is small enough that it is cheaper to scan the table and sort the answer set in memory
- There are competing indexes for accessing the table.
It is recommended that you perform a REORG, or a sort and LOAD, after creating a clustering index. In general, a table can be clustered on only one index. Your tables and indexes should be built in the sequence of the clustering index for that table. A clustering index attempts to maintain a particular order of data, improving the CLUSTERRATIO or CLUSTERFACTOR statistics collected by the RUNSTATS utility.
Before you load or reorganize a table, also consider altering it with the PCTFREE parameter. Clustering can be maintained only if each data page has space available for additional inserts; when that space is available, newly inserted rows can be clustered with the existing data. To reserve this space, first create the table, then alter it with the PCTFREE parameter, and then load the data. Likewise, alter the table with the PCTFREE parameter before reorganizing it; otherwise, the reorganization eliminates all extra free space on the data pages.
Clustering is not currently maintained during updates. That is, if you update a row so that its key value in the clustering index changes, the row is not necessarily moved to a new page to maintain the clustering order. To maintain clustering, use DELETE followed by INSERT instead of UPDATE.
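The clustering checks described above can be sketched as follows, assuming a hypothetical schema MYSCHEMA and a clustering index named EMPX_CLUST. Typically only one of the two clustering statistics is filled in, depending on whether detailed index statistics were collected; the other is -1.

```
   -- Check how well each index on EMPLOYEE is clustered.
   SELECT indname, clusterratio, clusterfactor
     FROM syscat.indexes
     WHERE tabschema = 'MYSCHEMA'
       AND tabname = 'EMPLOYEE';

   -- Recluster the table rows in the order of the clustering index,
   -- leaving the free space set by the table's PCTFREE value.
   REORG TABLE myschema.employee INDEX myschema.empx_clust;
```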
RUNSTATS Utility
After creating a new index, you should use the RUNSTATS utility to collect index statistics. These statistics allow the optimizer to determine whether using the index can improve access performance. See "Collecting Statistics Using the RUNSTATS Utility" for more information on this topic.
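Two common forms of the RUNSTATS command are sketched below; the schema name MYSCHEMA is hypothetical. The table name must be fully qualified.

```
   -- Collect table statistics plus detailed statistics for all indexes.
   RUNSTATS ON TABLE myschema.employee AND DETAILED INDEXES ALL;

   -- Collect statistics for the indexes only, for example after a new
   -- CREATE INDEX on a table whose data statistics are already current.
   RUNSTATS ON TABLE myschema.employee FOR INDEXES ALL;
```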
Reorganizing an Index
To get the best performance from your indexes, consider reorganizing them periodically. Updates to your tables can make index page prefetch less effective; reorganizing the index restores that effectiveness.
You can reorganize the index by either dropping and re-creating the index, or by using the REORG utility. For more information, see "Reorganizing Table Data".
To avoid frequent reorganization, you can specify PCTFREE when creating an index. Specifying PCTFREE during index creation leaves free space on each index leaf page as it is created. As a result, later insertions into the index are less likely to cause index page splits. Index page splits leave index pages neither contiguous nor sequential, which reduces the ability to perform index page prefetching. Choosing an appropriate PCTFREE for an index can reduce how often you need to reorganize the index, or eliminate the need entirely.
Note: The PCTFREE specified when you create the index is used when the index is re-created during reorganization.

Dropping and re-creating the index gets a new set of pages that are roughly contiguous and sequential. This improves index page prefetch when it occurs.
Although more costly to accomplish, the REORG utility also ensures clustering of the data pages. This clustering has greater benefit for index scans accessing a significant number of data pages.
If you work in a symmetric multiprocessor (SMP) environment, the REORG utility uses multiple processors when intra_parallel is set to YES or SYSTEM.
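The two reorganization approaches described above can be sketched as follows, assuming a hypothetical schema MYSCHEMA and index name EMPX_DEPT_NAME. The PCTFREE value of 10 is illustrative, not a recommendation.

```
   -- Option 1: drop and re-create the index. The new pages are roughly
   -- contiguous and sequential, and 10 percent of each leaf page is left
   -- free. This PCTFREE value is reused whenever REORG re-creates it.
   DROP INDEX myschema.empx_dept_name;
   CREATE INDEX myschema.empx_dept_name
     ON myschema.employee (workdept, lastname)
     PCTFREE 10;

   -- Option 2: reorganize the table; its indexes are rebuilt as part of
   -- the reorganization.
   REORG TABLE myschema.employee;

   -- Either way, refresh the statistics afterward.
   RUNSTATS ON TABLE myschema.employee AND INDEXES ALL;
```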
Use EXPLAIN
Periodically, run EXPLAIN on your most frequently used queries and check that each of your indexes is used at least once. If an index is not used in any query, consider dropping that index.
Also, use EXPLAIN to see whether table scans on large tables are processed as the inner table of nested-loop joins. This would indicate that an index on the join predicate column is either missing or considered ineffective for applying the join predicate, or that the join predicate itself is missing.
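A sketch of the periodic EXPLAIN check described above. It assumes that the explain tables already exist in the current schema (created from the EXPLAIN.DDL script shipped in sqllib/misc), that 'IX' is the EXPLAIN_OBJECT.OBJECT_TYPE value for indexes, and that the schema name MYSCHEMA is hypothetical. The captured plan itself can be formatted with a tool such as db2exfmt to look for table scans on the inner side of nested-loop joins.

```
   -- Capture the access plan for one frequently used query.
   EXPLAIN PLAN FOR
     SELECT lastname
       FROM employee
       WHERE workdept IN ('A00', 'D11', 'D21');

   -- After explaining all frequent queries, list indexes on EMPLOYEE
   -- that do not appear in any captured plan (candidates for dropping).
   SELECT i.indschema, i.indname
     FROM syscat.indexes i
     WHERE i.tabschema = 'MYSCHEMA'
       AND i.tabname = 'EMPLOYEE'
       AND NOT EXISTS (SELECT 1
                         FROM explain_object o
                         WHERE o.object_type = 'IX'
                           AND o.object_schema = i.indschema
                           AND o.object_name = i.indname);
```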
