A SELECT statement defines a set of rows which satisfy the search criteria. The DB2 optimizer assumes the application will retrieve all the qualifying rows. This assumption is most appropriate in OLTP and batch environments. However, in "browse" applications it is common for a query to define a very large potential answer set but only retrieve the first few rows, typically only as many rows as are required to fill the screen.
The default assumption made by the optimizer may not be the best for these browse applications. The OPTIMIZE FOR clause provides a mechanism for an application to declare its intent to retrieve only a subset of the result or to give priority to the retrieval of the first few rows. Once this intent is understood, the optimizer can give preference to access plans that minimize the response time for retrieving the first few rows. Also, the number of rows that are sent to the client as a single block (see "Row Blocking") are bounded by the value of "n" in the OPTIMIZE FOR clause. Therefore, the OPTIMIZE FOR clause affects both how the qualifying rows are retrieved from the database by the server, and how the qualifying rows are returned to the client.
For example, suppose you are querying the employee table for the employees with the highest salary on a regular basis.
SELECT LASTNAME,FIRSTNAME,EMPNO,SALARY FROM EMPLOYEE ORDER BY SALARY DESC
You have defined a descending index on the SALARY column. However, since employees are ordered by employee number, the salary index is likely to be very poorly clustered. The optimizer, in trying to avoid many random synchronous I/Os, would likely choose to use the list prefetch access method (see "Understanding List Prefetching") which requires the row identifiers of all rows that qualify to be sorted. This can cause a delay before the first qualifying rows can be returned to the application. By adding the OPTIMIZE FOR clause to the statement as follows:
SELECT LASTNAME,FIRSTNAME,EMPNO,SALARY FROM EMPLOYEE ORDER BY SALARY DESC OPTIMIZE FOR 20 ROWS
the optimizer would likely choose to use the SALARY index directly with the knowledge that in all likelihood only the twenty employees with the highest salaries would be retrieved. Regardless of how many rows could be blocked, a block of rows is returned to the client every twenty rows.
Use of the OPTIMIZE FOR clause causes the optimizer to favor access plans that avoid bulk operations or operations that interrupt the flow of rows, such as sorts. You are most likely to influence an access path by using OPTIMIZE FOR 1 ROW. As a result, using this clause could have the following effects:
Although the OPTIMIZE FOR clause applies to all optimization classes (see "Adjusting the Optimization Class"), it works best for optimization class 3 and higher. The use of the greedy join enumeration method (see "Search Strategies for Selecting Optimal Join") in optimization classes below 3 sometimes results in access plans for multi-table joins that do not lend themselves to quickly retrieving the first few rows.
The OPTIMIZE FOR clause does not prevent you from retrieving all the qualifying rows. However the total elapsed time to retrieve all the qualifying rows may be significantly greater than if the optimizer had been allowed to optimize for the entire answer set.
If you have a packaged application that uses the call level interface (DB2 CLI or ODBC) it is possible to have DB2 CLI automatically append an OPTIMIZE FOR clause to the end of each query statement using the OPTIMIZEFORNROWS keyword in the db2cli.ini configuration file. For additional information refer to the CLI Guide and Reference manual.
When retrieving rows using a SELECT statement, you might also want to consider using the FETCH FIRST clause. This clause sets the maximum number of rows that can be retrieved from within a SELECT statement. Limiting the result table to the first several rows can improve performance. The database manager ceases processing the query once the specified number of rows have been retrieved.
If both the FETCH FIRST clause and the OPTIMIZE FOR clause are specified, the lower of the two values is used to influence the communications buffer size. The two values are considered independent of each other for optimization purposes. See "Using a select-statement" for more information on the interaction between these two clauses.