IBM Books

Administration Guide


Operation Merging

The SQL compiler will rewrite queries to merge query operations, in an attempt to construct the query so that it has the fewest number of operations, especially SELECT operations. The following examples are provided to illustrate some of the operations that can be merged by the SQL compiler:

Example - View Merges

Suppose you have access to the following two views of the EMPLOYEE table, one showing employees with a high level of education and the other view showing employees earning more than $35,000:

   CREATE VIEW EMP_EDUCATION (EMPNO, FIRSTNME, LASTNAME, EDLEVEL) AS
   SELECT EMPNO, FIRSTNME, LASTNAME, EDLEVEL
     FROM EMPLOYEE
    WHERE EDLEVEL > 17
   CREATE VIEW EMP_SALARIES (EMPNO, FIRSTNAME, LASTNAME, SALARY) AS
   SELECT EMPNO, FIRSTNME, LASTNAME, SALARY
     FROM EMPLOYEE
    WHERE SALARY > 35000

Now suppose you perform the following query to list the employees who have a high education level and who are earning more than $35,000:

   SELECT E1.EMPNO, E1.FIRSTNME, E1.LASTNAME, E1.EDLEVEL, E2.SALARY
     FROM EMP_EDUCATION E1,
          EMP_SALARIES  E2
    WHERE E1.EMPNO = E2.EMPNO

During query rewrite, these two views could be merged to create the following query:

   SELECT E1.EMPNO, E1.FIRSTNME, E1.LASTNAME, E1.EDLEVEL, E2.SALARY
     FROM EMPLOYEE E1,
          EMPLOYEE E2
    WHERE E1.EMPNO = E2.EMPNO
      AND E1.EDLEVEL > 17
      AND E2.SALARY  > 35000

By merging the SELECT statements from the two views with the user-written SELECT statement, the optimizer can consider more choices when selecting an access plan. In addition, if the two views that have been merged use the same base table, additional rewriting may be performed as described in "Example - Redundant Join Elimination".

Example - Subquery to Join Transformations

The SQL compiler will take a query containing a subquery, such as:

   SELECT EMPNO, FIRSTNME, LASTNAME, PHONENO
   FROM EMPLOYEE
   WHERE WORKDEPT IN
         (SELECT DEPTNO
            FROM DEPARTMENT
           WHERE DEPTNAME = 'OPERATIONS')

and convert it to a join query of the form:

   SELECT DISTINCT EMPNO, FIRSTNME, LASTNAME, PHONENO
     FROM EMPLOYEE EMP,
          DEPARTMENT DEPT
    WHERE EMP.WORKDEPT = DEPT.DEPTNO
      AND DEPT.DEPTNAME = 'OPERATIONS'

A join is generally much more efficient to execute than a subquery.

Example - Redundant Join Elimination

Queries can sometimes be written or generated which have unnecessary joins. Queries such as the following could also be produced during the query rewrite stage as described in "Example - View Merges".

   SELECT E1.EMPNO, E1.FIRSTNME, E1.LASTNAME, E1.EDLEVEL, E2.SALARY
     FROM EMPLOYEE E1,
          EMPLOYEE E2
    WHERE E1.EMPNO = E2.EMPNO
      AND E1.EDLEVEL > 17
      AND E2.SALARY  > 35000

In this query, the SQL compiler can eliminate the join and simplify the query to:

   SELECT EMPNO, FIRSTNME, LASTNAME, EDLEVEL, SALARY
     FROM EMPLOYEE
    WHERE EDLEVEL > 17
      AND SALARY  > 35000

Example - Shared Aggregation

Using multiple functions within a query can generate several calculations which take time. Reducing the number of calculations to be done within the query results in an improved plan. The SQL compiler takes a query using multiple functions such as:

   SELECT SUM(SALARY+BONUS+COMM) AS OSUM,
     AVG(SALARY+BONUS+COMM) AS OAVG,
     COUNT(*) AS OCOUNT
   FROM EMPLOYEE;

and transforms the query in the following way:

   SELECT OSUM,
     OSUM/OCOUNT
     OCOUNT
   FROM (SELECT SUM(SALARY+BONUS+COMM) AS OSUM,
     COUNT(*) AS OCOUNT
   FROM EMPLOYEE) AS SHARED_AGG;

This rewrite reduces the query from 2 sums and 2 counts to 1 sum and 1 count.

Example - Summary Tables

Following is an example of a multidimensional analysis that could take advantage of summary tables. A summary table is created with the sum and count of sales for each level of:

A wide range of queries can pick up their answers from this stored aggregate data. The following example calculates the sum of product group sales, by state, by month. Queries that can take advantage of such pre-computed sums would include:

While the precise answer is not included in the summary table for any of these queries, the cost of computing the answer using the summary table could be significantly less than using a large base table, because a portion of the answer is already computed. For example:

  CREATE TABLE PG_SALESSUM
    AS (
        SELECT l.id AS prodline, pg.id AS pgroup,
               loc.country, loc.state
               YEAR(pdate) AS year, MONTH(pdate) AS month,
               SUM(ti.amount) AS amount,
               COUNT(*) AS count
        FROM   cube.transitem AS ti, cube.trans AS t,
               cube.loc AS loc, cube.pgroup AS pg,
               cube.prodline AS l
        WHERE  ti.transid = t.id
           AND ti.pgid = pg.id
           AND pg.lineid = l.id
           AND t.locid = loc.id
           AND YEAR(pdate) > 1990
        GROUP BY l.id, pg.id, loc.country, loc.state,
                 year(pdate), month(pdate)
       )
  DATA INITIALLY DEFERRED REFRESH DEFERRED;
 
  REFRESH TABLE SALESCUBE;

The following are sample queries that would obtain significant performance improvements because they are able to use the results in the summary table that are already computed. The first example returns the total sales for 1995 and 1996:

  SET CURRENT REFRESH AGE=ANY
 
  SELECT YEAR(pdate) AS year, SUM(ti.amount) AS amount
  FROM   cube.transitem AS ti, cube.trans AS t,
         cube.loc AS loc, cube.pgroup AS pg,
         cube.prodline AS l
  WHERE  ti.transid = t.id
     AND ti.pgid = pg.id
     AND pg.lineid = l.id
     AND t.locid = loc.id
     AND YEAR(pdate) IN (1995, 1996)
  GROUP BY year(pdate);

The second example returns the total sales by product group for 1995 and 1996:

  SET CURRENT REFRESH AGE=ANY
 
  SELECT pg.id AS "PRODUCT GROUP",
         SUM(ti.amount) AS amount
  FROM   cube.transitem AS ti, cube.trans AS t,
         cube.loc AS loc, cube.pgroup AS pg,
         cube.prodline AS l
  WHERE  ti.transid = t.id
     AND ti.pgid = pg.id
     AND pg.lineid = l.id
     AND t.locid = loc.id
     AND YEAR(pdate) IN (1995, 1996)
  GROUP BY pg.id;


[ Top of Page | Previous Page | Next Page | Table of Contents | Index ]

[ DB2 List of Books | Search the DB2 Books ]