Blog
DB2utor

Categories

May 20, 2008

How to Write Efficient SQL (Part 3)

This is the third in a three-part series on writing efficient SQL. Initially we reviewed concepts and terms that are used to explain how DB2 filters and retrieves rows of data based on different types of predicates. You also received some general rules for coding efficient SQL.

Last week we reviewed the order of evaluation of the predicate and how filter factors are used by DB2 to estimate the number of rows that qualify with the given predicate. Putting predicates that have a high filter factor first reduces the number of rows that must be processed by the next predicate in the query.

Here, we wrap things up by reviewing some queries and applying what we've learned about writing efficient SQL.

The general goal is to write SQL with stage-one indexable predicates whenever possible.  If you see a stage-two predicate, try to rewrite it as a stage one. I mentioned last week that installing Visual Explain for DB2 V8 or Optimization Service Center (OSC) for DB2 V9 can help you write efficient SQL. These tools show you the order of process taken by DB2 and whether or not the predicate is stage-one. They also display the filter factor of the columns and deterimine which predicates should go first and if it makes sense to put an index on a column that has high filter factor.

Non-Column Expressions

Now let's cover the use of predicates written with expressions on non-columns. Non-columns are usually host variables or special registers in DB2 that are used in an expression without referencing a column on a table. When the predicate is written with a column followed by an operator followed by the non-column expression (COL op noncol), then DB2 can process this as a stage-one indexable predicate. When you add a calculation to the noncol, such as (COL op (noncol + 1)), DB2 can still process this at stage one, but cannot use an index. However, moving the noncol expression to the same side of the operator as the column (COL noncol > value) makes this a stage-two predicate that's not as efficient to process.

Consider this example using the DSN8910.EMP table with 500,000 rows:

Stage-two processing: Non-Column on same side as column

SELECT E.FIRSTNME, E.LASTNAME, E.WORKDEPT          
FROM DSN8910.EMP E                                 
WHERE E.SALARY + (:WS-PCT-INC * E.SALARY) > 159999.00

The performance data below shows that this statement has a cost-factor of 22006 and took on average 1.682 seconds to complete the query and return 5 rows out of 500,000.

                    Rows           Duration in seconds                                      Cost-
Type            Returned     RUN-time     OPEN-time    FETCH-time          Factor
-----------      -------------   ---.---------     -.---------         --.---------        --------
SELECT                     5     1.693930     0.000027          1.691144        22006
SELECT                     5     1.670273     0.000024          1.670138        22006

Looking at a Visual Explain report on this statement, notice that under Query Summary the CPU cost is 1492 and under Predicate Summary the "Is Sargable” is “N" -- this is a stage-two predicate with a filter factor of 0.33, meaning 1/3 of the 500000 rows will qualify.

Download coleman_figure1.pdf

Download coleman_figure2.pdf

Stage-one processing: Non-Column on opposite side of column

SELECT E.FIRSTNME, E.LASTNAME, E.WORKDEPT          
FROM DSN8910.EMP E                                 
WHERE E.SALARY > 60000.00 / (1 + :WS-PCT-INC)

                    Rows Of       Duration in seconds                                      Cost-
Type            SQLCODE  RUN-time     OPEN-time    FETCH-time             Factor
-----------      -------------   ---.---------     -.---------         --.---------        --------
SELECT                     5      0.003675    0.000048           0.000623       3171
SELECT                     5      0.000857    0.000043           0.000700       3171

Notice that the cost-factor dropped from 22006 to 3171. This resulted in a 99.9-percent reducction in average elapsed time (from 1.682 to 0.002 seconds).

The Visual Explain report for these statements shows the CPU Cost (ms) was reduced by 99 percent, from 1492 to 18.

Download coleman_figure3.pdf

Download coleman_figure4.pdf

Join Tables

A common problem when joining tables is forgetting to include all of the leading columns of the index that will be used to join the tables. Here, I’m using the IBM catalog tables SYSIBM.SYSPACKAGE and SYSIBM.SYSPACKSTMT to list statements in packages bound to the collection SBA310_SBASTART_C. In this example I forgot to include the first column of the index "location." The CPU cost isn't bad (47ms), but once I include the "location" column, the CPU cost drops to 2ms.

SQL Statement :

SELECT pk.collid, pk.name
FROM sysibm.syspackage pk
           , sysibm.syspackstmt ps
where pk.collid = ps.collid
    and pk.name = ps.name
    and pk.contoken = ps.contoken
    and pk.collid = 'SBA310_SBASTART_C'

Notice in the Visual Explain Access Plan report (figure 1) that we have matching columns of 0 and a predicate type of "Screening Predicate." Avoid this predicate type when joining two tables; "Match Predicate" is the type you want.

View this photo 

Once I notice the match columns is 0, I realize I need to add the location. Now I know with DB2 the location column is always blank, so I just add where pk.location = ' '.

SQL Statement :

SELECT pk.collid, pk.name FROM sysibm.syspackage pk , sysibm.syspackstmt ps where pk.location = ps.location and pk.collid = ps.collid and pk.name = ps.name and pk.contoken = ps.contoken and pk.location = ' ' and pk.collid = 'SBA310_SBASTART_C'

A couple of things to notice in this explain report: We now have a matching predicate filter. We went from match columns of 0 to 2, and the join changed from "Merge Scan (MS Join)" to "Nested Loop (NL Join)." MS Join requires sorts. They're identifiable in the above figure by the red icons.

View this photo

There are many more examples, but I don't want to overwhelm you. I simply want to introduce the basic options and tools that can help you rewrite poorly written SQL statements so that they're more efficient. For a more detailed discussion of what I've covered here, please read "The DB2 V9 for z/OS Performance Monitoring and Tuning Guide."