DB2utor

July 08, 2008

Batch Tuning: Reduce CPU Usage

This is the second part of a five-part series on tuning batch jobs. In last week's DB2utor entry, I looked at reducing elapsed time. Here I'll share the options for reducing CPU usage within your application code. There are several.

Many performance problems can be avoided if your shop has good coding standards (along with good SQL standards). One of the first coding standards you should follow is to avoid filtering data within your program. Let DB2 do the filtering for you.

Set Processing, Not Record Processing
Over the years I’ve seen applications designed to process data a record at a time. This is common in shops that traditionally process using flat or VSAM files. When designing a program that performs record processing, the record is read first, then some of the fields are evaluated to see if this record needs to be processed. If it doesn't, then the record is ignored and the next one is read.

This CPU-intensive design is most often found in VSAM applications that were ported to DB2 without being redesigned. Every column and every row returned to your program costs CPU cycles, so bringing back data that is never used is pure waste.

Here's an example. The SQL that follows reads in all columns from TBL1 for a given key. Then the program examines the active status column to determine if further processing is needed. If not, it gets the next row.

Incorrect: Filter Rows in Program

EXEC SQL
  SELECT * FROM TBL1
  WHERE key_col = :ws-key-col
END-EXEC.

IF tbl1.status = ws-active THEN
   process active
   . . .
ELSE
   get next row
END-IF.

Notice in the next example that the filter on ws-active has moved from the program logic into the SQL. DB2 returns a row only when the status is active.

Correct: Filter Rows in DB2

EXEC SQL
  SELECT * FROM TBL1
  WHERE key_col = :ws-key-col
     and status = :ws-active
END-EXEC.

process data
. . .
get next row

Coding rule: If you see record-level processing, redesign the program to use set processing.
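In an embedded program, the set-processing version is typically coded as a cursor so qualifying rows can be fetched one by one; the filtering itself still happens inside DB2. A minimal COBOL sketch, with the cursor name, host variables, and paragraph name purely illustrative:

```cobol
EXEC SQL
  DECLARE ACTIVE-CSR CURSOR FOR
  SELECT col2, col3
  FROM TBL1
  WHERE key_col = :ws-key-col
    AND status  = :ws-active
END-EXEC.

EXEC SQL OPEN ACTIVE-CSR END-EXEC.

PERFORM UNTIL SQLCODE NOT = 0
   EXEC SQL
     FETCH ACTIVE-CSR INTO :ws-col2, :ws-col3
   END-EXEC
   IF SQLCODE = 0
      PERFORM PROCESS-ACTIVE-ROW
   END-IF
END-PERFORM.

EXEC SQL CLOSE ACTIVE-CSR END-EXEC.
```

Only rows that already passed the filter ever reach PROCESS-ACTIVE-ROW, so the program never spends CPU fetching and discarding inactive rows.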

Return Only Needed Columns
Another common mistake is to select every column in a table even when your program doesn't use them all -- the rationale being that you might need a column later, so you may as well select it now. Sometimes you already know the value you're selecting, because you have an equal predicate with a host variable on that column.

In this example, there's an index on (col1, col2, col3).

EXEC SQL
  SELECT col1, col2, col3, col4
  FROM TBL1
  WHERE col1 = :ws-key-col1
     AND col4 = :ws-active
END-EXEC.

process data
. . .
get next row

If you're not paying attention, this looks like a good statement -- the predicates match columns of the index. But there's no need to select col1 or col4, because each has an equal predicate with a host variable: the program already knows those values, and every column returned costs CPU. Trimming the select list to col2 and col3 avoids that waste. Note also that because col4 isn't in the index, referencing it forces DB2 to access the data page in addition to the index page, adding I/O and CPU; adding col4 to the index would make the query eligible for index-only access.

Correct Version:

EXEC SQL
  SELECT col2, col3
  FROM TBL1
  WHERE col1 = :ws-key-col1
     AND col4 = :ws-active
END-EXEC.

process data
. . .
get next row

Coding rule: Select only the columns needed. If you see an equal predicate with a host variable, don't include that column in the select statement.

Avoid Using a Cursor for Row-Existence Checks
Sometimes you need to verify that a row exists in a table. I've always recommended issuing a simple SELECT against the table: an SQLCODE of 0 or -811 tells you the row exists. (The -811 just means more than one row qualified; the traditional response is to rewrite the check with a cursor.)

Some companies even have a standard that says this test should always be done with a cursor, so you declare the cursor, open it, fetch, and close it. Compared to a single SELECT, those four SQL calls consume considerably more resources.

DB2 V8 provides a better way to do the check: FETCH FIRST 1 ROW ONLY.

SELECT COL1, COL2
FROM TBL1
WHERE condition
FETCH FIRST 1 ROW ONLY

Coding rule: For row-existence checks, use a singleton SELECT with FETCH FIRST 1 ROW ONLY rather than declaring a cursor; the clause also eliminates the -811 condition.
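In an embedded program the check becomes a singleton SELECT INTO: SQLCODE 0 means the row exists, +100 means it doesn't, and -811 can no longer occur. A sketch, with the host-variable and paragraph names purely illustrative:

```cobol
EXEC SQL
  SELECT 1 INTO :ws-exists
  FROM TBL1
  WHERE key_col = :ws-key-col
  FETCH FIRST 1 ROW ONLY
END-EXEC.

EVALUATE SQLCODE
  WHEN 0
     PERFORM ROW-EXISTS-LOGIC
  WHEN 100
     PERFORM ROW-MISSING-LOGIC
  WHEN OTHER
     PERFORM SQL-ERROR-ROUTINE
END-EVALUATE.
```

Selecting the constant 1 instead of real columns keeps the statement as cheap as possible, since the program only cares whether a row came back.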

Use Stage 1 Predicates
DB2 processes the predicates in your SQL statement as either stage 1 or stage 2. Stage 1 predicates are evaluated as the data pages are retrieved; stage 2 predicates are evaluated later, a row at a time, after the data has been passed along. It's a good idea to print off the predicate type and processing chart as you move from release to release of DB2: with optimizer improvements, predicates that used to be stage 2 or non-indexable may now be indexable and stage 1. For versions before DB2 V9, the chart is in the DB2 Administration Guide; for DB2 9, it's in the DB2 9 Performance Monitoring and Tuning Guide.

A sample of the chart:

Predicate Type          Indexable   Stage 1
======================  =========   =======
COL = value                 Y          Y
COL = noncol expr           Y          Y
COL IS NULL                 Y          Y
. . .
T1.COL1 = T1.COL2           N          N
COL = (cor subq)            N          N

Some SQL statements that don't use stage 1 predicates can be rewritten so that they do. Say I need to compare two columns in the same row of a table. As the chart above shows, T1.COL1 = T1.COL2 is a stage 2 predicate. But you can join T1 to itself on matching keys and then compare the columns across the two table references; the result is a stage 1 indexable predicate.

Example:

SELECT A.col2, A.col3
FROM TAB1 A, TAB1 B
WHERE A.key1 = B.key1
  AND A.col1 = B.col2

Coding rule: Rewrite SQL to use stage 1 predicates when possible.

Use Static SQL and REOPT(ALWAYS) When Data Is Skewed
Many DBAs require programs running in production to be written using static SQL because of the overhead associated with dynamic SQL, which must check authorization and build an access path each time the statement is executed. However, if your data is skewed, dynamic SQL may work better: the optimizer sees the literal value and how often it occurs, whereas with a host variable it doesn't know the value. As of DB2 V7, you can bind with REOPT(ALWAYS) -- formerly REOPT(VARS) -- to have the optimizer evaluate the host-variable values at run time and select an access path accordingly. This beats a dynamic statement because you avoid the authorization checking, which reduces CPU by about 10 percent.

REOPT is a package-level parameter, so if you have many SQL statements and only need this behavior for one or two of them, consider moving those statements to a separate program; otherwise you pay to re-optimize statements that don't need it.
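The change itself is just a bind option. A sketch of the bind subcommand, with the collection and member names purely illustrative:

```
BIND PACKAGE(BATCHCOL) -
     MEMBER(SKEWPGM)   -
     REOPT(ALWAYS)
```

Because the option applies to the whole package, isolating the skewed-data statements in their own member (SKEWPGM here) keeps the re-optimization cost away from the rest of the application.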

Sort Input Data in Cluster Order
Batch jobs usually read an input record, retrieve related data from DB2, read the next record, and so on. If you sort the input data in clustering-key order, DB2 can avoid many get pages through index look-aside: it keeps track of the leaf page accessed on the previous call and checks whether the current key falls between that page's low and high keys. DB2 V9 extends this technique, greatly reducing get pages -- and fewer get pages means less CPU. To learn more about index look-aside, see the IBM Redbook, "DB2 9 for z/OS Performance Topics."
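A sort step ahead of the batch program is usually all it takes. A JCL sketch, assuming -- purely for illustration -- that the clustering key is a 10-byte character field starting in position 1 of the input record:

```jcl
//SORTSTEP EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=BATCH.INPUT.FILE,DISP=SHR
//SORTOUT  DD DSN=BATCH.INPUT.SORTED,
//            DISP=(NEW,CATLG,DELETE),
//            LIKE=BATCH.INPUT.FILE
//SYSIN    DD *
  SORT FIELDS=(1,10,CH,A)
/*
```

Adjust the SORT FIELDS positions, length, and format to match your clustering-key columns; the batch program then reads BATCH.INPUT.SORTED instead of the raw input.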

Next in this series: reducing I/O. I'm always looking to learn from others' experiences, so please post your ideas in the comments. Thank you.