March 22, 2011

Recommended Runstats Parameters

SQL is a DB2 strongpoint, but it's also central to the challenge of reducing the total cost of ownership of running DB2 on z/OS. Because it is so rich and powerful, DB2 needs to analyze the SQL -- along with statistics gathered through the runstats utility -- to determine the lowest cost access path to retrieve the data.

This analysis is handled by the DB2 cost-based optimizer. When an SQL statement is executed, the DB2 optimizer prepares the statement to determine the optimal access path. This is industry-leading technology, and as part of the DBMS engine, the optimizer is enhanced with each new DB2 release. Through its evolution, the optimizer increasingly relies on getting current statistics about the data. Knowing the cardinality and frequency of data values for a given column allows the optimizer to function more effectively, especially if the data values are skewed. DB2 collects these statistics through the runstats utility.

The runstats utility can collect statistics on table spaces, partitions or index spaces. If you haven't looked at your runstats jobs for a few years, you're probably collecting stats by row, which is the least costly collection option. However, with recent enhancements, the optimizer is much effective at analyzing stats collected in columns or groups of columns. These capabilities can really help you, but because collecting data at the column level requires greater CPU usage, it's critical that you gather only the stats you need.

To get started, a DBA will typically setup runstats jobs to run on the table space or partition on a monthly basis. The next step is to setup weekly runstats on the index space. The reasoning here is that indexes are smaller and can provide the most bang for the buck. If you're running DB2 10, you'll want to run runstats using the table parameter to take advantage of the new SET PROFILE and OPTION PREVIEW features. See this recent post for details.

To learn more about the runstats utility and all of the parameters (and tables) that are being updated, see the DB2 10 Utility Guide and Reference.

The next level of statistics gathering is the column level. Again, this can be pricey, so you should take care when choosing columns to collect statistics on. Specifically, you should target  columns that use SQL WHERE, ORDER BY, GROUP BY and HAVING clauses as well as columns in an index.

Given the following SQL:
SELECT name, address
ORDER BY city, state

The runstats to support this SQL would be:

         COLUMN (CITY,STATE)            
         INDEX (ALL)                    
         SET PROFILE                 

If you've taken advantage of column-level statistics, I'd like to hear from you. Tell me what kinds of problems you've seen and how you fixed them through collecting runstats at the column level.