January 29, 2013

Table Sample Option Reduces RUNSTATS Overhead

The RUNSTATS utility gathers summary information about the characteristics of data in table spaces, indexes and partitions. I've previously written about using RUNSTATS (here and here), but now I can report that a new option makes the utility less resource-intensive.

It's no secret that RUNSTATS, in the course of conducting its valuable work, consumes a significant amount of CPU. However, with DB2 10, IBM has delivered a new option that allows database adminstrators to more closely control how RUNSTATS functions.

This summary information in RUNSTATS is used by the DB2 optimizer during the bind process to determine the best possible access path. Prior to DB2 10, the SAMPLE parameter was used to reduce the number of rows the CPU intensive cardinality calculation was performed on. However, 100 percent of these rows are still read by RUNSTATS. But now a new parameter, TABLESAMPLE, allows DBAs to determine the number of data pages that get sampled. Obviously, if only some pages are read, this can significantly reduce CPU usage.

So what percentage of pages need to be processed to ensure that the optimizer can determine the best access path? TABLESAMPLE includes a setting, SYSTEM AUTO, that allows DB2 to set the percentage anywhere from 100 down to 10 percent, based on table size. The larger the table, the smaller the sampling rate. For tables with fewer than 500,000 rows, all pages are sampled.

Alternatively, the percentage can be set manually. You can enter a value as low as 1 percent (0.01), though IBM doesn't recommend going below 10 percent. 

The TABLESAMPLE parameter only applies to single-table table spaces, and is not applicable to LOB table spaces. Indexes do not exploit this page and row level sampling. 

If you've used TABLESAMPLE, please leave a comment about your experiences.