September 20, 2011

DB2 10 and Prefetch

DB2 10 makes it easier for customers to take advantage of prefetch. But before going into that, let me step back and define prefetch and explain its different types.

Prefetch is a mechanism DB2 uses to read a set of pages, usually 128KB in size, into the buffer pool in a single asynchronous I/O operation. By avoiding costly synchronous read I/O operations, prefetch substantially reduces CPU and I/O usage. Basically DB2 is reading in more data while your application is running and processing data in the bufferpool (as opposed to waiting around for data).

To see if the DB2 optimizer has chosen a prefetch access method, enter the EXPLAIN command and look at the rows of data in your PLAN_TABLE. The PREFETCH column contains one of these values: D, S, L or U.

Here's a quick rundown of the prefetch types:

  • DYNAMIC prefetch--With dynamic prefetch, DB2 detects whether pages are being read sequentially. If so, DB2 activates sequential prefetch. The PREFETCH column in PLAN_TABLE displays a value of "D." However, if the statistics aren't up to date, the PREFETCH column value may be "S" (though DB2 still detects whether the data is truly being processing sequentially).
  • SEQUENTIAL prefetch--Sometimes the only way to read data is by doing a table space scan. In these situations, DB2 uses sequential prefetch. When DB2 uses sequential prefetch, you'll typically see a value of "S" in the PREFETCH column.
  • LIST prefetch--List prefetch is special in that it reads data pages based on row IDs (RIDs) taken from an index. The maximum number of pages that can be read in a single list prefetch operation is 32 (64 for utilities). When DB2 uses list prefetch, the PREFETCH column value is "L" or "U."

To process a list prefetch operation, first retrieve the RID through a matching index scan on one or more indexes. Then sort the list of RIDs in ascending order by page number. Finally, prefetch the pages in order of the page number sort by the RIDs. Because the rows aren't returned in any specific order, the ORDER BY clause must be used if the data must be in a given order. The PREFETCH column value is "L." Now if DB2 determines it doesn't need to sort the RID list, the PREFETCH column value is "U." This usually occurs when a hybrid join is used with a highly clustered index.

Prior to DB2 10, DBAs often needed to concern themselves with the number of RIDs to sort. Meanwhile, DB2 systems programmers had to define a RID pool large enough to handle all the RIDs. (This is set with the DSNZPARM MAXRBLK.) Pre-DB2 10, when DB2 determines the RID pool issn't large enough, it drops all of the processing it just did and reverts doing to table space scans. As you can imagine, this is very expensive. In response, some shops deactivate list prefetch by setting DSNZPARM MAXRBLK to 0.

In DB2 10 however, list prefetch doesn't dump all the work it's done once the RID pool is full. DB2 continues processing by writing the RID list to a work file. With this enhancement, shops that have shut off DSNZPARM MAXRBLK to save money can now re-evaluate and take advantage of list prefetch.

Has your shop stopped using prefetch? Have you resumed using it since moving to DB2 10? Please share your experiences in Comments.