July 15, 2008

Batch Tuning: Reduce I/O

This is the third in a five-part series on tuning batch jobs. In part two I covered reducing CPU usage. Here I look at ways to reduce I/O, which in turn reduces CPU usage and elapse time.

Physical Database Design

Early in my career I had the pleasure of working on some large complex systems that required logical design work before actually physically building the database. I'd always insist that the tables be normalized, because I knew that denormalizing would pose future problems when it came time to redesign the database. However, I've since learned that denormalization has its place, particularly when performance is a priority.

Subtype Tables

One example is when you need to subtype a table. For example, I have a table (CUSTOMER) that contains account information for a local telecom's residential and corporate customers. The residential customers make up about 20 percent of the table data. A large number of columns pertain to residential or corporate customers and would waste lots of space if held on a single customer table. So I created subtype tables with specific columns for each of the different customer types. The logical way of processing residential customers would be to join the customer table with the residential table and provide the needed information. However, reading both corporate and residential account information takes considerable I/O on the customer table. But I/O can be reduced by creating a new column on the customer table with subtype R or C. Now generate the residential report filtering on the customer table where the subtype is R (See Figure: Download db2utor_071508_fig_1.bmp). 

Other design opportunities--many more than I have room to list here--may require denormalization for performance reasons. Just remember that systems are usually easier to change and maintain if they're kept as close as possible to the normalized logical model.


Prefetch is the process of reading a set of pages, usually 32, into the buffer pool with only one asynchronous I/O operation. Using prefetch can save substantial amounts of CPU and I/O. The key to taking advantage of prefetch with any batch application is to sort the input file in the clustering order of the tables that are being processed. To learn more about prefetch, see the DB2 Version 9.1 for z/OS Performance Monitoring and Tuning Guide.

In Memory Code Tables

Sometimes the overhead of joining in a code table to pull in a description for millions of rows of data can greatly increase that query's CPU and I/O. An alternative is to read a small code table into an application array and pull the description from memory. If the code table is larger and you typically only use a small percentage of the data with any given execution, you can do a lookup in the array. If the description isn't found in the array, then get the information from DB2 and store it in the array. This way you only read from DB2 the data needed during this execution. While this adds some coding to the program, the reduction in I/O should more than make up for it.

Next in this series: Checkpoint Restart.