May 20, 2014

Why is DB2 Rewriting my SQL?

The EXPLAIN statement provides a view into what DB2 determines is the optimum access path for accessing the data. The valid statements that can be processed by the EXPLAIN statement are SELECT, MERGE and INSERT, as well as search forms of UPDATE and DELETE. The information obtained from the EXPLAIN statement is placed in a set of user tables known as the EXPLAIN tables. The DB2 for z/OS Managing Performance guide has details about using EXPLAIN as well as lists of tables.

With DB2 11, column changes have been made to some existing tables. New tables have also been added to the family of EXPLAIN tables. For a complete list of changes, see Chapter 12 ("EXPLAIN Table Changes in Version 11) in the What's New guide.

As IBM brings new features such as temporal and archive tables to the database, it becomes more important to your tuning efforts to know when DB2 is modifying your SQL. These modifications to the EXPLAIN tables in DB2 11 are designed help you identify these changes:

* The PLAN_TABLE includes a new column, EXPANSION_REASON, that identifies the particular global variable or special register that's used to implicitly modify the query to support temporal or archive table processing.

* The DSN_PREDICATE_TABLE is an existing table with a change to the description. The description is: The predicate table, DSN_PREDICAT_TABLE, contains information about all of the predicates in a query. It is also used as input when you issue a BIND QUERY command to override predicate selectivities for matching SQL statements.

When DB2 rewrites the SQL, the ADDED_PRED values identify the actions that DB2 performed.

A list of ADDED_PRED values follows. I've added my own notes in parentheses.

blank -- DB2 didn't add the predicate.

'B' -- For bubble up. (Note: DB2 copies predicates from sub-select or table-expression  into the outer query to help filter rows more quickly.)

'C' -- For correlation. (Note: When a sub-select is written without a correlation, the optimizer determines that a correlated sub-select will perform better than a non-correlated query.)

'J' -- For join. (Note: When the optimizer detects that related tables are being processed without providing join criteria, join criteria is added.)

'K' -- For LIKE for expression-based index. (Note: The optimizer detected an expression index and modified the SQL to take advantage of this index.)

'L' -- For localization.

'P' -- For push down. (Note: The optimizer detected a predicate from an outer query that can be used in a sub-select or table expression.)

'R' -- For page range.

'S' -- For simplification.

'T' -- For transitive closure.

Hopefully IBM will soon publish additional documentation on these values and how they're being used to improve performance.