Blog
DB2utor

Categories

May 06, 2008

How to Write Efficient SQL

Note: This blog entry is part one of a three-part series.

What's the most efficient SQL statement you could ever write?

The answer is one that's never executed.

Of course, this answer is unrealistic. Fortunately, plenty of real-world options exist for writing efficient SQL that is executed. Over the next three weeks we'll explore this topic in depth. Here, I cover the different predicate types.

SQL's strength is also its weakness. Because it's flexible, there are multiple ways to, say, write a statement to return a set of data. However, there's generally only one way to write a statement that most efficiently collects and returns that data.

This query has four predicates: an equal (=) predicate on C1, a BETWEEN predicate on C2, a LIKE predicate on C3, and a subquery predicate on C4.

SELECT C5 FROM T1
WHERE C1 = 1234
AND C2 BETWEEN 10 AND 90
AND C3 LIKE 'CHI%'
AND C4 EXISTS (SELECT 1 FROM T2
WHERE T2.C3 = T1.C3)

When we talk about tuning SQL, we're really talking about ways to improve the access path. The access path is the way DB2 processes and returns rows of data to your application. Predicates influence the access path when used in a statement's WHERE or ON clause. Predicates in a HAVING clause don't affect the access path selection process.

Predicate Classifications

As we discuss ways of writing efficient SQL, we're talking about the predicates used in the WHERE or ON clause. As we describe predicates, we also use terms to classify it based on how the predicate is used in the SQL statement. Descriptions of some of the common classification terms follow:

  • Simple/compound: A compound predicate is the result of two predicates (which individually may be simple or compound) that are connected by AND or OR Boolean operators. All other predicates are simple.
  • Local/join: A local predicate references only one table. It's local to the table and restricts the number of rows returned for that table. A join predicate involves more than one table to determine the way rows are joined from two or more tables.
  • Boolean term: Any predicate that isn't contained in a compound OR predicate is a boolean predicate. If a boolean term is evaluated false for a given row, then the entire WHERE clause is evaluated false for that row.
  • Indexable: A predicate is indexable when it can match index entries; otherwise, it's non-indexable. Usually using a “NOT” or “<>” operator causes the predicate to be non-indexable.
  • Stage one or stage two: Rows retrieved for a query go through two stages of processing. Some predicates can be applied during the first stage while others must be applied at the second stage. Stage one qualifies rows at the page level while in stage two, each row is inspected to determine if it qualifies. Inspecting each and every row obviously requires more processing time.

Rather than list all of the stage one predicates, I recommend printing the predicate type and processing chart found in the DB2 9 for z/OS Performance Monitoring and Tuning Guide, pages 253-258. Keep this chart nearby for quick reference while you learn to code efficient SQL.

Predicate Types

Before getting into the order of predicate evaluation, let's first cover the different predicate types within a DB2 for z/OS SQL statement. The type of predicate is determined by the operator or syntax. Here's a list of predicate types:

  • Subquery: Any predicate that includes another SELECT statement.

Example: WHERE C1 IN (SELECT C2 FROM TABLE1)

  • Equal: Any predicate that isn't a subquery predicate. It has an equal operator (=) but no NOT operator. Two other forms of equal predicates are “column IS NULL” (to support NULL data) and the “column IS NOT DISTINCT FROM” operator.

Example: WHERE C1 = 1234

  • RANGE: Any predicate that isn't a subquery predicate and has one of these operators: >, >=, <, <=, LIKE, BETWEEN.

Example: WHERE C1 <= 1234

  • In-list: A predicate using “IN” as the operator.

Example: WHERE C1 IN (1231, 1232, 1233, 1234, 1235)

  • Not: Any predicate that isn't a subquery predicate and contains a NOT operator. This also includes “C1 IS DISTINCT FROM.”

Example: WHERE C1 NOT BETWEEN 1231 AND 1235

Tips

It's also important to understand that the way you code your SQL statement could negatively impact how that statement performs. Here are some good rules to follow to ensure the best performance.

  • Ensure that the statements you think should be indexable are in fact indexable. That is, you have an index on the column and you want to use that index. We'll review what and what isn't indexable next week.
  • Ensure that the data type and length of your host variables are the same type and length as the column. If the declared length in the host variable is greater than the length on the table, then a non-equal comparison would be stage one, but wouldn't be a matching predicate for an index scan.
  • Use stage one predicates as much as possible.
  • Apply local predicates to reduce the number of rows that need to be processed before a join predicate is used.
  • Try to rewrite the SQL to avoid NOT predicates, since these aren't stage one indexable.
  • Remove redundant predicates. They slow down the optimizer and aren't needed.

Next week we'll look at when DB2 evaluates a predicate, what order of precedence a predicate has and how this impacts the efficiency of filtering rows. We'll wrap up the discussion in part three with examples of efficiently written SQL.