June 17, 2008

Recursive SQL: Solving Real Business Problems

Though I hadn't previously taken the time to study recursive SQL, I recently had the great pleasure of learning about it from Suresh Sane, the top speaker at this year's IDUG North America conference. (Premier IDUG members can download his presentation.)

While the SQL Reference guide tells you what the syntax is for a common table expression (CTE), which is a recursive SQL statement, you won't find an example using the CTE format. For examples and details explaining recursive SQL, you'll need the Application Programming and SQL Guide.

I read the rules in the guide, but I found Suresh's explanation much easier to follow. To summarize, the recursive query has four parts: 1) CTE, 2) prime the pump, 3) pump more, and 4) use the result.

Here's a sample recursive query to generate sequence numbers from 1 to 100:

2)        SELECT 1, 1
3)             UNION ALL
             SELECT LEVEL +1, LEVEL +1
             FROM NUMBERS
             WHERE LEVEL < 100

Section 1 is the CTE naming the expression, NUMBERS, with two columns (LEVEL and NEXTONE).

Section 2 (prime the pump) sets the initial values of LEVEL and NEXTONE to 1. LEVEL is a variable used in section 3 to stop the query from running forever.

Section 3 (PUMP) traverses recursively through the CTE, applying calculations to the data initially set in section 2 until a stop value is met. In this case the stop criteria is WHERE LEVEL < 100. If you don't provide a way for DB2 to stop processing a recursive query, you'll always end up in an infinite loop, and your query will consume large amounts of CPU. And that, I'm sure, will have your phone ringing or someone racing to your desk to try and to figure out what you're doing.

Section 4 (use the result) processes the results from the CTE in section 1.

You can use recursive SQL to solve many types of business problems. Suresh's examples include: org chart, generating test data, missing data, roll-up, even allocation, weighted allocation, RI children, cheapest fare and account linking.

Let’s look a query Suresh created to generate test data:

  (  EMPID             SMALLINT     NOT NULL,
     FNAME           CHAR(20)        NOT NULL WITH DEFAULT,
     LNAME           CHAR(20)        NOT NULL WITH DEFAULT,
     SALARY         DEC(7,2)          NOT NULL WITH DEFAULT,

This EMP table is new and you have no source data to populate it. You'd like to add 10,000 rows with random values to the different columns on this table. In most cases your only choice is to insert rows using a DB2 table editor or SPUFI. The row is then repeated 10,000 times with the only change being the key columns. In this case the EMPNO column. 

Using recursive SQL, you can, with very little effort, populate this table with as many rows as you like with random values for the different columns. I recommend writing the query to populate just one column at a time, then combine the SQL used to generate the different columns into one statement. This makes it much easier to debug syntax errors while you develop the data generation algorithm for each column.

Now let's break down each line of the query.

EMPID: Generate the EMPID with a random number between 1 and 10000.

     SELECT INTEGER(ROUND(RAND()*9999,0)) + 1                        

FNAME: Generate the FNAME with the first letter and every other letter as a consonant followed by a vowel. First, pick a random letter as the consonant, then concat to a random letter that's a vowel. Repeat this five times to concat a string of 10 characters. Here, I'll randomly truncate a string into a name 3-7 characters in length:

    <<< Repeat  times to build a string of 10 characters >>
   , INTEGER(ROUND(RAND()*4,0)) + 3)

LNAME: Repeat the same code that you have for FNAME, but change the random length
from 3-7 to 3-10:

INTEGER(ROUND(RAND()*7,0)) + 3)                

SALARY: Here I'll generate a random decimal between 1,000 and 5,000:

                  DECIMAL((1000.00 + RAND()*4000),7,2)                        

HIREDATE: Here I'll generate a date between 1-21 years from now:

                       CURRENT DATE - 1 YEAR - INTEGER(20*365*RAND()) DAYS         

Put it all together and you have this INSERT statement: Download code_sample_1.txt

Suresh's presentation shows other creative ways of using recursive SQL to resolve mathematical and business problems. I hope you'll take the time to read the entire thing. And if you come up with your own creative ways of using recursive SQL, please let me know.