Blog
DB2utor

Categories

April 15, 2008

Performance Comparison: INNER JOIN vs. IMPLICIT JOIN

Just as programming standards help with the maintenance and performance of programs written in COBOL or Java, SQL standards exist to help ensure overall quality and performance of the systems running these statements.

One standard I've seen is the requirement to write the join between two tables using the INNER JOIN syntax rather than the older implicit format, where a comma is used between table names.

I was told that this standard was implemented because INNER JOIN processing is faster than IMPLICIT JOIN processing, but I didn't believe it, so I set out to test it for myself. I used the sample tables that come with DB2, except I loaded up the EMP table with 500,000 rows to add some volume.

The following is an example of using IMPLICIT JOIN to retrieve data from both DSN8810.EMP and dSN8810.DEPT. My EMP table contained 500,000 rows. I decided to run the benchmark with SELECT of 10, 1,000 and then 100,000 rows, using the FETCH FIRST n ROWS ONLY.

To simplify the SQL I used "*" instead of the specific column names. I'm sure your shop has a coding standard that would never allow a SELECT "*," but would require the specific columns needed to be selected.

In the IMPLICIT JOIN format, you list the tables that need to be processed. As noted, each table is separated by a comma. For example:

SELECT *
FROM DSN8810.EMP E
, DSN8810.DEPT D
WHERE E.WORKDEPT = D.DEPTNO
FETCH FIRST 10 ROWS ONLY

Here's the same SQL written in the the newer format (I believe it became available with DB2 version 4.1), INNER JOIN:

SELECT *
FROM DSN8810.EMP E
INNER JOIN
DSN8810.DEPT D
ON E.WORKDEPT = D.DEPTNO
FETCH FIRST 10 ROWS ONLY

Before each test I stopped the table space to flush the bufferpool. Then I ran the SQL statement three times and took the average. The repetition is necessary because the first time you read the data, it takes time to load up the bufferpool. By the time you execute the statement the second and third times, the data is already in the bufferpool.

Notice in my numbers below that the second and third exection times dropped a lot when I had small amounts of data. However, on the third run with 100,000 rows, the numbers didn't change much -- that's because my bufferpools weren't configured to be large enough to hold all the data, and DB2 had to re-read the data into the bufferpool.

DB2 V8 IMPLICT JOIN

                                             Cost-
ROW CNT    Run-Time   OPEN-time  FETCH-time  Factor
-------    --------   ---------  ----------  ------
    10     1.620602   0.000024   1.620466         0
    10     0.001805   0.000023   0.001675         0
    10     0.001800   0.000024   0.001672         0

  1000     1.685502   0.000023   1.684478        39
  1000     0.102962   0.000023   0.101976        39
  1000     0.107093   0.000028   0.106101        39

100000    30.279181   0.000032  30.248101      3944
100000    28.553547   0.000026  28.542620      3944
100000    28.166783   0.000025  28.166652      3944

DB2 V8 INNER JOIN

ROW CNT    Run-Time   OPEN-time  FETCH-time  Factor
-------    --------   ---------  ----------  ------
    10     1.619768   0.000028   1.615628         0
    10     0.002604   0.000023   0.001675         0
    10     0.018162   0.000032   0.001684         0

  1000     1.696460   0.000023   1.693338        39
  1000     0.106423   0.000022   0.106303        39
  1000     0.107307   0.000025   0.107174        39

100000    29.895222   0.000027  29.891819      3944
100000    29.058716   0.000025  29.057679      3944
100000    28.607622   0.000024  28.607496      3944

DB2 9 IMPLICT JOIN

ROW CNT    Run-Time   OPEN-time  FETCH-time  Factor
-------    --------   ---------  ----------  ------
    10     2.108147   0.000028   2.107066         0
    10     0.001851   0.000025   0.001712         0
    10     0.001845   0.000025   0.001700         0

  1000     2.247287   0.000027   2.247138        39
  1000     0.107628   0.000027   0.107460        39
  1000     0.107628   0.000027   0.107460        39

100000    31.791275   0.000026  31.787794      3940
100000    29.196265   0.000027  29.196113      3940
100000    28.731217   0.000027  28.731072      3940

DB2 9 INNER JOIN

ROW CNT    Run-Time   OPEN-time  FETCH-time  Factor
-------    --------   ---------  ----------  ------
    10     2.113345   0.000026   2.113198         0
    10     0.001749   0.000025   0.001609         0
    10     0.003650   0.000029   0.001656         0

  1000     2.226791   0.000031   2.222058        39
  1000     0.111683   0.000025   0.111533        39
  1000     0.110913   0.000026   0.110759        39

100000    31.367625   0.000024  31.361684      3940
100000    28.781739   0.000027  28.781571      3940
100000    29.039949   0.000024  29.039778      3940

The average run-time duration for 100,000 rows for DB2 V8 is 29.00 for IMPLICIT JOIN and 29.19 for INNER JOIN. For DB2 V9, the numbers are 29.90 and 29.73, respectively.

In DB2 V8 the run-time is a little faster using IMPLICIT JOIN, while INNER JOIN is a little faster with DB2 V9. These numbers are close, and can change based on what's going on with DB2 at the time the statements executed.

The numbers from this exercise show that using INNER JOIN over the IMPLICIT JOIN syntax won't give you a performance improvement. This may have been the case in DB2 V4 or V5, but as you can see the optimizer is getting very smart and will rewrite statements to be in the best format to achieve good performance.