Blog
DB2utor

Categories

February 18, 2008

Are Dataset Extents Really a Performance Problem?

Last week I wrote about the DB2 secondary sliding scale space allocation feature in DB2 V8. DBAs often wonder about the impact of sliding scale allocation on their systems. Specifically, they want to know what the performance hit or CPU cost will be when they let their application run with table spaces and indexes in lots of extents.

From my testing, the number of extents has very little impact on the CPU usage. I did see a major impact on elapse time during the insert processing -- this takes a lot of extents. I assume this is DB2 waiting on IDCAM to build the data sets. I didn't see an impact to elapse time or CPU on SELECT or FETCH processing.

To analyze the impact of extents I asked three questions:

  1. What is the impact of going into multiple extents on a program that's doing insert processing
  2. What is the impact when reading from a table in extents and then inserting and updating another table that's growing in the number of extents?
  3. What is the impact on reporting type applications when the tables are in a large number of extents?

Answer to Question 1
To answer this question I used a batch COBOL program to read from a file and load an empty EMPL table using an SQL multi-row insert of 500,000 rows. From the data below I could see that going into extents doesn't impact CPU; however, it dramatically effects elapse time. So the key is to reasonably size the table and index space for most of your insert processing and have the safety of DB2 secondary sliding scale space allocation to reduce the possibility of your job failing due to a space problem.

Performance data to support Question 1--Job CSBPLNA loaded the single extent table space with an elapsed time of 1:45 and a total CPU of 39 seconds. Job CSBPE27A loaded with a final extent count of 236 on the table space with an elapsed time of 4:09 and a total CPU of 40 seconds. I assume that the increase of 137 percent in elapsed time is due to waiting on the allocation of each extent.

+-- RUN TIME --+-CPU UTILIZED-+ DB2

Step Start | Step DB2 %| Step DB2 %| CPU% Total

Jobname StepName Timestamp Elapsed of Et CPU CPU I Steps DB2 Cost

-------- -------- ----------- -------- ----- -------- ----- -- ----- --------

_ CSBPLNA STBLOAD 02/10 15:12 00:01:45 55.9 00:00:39 46.2 1 100.0 $ 18.25

_ CSBPE27A STBLOAD 02/10 17:08 00:04:09 83.2 00:00:40 46.5 1 100.0 $ 18.67

Answer to Question 2
To answer this question I used a batch COBOL program to read from the EMPL table and insert into an EMPL_HISTORY table. Then I updated a timestamp column on every row ( table space scan) in the EMPL_HISTORY table.

As with Question 1, my data showed that going into extents doesn't impact CPU, but dramatically effects elapse time. So your response is essentially the same--reasonably size the table and index space for most of your insert processing and have the safety of DB2 secondary sliding scale space allocation to reduce the possibility of job failure due to a space problem.

Performance data to support Question 2--Job DBP10N1D processed (single extent) table EMPL and EMPL_HISTORY with an elapsed time of 27:27 and a CPU time of 18:10. Job DBP10E2D processed (multiple extent) table EMPL and EMPL_HISTORY with an elapsed time of 46:41 and a CPU time of 18:46. This is an increase of 70 percent in the elapsed time. The EMPL_HISTORY table space did go into 227 extents as the primary index went into 249 extents. So the amount of time you wait on allocations varies.

+-- RUN TIME --+-CPU UTILIZED-+ DB2

Step Start | Step DB2 %| Step DB2 %| CPU% Total

Jobname StepName Timestamp Elapsed of Et CPU CPU I Steps DB2 Cost

-------- -------- ----------- -------- ----- -------- ----- -- ----- --------

_ DBP10N1D 02/10 15:24 00:27:27 88.1 00:18:10 83.7 1 24.2 $ 913.62

_ DBP10E2D 02/10 17:23 00:46:41 93.2 00:18:46 84.3 1 19.6 $ 950.84

Answer to Question 3

I was interested in the impact of updates when the entire table space must be read and processed across multiple extents. As you can see from this statement detail:

Job DBP10N1D processing the single extent table space performed the update with a total elapsed time of 1039.9 seconds and CPU of 847.1 Job DBP10E2D processing the multiple extent table space performed the update with a total elapsed time of 1801.1 seconds with a CPU time of 910.8.

This tells me that the update took 73 percent longer, with about a 4-percent increase in CPU. This number actually surprised me. The read processing isn't really impacted by the number of extents, so I guess the write process is impacted when you're in multiple extents.

SoftBase Systems Batch Analyzer Version 3.1.0

=== SQL Performance Analysis ===

Command ===> Scroll ===> PAGE

Filter: OFF Intervals: ON From: 02/07/08 To: 02/10/08 Loctn: DB8GLOC

<-

Stmt Stmt Stmt Stmt Execute DB2 CPU%

S JobName Plan Package Num Type Elapsed Cpu Count All Jobs

- -------- -------- -------- ----- ----- ------- ------- -------- --------

_ DBP10N1D SBTBATCH SBTU001 704 UPDAT 1039.9 874.1 200 95.6

_ DBP10N1D SBTBATCH SBTU001 635 INSRT 387.2 29.6 1000000 3.2

_ DBP10N1D SBTBATCH SBTU001 489 OPEN 23.2 9.9 4 1.0

_ DBP10N1D SBTBATCH SBTU001 555 FETCH 2.7 0.0 2000004 0.0

_ DBP10N1D SBTBATCH SBTU001 746 DELET 0.2 0.0 1 0.0

_ DBP10N1D SBTBATCH SBTU001 784 COMMT 0.0 0.0 1 0.0

_ DBP10N1D SBTBATCH SBTU001 604 CLOSE 0.0 0.0 4 0.0

_ DBP10E2D SBTBATCH SBTU001 704 UPDAT 1801.1 910.8 200 95.7

_ DBP10E2D SBTBATCH SBTU001 635 INSRT 776.2 30.2 1000000 3.1

_ DBP10E2D SBTBATCH SBTU001 489 OPEN 25.7 9.8 4 1.0

_ DBP10E2D SBTBATCH SBTU001 555 FETCH 3.4 0.0 2000004 0.0

_ DBP10E2D SBTBATCH SBTU001 746 DELET 5.9 0.0 1 0.0

_ DBP10E2D SBTBATCH SBTU001 784 COMMT 0.0 0.0 1 0.0

_ DBP10E2D SBTBATCH SBTU001 604 CLOSE 0.0 0.0 4 0.0

Performance data to support Question 3--Job CSBPRHN1 is processing the single extent EMPL_HISTORY table reading all the data. Job CSBPRHE2 is processing the multiple extent EMPL_HISTORY table reading all the data.

As you can see from the numbers below, the total elapsed time and total CPU don't change much by simply reading the data. So having the table in one or many extents just doesn't impact your CPU cost or your elapsed time.

SoftBase Systems Batch Analyzer Version 3.1.0

=== SQL Performance Analysis (Jobs) ===

Command ===> Scroll ===> PAGE

Filter: ON Intervals: ON From: 02/09/08 To: 02/10/08 Loctn: DB8GLOC

->

+--- RUN TIME ---+---CPU UTILIZED--+

CSBPR* Job Start | Job DB2 % | Job DB2 % | DB2 CPU% Total

Jobname Timestamp | Elapsed of Et | CPU of CPU | I all Jobs DB2 Cost

-------- ----------- -------- ------ -------- ------ -- -------- --------

_ CSBPRHN1 02/10 16:07 00:00:50 0.8 00:00:45 0.0 2 0.0 $ 0.01

_ CSBPRHN1 02/10 16:08 00:00:54 0.8 00:00:45 0.0 2 0.0 $ 0.01

_ CSBPRHE2 02/10 18:13 00:00:53 1.0 00:00:46 0.0 2 0.0 $ 0.01

_ CSBPRHE2 02/10 18:14 00:00:53 0.9 00:00:45 0.0 2 0.0 $ 0.01

An Insurance Policy
My expectations were that the elapsed times would increase with insert processing, while the elapsed and CPU times would remain flat with select and fetch processing. I didn't expect an increase in elapsed times for update processing when there are a large number of extents.

Based on my testing, I'd say you should always use DB2 sliding scale secondary allocation but continue to run reports that alert you when table spaces are in an excessive number of extents.

You'd then size the primary allocation to reduce the number of extents. Based on my initial test reducing the number of extents will help your insert and update processing.

Remember that sliding scale secondary allocation is like an insurance policy. It'll keep your applications up and running in both test and production.