June 19, 2012

DB2 10 Time Travel Queries

A new DB2 10 feature is designed to make it easier to determine what data is active data based on a specified timeframe (application temporal) and track changes to that data (system temporal). To enable these "time travel" queries, DB2 10 supports a new type of table -- temporal -- as well as new SQL syntax that simplifies design and coding around these types of applications.

A table is a business_time temporal table if it has a column defined as the start_period business_time and end_period business_time. (Columns can also be defined as start_period system_time and end-period system time.)

The DB2 catalog table SYSIBM.SYSCOLUMNS will indicate if a column on a table is defined to be a PERIOD column. In such cases, the column on the table is PERIOD and has these values:

B - Column is the start of period BUSINESS_TIME.
C - Column is the end of period BUSINESS_TIME.
S - Column is the start of period SYSTEM_TIME.
T - Column is the end of period SYSTEM_TIME.

Application-period temporal tables are useful when data is changing and the changes must be active at given points in time. Some examples are table entries like product descriptions, pricing and contract dates. System-period temporal tables are useful when data changes must be saved and available for auditing purposes.

Bi-temporal tables support both application and system functionality. That is, new descriptions, contracts and pricing can be added to a table and given a date to become active. When I go back and edit a row in the table the original row is copied to a history table. I can then go back and query the history table to see when changes had been made to the data.

 Last month I discussed Craig Mullins' book and referenced the section on temporal tables.

I’ve recently been doing some research on time travel queries and found a good video at Channel DB2 by Clara Liu. "DB2 10.1: Time TravelQuery" explains how queries work both when processing based on business time and system time:

Another good resource is Dan Luksetich's recent article in the Enterprise Tech Journal on leveraging DB2's automated time travel queries and temporal tables.

One thing about Dan's article that stood out to me is the performance implications of processing system-period temporal base tables and history tables. These tables should be queried directly. If you need current data, query the base table; if you need history, query the history table. Otherwise, DB2 queries both to locate the data.

It gets even more complicated when joining system-period temporal tables. Dan mentions the number of tables accessed when SYSTEM_TIME is specified as (2n) X n (where n is the number of tables in the join). So a two-table join with each table having its own system-period time specified would result in eight tables being accessed. A three-table join would be 24 tables, and a four-table join would be 64 tables accessed. Clearly, system-period joins with each table having its own period specified should be avoided.

Have you used time travel queries? Please share your experiences in Comments.