Finding the Cause of a Deadlock or Timeout
Pinpointing the cause of an application failure is one of the more challenging aspects of a developer's job. Locking problems can be greatly reduced by designing applications that support checkpoint restart and provide the capability to tune commit frequency. Of course, applications failures still happen. Here's some quick advice for dealing with this issue.
Recently I was asked how to find the cause of a deadlock or timeout that led to an application failure with an SQL return code of -911 or -913. First, check the system log and the DB2 master address space log in which the application executed on -- DB2 writes messages to both. (Note: If you're not authorized to view messages in the system and DB2 master address space logs, you'll need access from an authorized user. Another option is to use a performance monitoring tool that provides access to DB2 messages.)
The message you're looking for is either a DSNT375I (for a deadlock) or a DSNT376I (for a timeout.) The key information provided is the plan names involved, along with thread identifying information.
Deadlock message DSNT375I sample:
PLAN=plan-name1 WITH CORRELATION-ID=correlation-id1 CONNECTION-ID=connection-id1 LUW-ID=luw-id1 THREAD-INFO=thread-information1
IS DEADLOCKED WITH PLAN=plan-name2
WITH CORRELATION-ID=correlation-id2 CONNECTION-ID=connection-id2
LUW-ID=luw-id2 THREAD-INFO= thread-information2 ON MEMBER member-name
For details on this message, click here.
Timeout message DSNT376I sample:
PLAN=plan-name1 WITH CORRELATION-ID=correlation-id1
CONNECTION-ID=connection-id1 LUW-ID=luw-id1 THREAD-INFO=thread-information1
IS TIMED OUT. ONE HOLDER OF THE RESOURCE IS PLAN=plan-name2
WITH CORRELATION-ID=correlation-id2 CONNECTION-ID=connection-id2
LUW-ID=luw-id2 THREAD-INFO= thread-information2 ON MEMBER member-name
For details on this message, click here.
Once you pinpoint the cause of a deadlock or timeout, you may want to know the name of the object that's involved in the failure. Next week I'll tell you how to find an object's resource name and determine the cause of the contention.
Connect With Us: