Blog
DB2utor

Categories

July 22, 2014

Finding the Cause of a Deadlock or Timeout

Pinpointing the cause of an application failure is one of the more challenging aspects of a developer's job. Locking problems can be greatly reduced by designing applications that support checkpoint restart and provide the capability to tune commit frequency. Of course, applications failures still happen. Here's some quick advice for dealing with this issue.

Recently I was asked how to find the cause of a deadlock or timeout that led to an application failure with an SQL return code of -911 or -913. First, check the system log and the DB2 master address space log in which the application executed on -- DB2 writes messages to both. (Note: If you're not authorized to view messages in the system and DB2 master address space logs, you'll need access from an authorized user. Another option is to use a performance monitoring tool that provides access to DB2 messages.)

The message you're looking for is either a DSNT375I (for a deadlock) or a DSNT376I (for a timeout.) The key information provided is the plan names involved, along with thread identifying information.

            Deadlock message DSNT375I sample:

            PLAN=plan-name1 WITH CORRELATION-ID=correlation-id1 CONNECTION-ID=connection-id1       LUW-ID=luw-id1 THREAD-INFO=thread-information1
               IS DEADLOCKED WITH PLAN=plan-name2
                   WITH CORRELATION-ID=correlation-id2    CONNECTION-ID=connection-id2
                    LUW-ID=luw-id2 THREAD-INFO= thread-information2 ON MEMBER member-name

For details on this message, click here.

            Timeout message DSNT376I sample:
            PLAN=plan-name1 WITH CORRELATION-ID=correlation-id1
            CONNECTION-ID=connection-id1 LUW-ID=luw-id1 THREAD-INFO=thread-information1
                IS TIMED OUT. ONE HOLDER OF THE RESOURCE IS PLAN=plan-name2
                    WITH CORRELATION-ID=correlation-id2 CONNECTION-ID=connection-id2
                    LUW-ID=luw-id2 THREAD-INFO= thread-information2 ON MEMBER member-name

For details on this message, click here.

Once you pinpoint the cause of a deadlock or timeout, you may want to know the name of the object that's involved in the failure. Next week I'll tell you how to find an object's resource name and determine the cause of the contention.