A few weeks ago, I wrote about Job Trace, followed by an update on how you can instrument your applications for IBM i job trace. Job trace is but one of many tools used to collect diagnostic data on i.
The diagnostic tools integrated within the operating system provide the capability to collect the detailed data that's required to identify and resolve the root cause of software problems. The tools used for problem determination fall into two main categories -- the tools used for functional problems and the tools used for performance problems.
Tools used to resolve functional problems -- whether the problem is incorrect results, errors and exceptions, or unexpected failures -- are generally tools that support dumps, logs, and traces. On i, logs are generally in the form of messages; job logs, message queues and the history log are well known to i users.
Traces can be collected at the operating system level (aka, “above the MI”) or at the Licensed Internal Code level (aka, “below the MI”). Different kinds of traces are used depending upon what part of the system needs to be traced -- job trace, LIC trace, communications trace, trace TCP application and many more. “Flight Recorders” are a special always-on type of trace that can be dumped out when necessary to start problem determination without recreating the issue.
Dumps are methods to collect detailed data, whether that data is for a job, an object or a data structure. Dumps may be used to collect internal information as well as application data details. Dumps can be initiated from commands and APIs, as well as the Display/Alter/Dump service function.
Performance tools provide the capabilities used to aid with the root-cause resolution of a performance problem; performance tools are generally used to identify why something isn't performing as it should, to optimize and tune performance to keep things running well, and to plan for future growth. The i performance tools tend to collect data at all the various layers of the system (the OS and LIC), so different tools are not required depending upon where the problem is suspected to be. The IBM i performance tools include Collection Services, Job Watcher, Disk Watcher and Performance Explorer. A future blog article will explore the different kinds of performance data collectors and when you would use each type.
The following simple diagram shows the primary diagnostic tools for functional or performance diagnostics and whether the tool is used for operating system or LIC diagnostics.
Perhaps one of the biggest challenges is understanding all the tools that are available and when to use each one -- after all, there is a very rich set of diagnostic tools built into the operating system.
The second challenge is capturing the data needed for root cause problem resolution -- often the required traces are not running the first time a problem occurs, and the problem must be recreated with additional diagnostic tools enabled so that the required debug information is collected. That's why some of the capabilities built into the operating system are always on, allowing for some level of troubleshooting without the need to recreate the problem. Messages, flight recorders, Service Monitor and Collection Services are all examples of features that IBM intends to have running all the time to enable this “First Failure Data Collection” -- the ability to have the detailed information to begin diagnostic sleuthing the first time a problem occurs.
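To make the always-on idea concrete, here is a minimal sketch of a flight-recorder-style trace in Python. It only illustrates the general technique (a fixed-size buffer that is always recording and is dumped when a failure occurs). It is not how IBM i implements flight recorders, and the names `FlightRecorder`, `record`, `dump` and `run_with_first_failure_capture` are hypothetical ones made up for this example.

```python
from collections import deque
from datetime import datetime, timezone


class FlightRecorder:
    """Hypothetical always-on trace buffer (illustration only, not an IBM i API)."""

    def __init__(self, capacity=1000):
        # A deque with maxlen silently discards the oldest entry once it is full,
        # so recording can stay on all the time with a fixed memory cost.
        self._entries = deque(maxlen=capacity)

    def record(self, component, message):
        """Append one trace entry with a UTC timestamp."""
        self._entries.append((datetime.now(timezone.utc), component, message))

    def dump(self):
        """Return the buffered entries, oldest first, as formatted strings."""
        return [
            f"{ts.isoformat()} [{component}] {message}"
            for ts, component, message in self._entries
        ]


def run_with_first_failure_capture(recorder, work_items):
    """Process items; on the first failure, return the trace leading up to it."""
    for item in work_items:
        recorder.record("worker", f"start item {item}")
        try:
            # Assumed failure condition, purely for demonstration.
            if item < 0:
                raise ValueError(f"negative item {item}")
        except ValueError as exc:
            recorder.record("worker", f"failed: {exc}")
            # The data needed to start diagnosis already exists;
            # nothing has to be recreated with extra tracing turned on.
            return recorder.dump()
    return []
```

Because the buffer is always recording, the first occurrence of the failure already comes with the trace entries that led up to it. The trade-off is that only the most recent `capacity` entries survive.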
This rich set of diagnostic tools is a differentiator for IBM i; everything you need for diagnostics is built into the operating system. Analysis of that data, however, might present other challenges! On other platforms, diagnostic data is often simply written to flat files (called stream files on i) and is often just text strings, with little consistency from one piece of software to another in how the diagnostic data is collected. The development team for the i operating system has guidelines on how code should be instrumented for functional and performance diagnostics, and the consistency these guidelines provide is a major value-add of i.
Two excellent Redbooks resources cover the diagnostic tools for i:
- i5/OS Diagnostic Tools for System Administrators: An A to Z Reference for Problem Determination
This Redbook was written for the V5R4 release, but the information is still applicable to 6.1 and 7.1.
- End-to-End Performance Management on IBM i
This Redbook covers everything you might want to know about performance management on i, including all of the performance tools that are built into the system and the basics of when and how to use each.