In V5R3, IBM i introduced watches as a way to automate the ending of traces. In 5.4, watches were supported independent of the trace commands.
Watches provide a way to automate tasks when certain events occur. An event can be a message, a Licensed Internal Code (LIC) log (also known as a VLOG), or, in 6.1, a Problem Activity Log (PAL) entry. Watches were originally added to the operating system to improve diagnostics, but watches, particularly message watches, can also be used for automated monitoring of system conditions. A watch notifies your program when the event occurs so immediate action can be taken. Because the actions are automated, watches are also very useful for detecting situations that occur intermittently.
Watches have minimal system overhead when they’re defined, but not “hit.” When the watch condition occurs, the actions taken depend upon how the watch is defined. When using watches independent of traces, the actions taken are whatever the watch exit program is coded to do.
To start a watch, use the Start Watch (STRWCH) command or QSCSWCH API. To end a watch, use the End Watch (ENDWCH) command or QSCEWCH API. You can also use the Work with Watches (WRKWCH) command to display and set actions on watches. Up to 10,000 watches can be active at one time.
Watching for LIC logs or PAL entries is probably something you won't need to do, but watching for messages can be an effective way to perform proactive and automated monitoring of system conditions. You can watch for messages sent to any message queue, the history log or job logs. When you define the watch condition, you can specify several different conditions you want to check for to limit the situations under which the watch will trigger. When the watch condition is matched, your exit program will get control and you can take whatever actions you deem appropriate for the condition.
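To make the flow in the previous paragraph concrete, here is a minimal conceptual sketch in Python. It is not IBM i code and doesn't use any real watch API; the class names, the `deliver` function, and the `XYZ0001` message ID are all hypothetical, and `CPF0907` is used only as an illustrative message ID. It simply models the idea that a watch carries several conditions, and that the exit program gets control only when every condition matches.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Message:
    msg_id: str   # message identifier, e.g. "CPF0907"
    queue: str    # where the message was sent, e.g. "QSYSOPR"
    text: str     # message text


@dataclass
class MessageWatch:
    """Hypothetical stand-in for a message watch definition."""
    session_id: str
    msg_id: str
    queue: str
    compare_text: Optional[str]                    # optional extra condition
    exit_program: Callable[[str, Message], None]   # called when the watch is hit

    def matches(self, msg: Message) -> bool:
        # Every condition must hold; each one narrows when the watch triggers.
        if msg.msg_id != self.msg_id:
            return False
        if msg.queue != self.queue:
            return False
        if self.compare_text is not None and self.compare_text not in msg.text:
            return False
        return True


def deliver(msg: Message, watches: List[MessageWatch]) -> List[str]:
    """Give control to the exit program of every watch the message hits."""
    hit = []
    for watch in watches:
        if watch.matches(msg):
            watch.exit_program(watch.session_id, msg)
            hit.append(watch.session_id)
    return hit


# Example exit program: record what it would have done.
actions: List[str] = []


def notify_operator(session_id: str, msg: Message) -> None:
    actions.append(f"{session_id}: {msg.msg_id} on {msg.queue}")


watches = [MessageWatch("DISKWATCH", "CPF0907", "QSYSOPR", None, notify_operator)]

first_hit = deliver(Message("CPF0907", "QSYSOPR", "Serious storage condition"), watches)
second_hit = deliver(Message("XYZ0001", "QSYSOPR", "Unrelated message"), watches)
```

In this toy model only the first message reaches the exit program. A real watch exit program receives its parameters from the operating system in a defined format, so treat this purely as an illustration of the matching idea.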
Using watches to automate monitoring of messages is much more efficient than using the Management Central Message Monitors. Management Central Message Monitors use a polling technique to retrieve messages, which has more overhead and is less timely than watches. With Management Central Job Monitors, you can monitor for job log messages; however, because those monitors also poll, doing so is expensive in terms of system resources. Message watches are a great alternative since they provide essentially the same support while using significantly fewer system resources. However, message watches aren’t as easy to set up since there’s no GUI for watches; only command and API interfaces are available. In addition, with watches you have to write your own watch exit program, whereas Management Central has a nice interface to define the actions taken when the monitored condition occurs. IBM has provided an example watch exit program in the Information Center.
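The difference between polling and being notified can be illustrated with a small, idealized simulation in Python. This is not a measurement of Management Central or of watches; the function names, the 60-second polling interval, and the arrival times are all made-up assumptions chosen only to show the shape of the trade-off.

```python
from typing import List, Tuple


def polling_monitor(arrivals: List[int], interval: int, horizon: int) -> Tuple[int, List[int]]:
    """Poll every `interval` seconds up to `horizon`.

    Returns the number of checks performed and, for each message,
    how long it waited before a poll noticed it.
    """
    checks = 0
    delays = []
    pending = sorted(arrivals)
    t = interval
    while t <= horizon:
        checks += 1                       # every poll costs work, even if nothing arrived
        while pending and pending[0] <= t:
            delays.append(t - pending.pop(0))
        t += interval
    return checks, delays


def watch_monitor(arrivals: List[int]) -> Tuple[int, List[int]]:
    """Event-driven model: one exit-program call per message, no waiting."""
    return len(arrivals), [0 for _ in arrivals]


arrivals = [7, 125]   # seconds at which two messages arrive (hypothetical)
poll_checks, poll_delays = polling_monitor(arrivals, interval=60, horizon=300)
watch_calls, watch_delays = watch_monitor(arrivals)
```

Under these assumptions the poller does five checks to find two messages and notices each one almost a minute late, while the event-driven model does work only when a message actually arrives. Shortening the polling interval reduces the delay but increases the number of checks, which is the overhead trade-off described above.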
When watches are defined, additional jobs run on the system. These jobs are different on the various releases. On 5.4, there’s one batch job per watch in the QUSRWRK subsystem. These jobs run until the watch is ended. In 6.1, watches were changed to use batch prestart jobs. There’s a single batch job, QSCWCHMS, in QUSRWRK for message watches; when a message watch condition is hit, the user exit program is called in a prestart batch job, QSCWCHPS. This change in 6.1 reduces the number of jobs in the QUSRWRK subsystem.
IBM i has a function called Service Monitor that uses watches as its notification mechanism. Service Monitor was added in the 5.4 release and is controlled by the QSFWERRLOG system value; when this system value is set to *LOG, Service Monitor is active. Service Monitor uses many watches, and you’ll see evidence of this in the jobs running in QUSRWRK, as well as when you work with watches. I'll talk more about Service Monitor in a future blog article.