A little over a year ago, IBM changed the handling of the inactive job time-out (QINACTITV) system value to improve accuracy and performance. That change is described in the blog article "IBM Improves the Handling of the Inactivity Timeout."
IBM has made some additional improvements to the handling of QINACTITV. This time, the changes address two questions:
1. What does it mean for an interactive job to be inactive?
With the change, a job can use CPU time and still be considered inactive.
2. Which jobs can be disconnected and which jobs must be ended?
With the change, a job can be disconnected even if a device name was not supplied when the session was created. This means that jobs for devices whose names start with QPADEV can now be disconnected. This affects Disconnect Job (DSCJOB) command processing and inactivity timer processing, but it does not affect the handling of device errors.
The change is available only for release 7.1, with PTF SI50502. This is a delayed PTF, so an IPL is required to apply it or to remove it.
First, let's look at what it means for a job to be inactive.
- Is there I/O occurring?
We first look at the session, the connection between the user and the system. A session is inactive if no input or output is occurring. A session is active if the system continues to send data to the user, even if the user never sends input to the system.
- Who is waiting?
A session will be inactive while the system is working on a long-running command, because no I/O is occurring. That alone does not make the job inactive: for the job to be considered inactive, the system must be waiting for input from the user. The Work with Active Jobs (WRKACTJOB) command shows a status of DSPW for a job that is waiting for input from a workstation.
- Is the job using CPU?
Looking at whether the job uses CPU time can be both helpful and harmful. Let’s look at some examples and how the new PTF affects these examples.
The job might do work that comes from sources other than the workstation. For example, the job may have a message queue allocated and a break message handling program defined. In this case, using CPU indicates that the job is doing work and is in some sense active. With the new PTF, running a break message handling program will no longer be a reason to consider the job active. The new code gives less weight to unpredictable, background activities.
The job might do work that is just a part of being there and interacting with other jobs. The job uses CPU time, but there is no reason to consider the job to be active. For example, when some other job looks at the invocation stack of an interactive job, the interactive job does some of that work and is charged with the CPU time it uses. This might occur when you have the inactive job message queue (QINACTMSGQ) system value set to the name of a message queue. The program that handles the CPI1126 message might want to look at an invocation stack to decide if a job should be considered inactive. With the new PTF, looking at an invocation stack will no longer be a reason to consider the target job to be active.
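To make the three questions concrete, here is a minimal Python sketch of an inactivity check for one interval. It is a simplified model, not IBM's implementation: the IntervalSample type, the is_inactive function, and the CPU-source labels are hypothetical, and the real PTF weighs CPU time in ways this article does not spell out. The sketch only encodes what is described above: I/O means active, the job must be in DSPW status, and with the PTF, CPU time from break message handling or from another job inspecting the invocation stack no longer counts as activity.

```python
from dataclasses import dataclass, field
from typing import Dict

# Hypothetical labels for CPU time charged to an interactive job that,
# per the article, no longer makes the job look active once the PTF is on.
IGNORED_WITH_PTF = {"break_message_handler", "invocation_stack_inspection"}


@dataclass
class IntervalSample:
    """Simplified observations for one job over one QINACTITV interval."""
    io_occurred: bool  # any input or output on the session
    status: str        # WRKACTJOB-style status, e.g. "DSPW" or "RUN"
    cpu_ms_by_source: Dict[str, int] = field(default_factory=dict)


def is_inactive(sample: IntervalSample, ptf_applied: bool) -> bool:
    # 1. I/O on the session means the session, and so the job, is active.
    if sample.io_occurred:
        return False
    # 2. The system must be waiting for workstation input (DSPW); a
    #    long-running command does no I/O, but the job is not inactive.
    if sample.status != "DSPW":
        return False
    # 3. Without the PTF, any CPU use counts as activity. With the PTF,
    #    CPU time from the background sources above is ignored.
    counted = [
        ms for source, ms in sample.cpu_ms_by_source.items()
        if ms > 0 and not (ptf_applied and source in IGNORED_WITH_PTF)
    ]
    return not counted


# Another job (for example, a CPI1126 handler) inspected this job's
# invocation stack while the user was away from the display.
sample = IntervalSample(False, "DSPW", {"invocation_stack_inspection": 3})
print(is_inactive(sample, ptf_applied=False))  # False
print(is_inactive(sample, ptf_applied=True))   # True
```

The same sample flips from active to inactive only because the PTF stops counting CPU time that the job did not cause itself.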
More and more work is being done in the background, often in secondary threads. Running Java code in an interactive job is one way to see CPU time used just for being there and interacting with the environment.
There is no perfect way to decide whether a job should be considered active. A program handling CPI1126 messages can decide what to do with a job that the system considers inactive, but it is not notified about a job that the system considers active. The PTF cover letter describes how to use an environment variable to change how the system treats CPU time when determining inactivity, but that cannot solve all the potential problems. For example, the session will look active when someone sends a message to the workstation message queue, even if the user of that workstation isn't there.
Now on to the question of which jobs can be disconnected.
For the Disconnect Job (DSCJOB) command to make sense, it must be possible to reconnect to the job. To reconnect to the job, the same user must sign back on to the same device description.
When DSCJOB is done in response to a device error, it is not allowed for sessions where no specific virtual device was requested and the system selected which device to use. The session is gone, and it is very unlikely that the user will get the same device on the next connection to the system, or that the next session using that device will belong to the same user.
For inactivity, the session is still intact and connected to the same user's workstation. Programs that handle the CPI1126 message (when the QINACTMSGQ system value names a message queue) use the DSCJOB command, so the command should act the same way the inactivity timer code acts. Also, if a user wants to run DSCJOB rather than SIGNOFF, it is probably a good thing to allow.
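The disconnect and reconnect rules above can be sketched as two small predicates. This is a simplified Python model, assuming that a device name starting with QPADEV means the system selected the device; the Reason enum, the function names, and the device name DSP01 are hypothetical, and any other restrictions a real system applies to DSCJOB are not modeled.

```python
from enum import Enum


class Reason(Enum):
    """Hypothetical labels for why a disconnect is being attempted."""
    DEVICE_ERROR = "device error"
    INACTIVITY = "inactivity timer"
    DSCJOB_COMMAND = "DSCJOB command"


def system_selected(device_name: str) -> bool:
    # Sessions that did not request a device get a system-selected
    # virtual device named QPADEVxxxx.
    return device_name.upper().startswith("QPADEV")


def can_disconnect(device_name: str, reason: Reason, ptf_applied: bool) -> bool:
    if not system_selected(device_name):
        # Named device: the user can sign back on to the same device.
        return True
    if reason is Reason.DEVICE_ERROR:
        # Session is gone; the user is unlikely to get this device again.
        return False
    # Inactivity or an explicit DSCJOB: the session is still intact.
    # System-selected devices qualify only with the PTF.
    return ptf_applied


def can_reconnect(job_user: str, job_device: str,
                  signon_user: str, signon_device: str) -> bool:
    # Reconnecting requires the same user on the same device description.
    return job_user == signon_user and job_device == signon_device


print(can_disconnect("QPADEV0001", Reason.INACTIVITY, ptf_applied=False))   # False
print(can_disconnect("QPADEV0001", Reason.INACTIVITY, ptf_applied=True))    # True
print(can_disconnect("QPADEV0001", Reason.DEVICE_ERROR, ptf_applied=True))  # False
```

Keeping the device-error case separate mirrors the point that the PTF changes DSCJOB and inactivity handling but not device-error handling.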
This PTF allows more jobs to be disconnected. If your system is set up with QINACTITV(*NONE) or QINACTMSGQ(*ENDJOB), this PTF will probably have very little effect on the workload. So, for discussion, we'll assume that jobs are being disconnected for inactivity.
There is a cost to start a new job, to end a job, to disconnect a job, or to reconnect to a job. Different applications will have different costs. While the overall amount of work is important, many systems are more strongly affected by how that work is distributed over time. Disconnected jobs continue to hold locks and use system resources. This is also true of inactive jobs.
- When a user signs on and is reconnected to a disconnected job, the system avoids the cost of ending the old job and creating a new job. When a large number of users reconnect, the work is generally well spread out over time.
- When a user does not reconnect before the time limit defined by the time interval before disconnected job ends (QDSCJOBITV) system value, the disconnected job is ended. Compared with QINACTMSGQ(*ENDJOB), the system sees the extra work of a disconnect, but no work is avoided. All jobs see the same QDSCJOBITV value, so the work is spread out the same way the work for QINACTITV is spread out.
- When a user does not reconnect and the subsystem is ended before the QDSCJOBITV time limit, the work of ending the disconnected jobs gets done all at once during the ending of the subsystem. This can significantly increase the stress on the system. Jobs that are inactive and not disconnected are also ended during the ending of the subsystem.
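The three cases can be sketched as a small simulation that classifies how each disconnected job finishes, with times in minutes. This is a hypothetical Python model (classify_endings is not an IBM API), and it only counts outcomes; it does not model the actual cost of ending or reconnecting a job.

```python
from collections import Counter
from typing import Optional, Sequence


def classify_endings(
    disconnected_at: Sequence[int],
    reconnected_at: Sequence[Optional[int]],
    dscjobitv: int,
    subsystem_end: int,
) -> Counter:
    """Count how each disconnected job finishes.

    disconnected_at: minute each job was disconnected
    reconnected_at: minute the user signs back on, or None if never
    dscjobitv: QDSCJOBITV value in minutes
    subsystem_end: minute at which the subsystem is ended
    """
    outcomes = Counter()
    for d, r in zip(disconnected_at, reconnected_at):
        deadline = d + dscjobitv  # QDSCJOBITV limit for this job
        if r is not None and r < min(deadline, subsystem_end):
            outcomes["reconnected"] += 1  # old job reused; no end/start cost
        elif deadline <= subsystem_end:
            outcomes["ended_by_qdscjobitv"] += 1  # spread out over time
        else:
            outcomes["ended_at_subsystem_end"] += 1  # all at once
    return outcomes


minutes = [0, 30, 60, 90]
nobody = [None] * 4
print(classify_endings(minutes, nobody, dscjobitv=60, subsystem_end=120))
# Counter({'ended_by_qdscjobitv': 3, 'ended_at_subsystem_end': 1})
print(classify_endings(minutes, nobody, dscjobitv=240, subsystem_end=120))
# Counter({'ended_at_subsystem_end': 4})
```

With the larger interval, every job that nobody reconnects to is ended in one burst when the subsystem ends, which is the stress case described in the last bullet.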
The QDSCJOBITV system value should be set high enough to allow users enough time to reconnect to their disconnected jobs, but low enough that the disconnected jobs are likely to be ended before the subsystem ends. If users are not going to reconnect, the QINACTMSGQ system value should be set to *ENDJOB rather than *DSCJOB.
One of the common problems with QINACTITV occurs when a user returns to a session at the same time the system is checking for inactivity. The system does not know that the user is there until it sees input on that session, and it only sees input when the user presses Enter or a function key.
If QINACTMSGQ is set to *DSCJOB and the user is using a virtual device selected by the system (a QPADEVxxxx device), the job will now be disconnected rather than ended. The user can sign on and continue. The PTF should be very helpful, even though the handling of QINACTITV will never be perfect.
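The effect of each QINACTMSGQ setting on an inactive job can be summarized in a short function. This is a simplified Python model: inactivity_action is hypothetical, it assumes QPADEV-prefixed names identify system-selected devices, and it treats any value other than *ENDJOB or *DSCJOB as a message queue name (MYLIB/INACTQ below is a made-up example).

```python
def inactivity_action(qinactmsgq: str, device_name: str, ptf_applied: bool) -> str:
    """Describe what happens to a job the system considers inactive."""
    if qinactmsgq == "*ENDJOB":
        return "end job"
    if qinactmsgq == "*DSCJOB":
        system_selected = device_name.upper().startswith("QPADEV")
        if system_selected and not ptf_applied:
            # Before the PTF, a QPADEV job could not be disconnected.
            return "end job"
        return "disconnect job"
    # Any other value names a message queue: CPI1126 is sent there and a
    # handling program decides what to do (for example, run DSCJOB).
    return "send CPI1126 to " + qinactmsgq


print(inactivity_action("*DSCJOB", "QPADEV0004", ptf_applied=False))  # end job
print(inactivity_action("*DSCJOB", "QPADEV0004", ptf_applied=True))   # disconnect job
```

The second call is the case described above: a user returning to a QPADEV session just after the check can sign on again and continue, instead of finding the job ended.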
I’d like to thank Dan Tarara for writing this blog. Dan is a member of the IBM i Work Management development team. Thanks, Dan!
We've had an issue for over a year involving QINACTITV and Java. There was a change made by IBM in 2012 that forced the JVM to become active at certain intervals. This invalidates the use of QINACTITV because the job never times out as inactive. According to many discussions with support this is working as designed and there are no plans to change it.
Posted by: Paul Fenstermacher | July 24, 2013 at 01:55 PM
Sounds like some good thinking done here but that bit about break handling programs will require some thinking as this PTF is applied......
Posted by: DrFranken | July 25, 2013 at 12:28 PM
This change should help with the problem caused by Java consuming CPU time. Java is part of the reason that the change is enabled by default.
Posted by: Dan Tarara | July 25, 2013 at 01:16 PM
Oops, details. Obviously, if the break message reaches the screen, there is I/O and therefore activity. I keep forgetting that the message delivery mechanism does a suspend/restore of the screen around the message handling. That is also I/O so even when the message gets handled by the program it looks active. Better example would have been a signal processing procedure or a job interrupt event. Tried to use a more common example and messed up on accuracy. Sorry about that. Still requires thought about what might have previously been considered active but will now be considered inactive.
Posted by: Dan Tarara | July 25, 2013 at 02:03 PM