The inactive job time-out (QINACTITV) system value defines the minimum amount of time that an interactive job can be inactive before an action is taken. The action that is taken is defined by the inactive job message queue (QINACTMSGQ) system value. A related system value, time interval before disconnected job ends (QDSCJOBITV), defines how long a job can be disconnected before the job is ended.
The handling of the inactive job time-out (QINACTITV) system value is being changed with a 7.1 PTF to address two long-standing problems. The other system values are unaffected.
- Accuracy. Prior to this change, a job could be inactive for up to twice the specified interval before an action was taken. For inactivity intervals measured in hours, this level of uncertainty is simply too high.
- Performance. A large number of jobs can be ended at once when the subsystems check for inactivity. This can affect overall system performance at the time the jobs are being ended.
The change is available only for the 7.1 release with PTF SI46398. This is a delayed PTF, so it requires an IPL to put it on or to take it off.
First of all, let's review this change as it pertains to accuracy.
Prior to this change, the subsystem would wait for the QINACTITV number of minutes between the times it checked for inactive interactive jobs. Jobs can be inactive for nearly two intervals before the inactivity is detected; if a job runs just after the system checks for inactivity, at the next check it will not be inactive for the full interval, so the timeout will not occur until the following interval. For example, if you have QINACTITV set to 60 minutes, jobs may need to be inactive for nearly 120 minutes for the inactivity to be detected.
The PTF improves the timeliness with which inactivity is identified.
The QINACTITV function is still enforced by the subsystem code periodically checking for inactivity, but with the change, the subsystem tries to check more often to improve the accuracy. In most cases, the subsystem will now check for inactivity every 10 minutes and will keep track of how long a job has been inactive. If the QINACTITV system value is less than 15 minutes, the subsystem will continue to use the value specified by QINACTITV as the time between checks. (Checking for inactivity every 10 minutes only helps for the cases where QINACTITV is greater than or equal to 15 minutes.)
A job can still be inactive for longer than the QINACTITV system value. For any given QINACTITV, you can work out the maximum time a job can be inactive before an action is taken. The job becomes inactive sometime during the 10 minutes between checks. So, in the simple case where the QINACTITV value is evenly divisible by 10, a job can be inactive for up to 10 minutes plus the QINACTITV value. But the QINACTITV value does not have to be evenly divisible by 10, and the subsystem cannot take action until the next time it checks for inactivity. For a QINACTITV value such as 101, the time-out is effectively rounded up to the next 10-minute check (110 minutes), so a job can be inactive for at most the QINACTITV time plus 19 minutes.
Consider an example where QINACTITV is set to 90 minutes. With the old code, the subsystem would take the action defined by the QINACTMSGQ system value if a job was inactive anywhere from 90 minutes to 180 minutes. With the new code, the subsystem will take action if a job has been inactive anywhere from 90 minutes to 100 minutes.
Consider another example. In this case QINACTITV is set to 45 minutes. With the old code, the subsystem would take action if a job was inactive anywhere from 45 minutes to 90 minutes. With the new code, the subsystem will take action if a job has been inactive anywhere from 50 minutes to 60 minutes.
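The arithmetic behind these examples can be sketched in a few lines of Python. This is only a model inferred from the numbers in this post, not the subsystem's actual code: it assumes the time-out is rounded up to the next check and that a job may wait up to one more check interval on top of that. The function name and constants are made up for illustration.

```python
CHECK_INTERVAL = 10       # minutes between checks with the PTF (from the text)
SMALL_VALUE_CUTOFF = 15   # below this, QINACTITV itself is the check interval

def inactivity_window(qinactitv, with_ptf=True):
    """Return (earliest, latest) minutes of inactivity at which action is taken.

    Hypothetical model: the old code, and the new code for small values,
    checks once per QINACTITV minutes; otherwise it checks every 10 minutes.
    """
    if with_ptf and qinactitv >= SMALL_VALUE_CUTOFF:
        interval = CHECK_INTERVAL
    else:
        interval = qinactitv
    # Round the time-out up to a whole number of check intervals.
    earliest = (qinactitv + interval - 1) // interval * interval
    # The job may have gone inactive just after a check, adding up to one interval.
    latest = earliest + interval
    return earliest, latest

for minutes in (90, 45, 101):
    print(minutes, inactivity_window(minutes, with_ptf=False), inactivity_window(minutes))
# 90  (90, 180)   (90, 100)
# 45  (45, 90)    (50, 60)
# 101 (101, 202)  (110, 120)
```

The upper bounds are limits that a job approaches rather than reaches, which is why the text says "nearly" for the old behavior.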
A job will be inactive for at least the QINACTITV value before action is taken. But sometimes you might be interested in "at most" rather than "at least". If you have a requirement that jobs must not be left inactive for longer than some specified value, you need to set the QINACTITV value to something less than the limit that you want to enforce. With the old code, the QINACTITV value had to be half of the limit that you wanted to enforce. With the new code, the QINACTITV value can be much closer to the limit that you want to enforce.
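If you have an "at most" requirement, a small search can suggest a starting point for the setting. The sketch below uses the same inferred model as the examples above (10-minute checks for values of 15 or more, time-out rounded up to the next check, plus up to one more interval). The function names are hypothetical, and the code does not check the range of values the system value actually accepts.

```python
def latest_action_minutes(qinactitv):
    """Worst-case inactivity (minutes) before action, under the assumed model."""
    interval = 10 if qinactitv >= 15 else qinactitv
    earliest = (qinactitv + interval - 1) // interval * interval
    return earliest + interval

def largest_setting_for_limit(limit_minutes):
    """Largest QINACTITV whose worst case stays within the limit, or None."""
    for candidate in range(limit_minutes, 0, -1):
        if latest_action_minutes(candidate) <= limit_minutes:
            return candidate
    return None

print(largest_setting_for_limit(120))  # 110 (the old code would have needed 60)
print(largest_setting_for_limit(60))   # 50
```

Treat the result as an estimate to verify on your own system, since it rests on the model assumed here rather than on documented behavior.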
If system value QINACTMSGQ is set to the name of a message queue and the job identified by the CPI1126 message remains inactive and is not ended or disconnected, another CPI1126 is sent the next time the subsystem checks for inactivity. This behavior is unchanged, but the subsystem may be checking for inactivity much more often. This could affect the logic of whatever program is handling messages sent to that message queue. The program will have a lot less time before it gets another CPI1126 message for a given job.
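One way a message-handling program could cope with the more frequent repeats is to remember when it last acted on each job and ignore repeats that arrive within a grace period. The sketch below shows only that bookkeeping, in Python for illustration. The class name, the job-name strings, and the idea of a grace period are assumptions; it does not show how to receive or parse CPI1126 messages.

```python
from datetime import datetime, timedelta

class RepeatFilter:
    """Suppress repeat notifications for the same job within a grace period."""

    def __init__(self, grace_minutes):
        self.grace = timedelta(minutes=grace_minutes)
        self.last_handled = {}  # job name -> time we last acted on it

    def should_handle(self, job, now):
        """Return True if this notification should be acted on."""
        last = self.last_handled.get(job)
        if last is not None and now - last < self.grace:
            return False  # seen recently; skip this repeat
        self.last_handled[job] = now
        return True

# Hypothetical usage: repeats arrive every 10 minutes for the same job.
repeats = RepeatFilter(grace_minutes=30)
start = datetime(2012, 4, 18, 12, 0)
for offset in (0, 10, 20, 30):
    print(offset, repeats.should_handle("123456/USER/DSP01", start + timedelta(minutes=offset)))
# 0 True, 10 False, 20 False, 30 True
```

A real program would also want to drop entries for jobs that have ended or become active again, which this sketch leaves out.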
It can still be useful to set the QINACTMSGQ (Inactive job message queue) system value to the name of a message queue so that you can control which jobs get ended or disconnected and how many jobs get processed at a time. Activity can be uneven and can be influenced by external factors such as lunchtime. A significant number of jobs can become inactive within the same 10-minute interval. This is less of a problem with a 10-minute interval than it is with an interval measured in hours. Also, due to other changes in the code, this is less of a problem when the interactive work is spread across multiple subsystems.
Now on to the performance improvements provided by this PTF.
Subsystems that are started at the same time will no longer check for inactivity at the same time. When the subsystems start or when the QINACTITV system value is changed, the subsystems now use a small but variable delay when setting up the checking for inactivity. This helps reduce the number of jobs that are disconnected or ended at a single time. With the old code, you could introduce delays between starting subsystems and these delays would translate more or less directly into delays between the times the different subsystems would check for inactivity. With the new code, every subsystem has to check every 10 minutes, but the system takes care of spreading out the work. Now, the only reason to have delays between starting subsystems would be to manage the amount of work done when starting the subsystems.
When a subsystem sets up its checking for inactivity, it has no idea how many other subsystems will need to do this same type of work. In addition to checking for inactive interactive jobs, subsystems also check the number of prestart jobs that are not being used. PTF SI46398 spreads out the work for ending of inactive interactive jobs and for ending of unused prestart jobs.
Because it is still the subsystem that checks for inactivity, much remains the same. But the system now reacts to inactivity sooner and spreads out the work better. This makes the enforcement of QINACTITV more predictable and reduces the impact on overall system performance.
I'd like to thank Dan Tarara for this blog; Dan is a member of the IBM i work management team. Thanks, Dan!
This is a fantastic improvement that will make the inactive timeout more precise.
Now is there any chance we can get the work management team to disconnect an inactive job if it is on a device name that starts with QPADEV? At least give us an option to do it? I've had it explained to me about 5 times in 10 years but it makes no sense why the inactivity timer can end a job but not disconnect it just because the device name starts with QPADEV.
Thanks for the consideration.
Posted by: John Knox | April 18, 2012 at 02:43 PM
There has been some discussion about allowing DSCJOB for QPADEVxxxx devices.
For the case where DSCJOB is being done in response to a device error, the restriction would probably still be there. The session is gone and it is very unlikely for the same user to get that device when the device is next used.
For inactivity, the session is still intact and connected to the same user's workstation. Similarly, if a user wants to DSCJOB rather than SIGNOFF, this is probably a good thing to allow.
I don't know if this sort of change will ever get done (it is not currently "in plan"), but there is a chance.
Posted by: Dan Tarara | April 19, 2012 at 11:23 AM