The Work Management team recently released PTF SI42845 for the 7.1 release that changes how IBM i manages jobs that exceed their CPU or storage limits.
The class object defines the processing attributes for a job; the routing entry in the subsystem description determines which class object is used when a job is initiated. Two of these processing attributes are Maximum processing unit time (CPUTIME) and Maximum temporary storage allowed (MAXTMPSTG), both of which default to *NOMAX. Prior to this PTF, if values were entered for these parameters, the job was ended when one of the limits was hit: with CPC1218 (Job ended abnormally) for the maximum processing unit time, or with CPC1217 (Job ended abnormally) for the maximum temporary storage allowed. The cause in each of these messages tells you whether the job ended abnormally because the maximum CPU time was consumed or because the maximum temporary storage limit was exceeded.
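For example, you can see which class each routing entry points to and then review that class's current CPUTIME and MAXTMPSTG settings. The commands below assume the IBM-shipped QBATCH subsystem description and QBATCH class in QGPL; substitute your own objects:

    /* Which class does each routing entry in this subsystem use?   */
    DSPSBSD SBSD(QGPL/QBATCH) OPTION(*RTGE)

    /* Review that class's attributes, including Maximum processing */
    /* unit time and Maximum temporary storage allowed              */
    DSPCLS CLS(QGPL/QBATCH)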
The system can't know whether a job is near the completion of its work at the point it ends the job; it's possible that, given a little more CPU time or temporary storage, the job could run to completion. Because of the difficulty in predicting the upper CPU or temporary storage limits a job requires, along with the fact that the job was ended when these limits were hit, many customers simply left these values at their default setting.
This recently released PTF changes the behavior so that jobs are no longer ended when they exceed their maximum processing unit time or maximum temporary storage limit; instead, the jobs are held. When a job is held by the system due to one of these conditions, a message is sent to the QSYSOPR message queue:
• CPI112D – Job held by the system, CPUTIME limit exceeded
• CPI112E – Job held by the system, MAXTMPSTG limit exceeded
This change allows the system operator to determine whether a job should be ended or allowed to continue to run to completion.
If you want a job to continue to run, you must change the limit that was hit and then use the Release Job (RLSJOB) command (you can't release a job that's above the limit). To allow these values to be changed, the Change Job (CHGJOB) command and the Change Job (QWTCHGJB) API have been enhanced.
The Change Job (CHGJOB) command has been enhanced with two new parameters:
• Maximum CPU time (CPUTIME): The maximum CPU time parameter specifies the maximum processing unit time (in milliseconds) that the job can use. If the maximum time is exceeded, the job is held.
• Maximum temporary storage (MAXTMPSTG): The maximum temporary storage parameter specifies the maximum amount of temporary auxiliary storage (in megabytes) that the job can use. This temporary storage is used for storage required by the program itself and by implicitly created internal system objects used to support the job. (It doesn’t include storage for objects in the QTEMP library.) If the maximum temporary storage is exceeded, the job is held.
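For example, if a hypothetical batch job 123456/QUSER/NIGHTLY has been held because it hit one of its limits, you could raise the limits and then release the job. The job name and values here are purely illustrative; CPUTIME is in milliseconds (7200000 is two hours of CPU) and MAXTMPSTG is in megabytes (20480 is 20 GB):

    /* Raise the limits for the held job, then release it so it can */
    /* run to completion                                            */
    CHGJOB JOB(123456/QUSER/NIGHTLY) CPUTIME(7200000) MAXTMPSTG(20480)
    RLSJOB JOB(123456/QUSER/NIGHTLY)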
The Change Job (QWTCHGJB) API has been enhanced to support two new keys on the JOBC0100 and JOBC0200 formats:
• Maximum processing unit time allowed, in milliseconds (1302)
• Maximum temporary storage allowed, in megabytes (1305)
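As a rough sketch of how these keys could be used, the CL program below sets a new maximum CPU time for a job through QWTCHGJB with key 1302. The program name is made up, and the 20-byte variable-length record layout and BINARY(4) data size are my assumptions about the API's change-information format, so treat this as a starting point rather than a finished implementation:

    /* CHGCPUMAX: sketch of calling QWTCHGJB to set a new maximum     */
    /* CPU time (key 1302) for a job.  Example call:                  */
    /*   CALL CHGCPUMAX PARM('NIGHTLY' 'QUSER' '123456')              */
    PGM PARM(&JOBNAME &JOBUSER &JOBNBR)
      DCL VAR(&JOBNAME) TYPE(*CHAR) LEN(10)
      DCL VAR(&JOBUSER) TYPE(*CHAR) LEN(10)
      DCL VAR(&JOBNBR)  TYPE(*CHAR) LEN(6)

      DCL VAR(&QUALJOB) TYPE(*CHAR) LEN(26)            /* name/user/number  */
      DCL VAR(&INTID)   TYPE(*CHAR) LEN(16) VALUE(' ') /* blanks = use name */
      DCL VAR(&FORMAT)  TYPE(*CHAR) LEN(8) VALUE('JOBC0100')

      /* One variable-length record: length, key, type, reserved,     */
      /* data length, data (BINARY(4) assumed for key 1302)           */
      DCL VAR(&CHGINFO) TYPE(*CHAR) LEN(24)
      DCL VAR(&NUMRECS) TYPE(*INT)  LEN(4) STG(*DEFINED) DEFVAR(&CHGINFO 1)
      DCL VAR(&RECLEN)  TYPE(*INT)  LEN(4) STG(*DEFINED) DEFVAR(&CHGINFO 5)
      DCL VAR(&KEY)     TYPE(*INT)  LEN(4) STG(*DEFINED) DEFVAR(&CHGINFO 9)
      DCL VAR(&TYPE)    TYPE(*CHAR) LEN(1) STG(*DEFINED) DEFVAR(&CHGINFO 13)
      DCL VAR(&RSV)     TYPE(*CHAR) LEN(3) STG(*DEFINED) DEFVAR(&CHGINFO 14)
      DCL VAR(&DTALEN)  TYPE(*INT)  LEN(4) STG(*DEFINED) DEFVAR(&CHGINFO 17)
      DCL VAR(&DATA)    TYPE(*INT)  LEN(4) STG(*DEFINED) DEFVAR(&CHGINFO 21)

      DCL VAR(&ERRCODE) TYPE(*CHAR) LEN(8)
      DCL VAR(&BYTPRV)  TYPE(*INT)  LEN(4) STG(*DEFINED) DEFVAR(&ERRCODE 1)

      CHGVAR VAR(&QUALJOB) VALUE(&JOBNAME *CAT &JOBUSER *CAT &JOBNBR)
      CHGVAR VAR(&NUMRECS) VALUE(1)
      CHGVAR VAR(&RECLEN)  VALUE(20)        /* total length of this record  */
      CHGVAR VAR(&KEY)     VALUE(1302)      /* maximum processing unit time */
      CHGVAR VAR(&TYPE)    VALUE('B')       /* binary data                  */
      CHGVAR VAR(&RSV)     VALUE(X'000000') /* reserved; hex zeros assumed  */
      CHGVAR VAR(&DTALEN)  VALUE(4)
      CHGVAR VAR(&DATA)    VALUE(7200000)   /* illustrative: 2 hours of CPU */
      CHGVAR VAR(&BYTPRV)  VALUE(0)         /* signal exceptions on error   */

      CALL PGM(QSYS/QWTCHGJB) PARM(&QUALJOB &INTID &FORMAT &CHGINFO &ERRCODE)
    ENDPGM

The same structure works for the maximum temporary storage key (1305); only the key and data values change.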
This PTF makes it easier for you to protect your system from the effects of a runaway job that either consumes more CPU than expected or uses more temporary storage than expected. By setting these limits larger than what any job should use, you can protect the system from the potentially negative effects of a runaway job. Because the job is held rather than ended, the limits don't need to be set perfectly. If either limit is hit, you can increase the limit with the Change Job command or API and then release the job to allow it to continue to run. If the new upper limit is hit, the system will once again hold the job.
With the change introduced by this PTF, you should start to move away from the default *NOMAX values and set appropriate limits. Particularly with the temporary storage limit, you can prevent a system outage by setting an upper limit in the class object for the maximum temporary storage a job can use (but be sure to keep that limit lower than the amount of storage available on the system). With the new behavior of holding the job when the limit is hit, you can assess the situation and determine the best action for the job.
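For example, to cap the temporary storage for jobs that run under the IBM-shipped QBATCH class, you could change the class itself. The 51200 MB (50 GB) value below is purely illustrative; size the limit relative to the storage available on your own system:

    /* Cap temporary storage per job for this class; pick a value    */
    /* well below the system's available storage                     */
    CHGCLS CLS(QGPL/QBATCH) MAXTMPSTG(51200)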
I'd like to thank Dan Tarara from the IBM i work management development team for his assistance in writing this blog article.
How does one determine what would be an appropriate Maximum CPU Time in Milliseconds and Maximum Temporary Storage in Megabytes should they decide to change the default class object settings from *NOMAX?
I would assume you would make different changes based upon whether the class object was for batch versus interactive job processing.
Could you expound on this subject?
Posted by: Harry Tolzman | April 21, 2011 at 07:50 AM
Knowing that you can set the limits is the easy part; knowing what the limits should be set to is the hard part!
You need knowledge of your environment to know what is typical for the CPU time and temporary storage used by your jobs. You can use the various work management interfaces to review these metrics; Collection Services also has the CPU time used for jobs.
Knowing the temporary storage upper limit is a bit more difficult, since the interfaces that return information on temporary storage only provide it for active jobs, and the value returned is the current amount used; you can't determine the temporary storage used by a job after it has ended, nor can you determine the high-water mark of the temporary storage used by a job.
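For an active job, a quick way to look at the current numbers is the job's run attributes display, which shows the CPU time used and the temporary storage currently in use alongside the job's maximums (the job name below is just a placeholder):

    DSPJOB JOB(123456/QUSER/NIGHTLY) OPTION(*RUNA)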
Regarding the temporary storage limit, it is best to think of this limit as protecting the system, not limiting an individual job. You may want to set the temporary storage limit based upon the total amount of storage on the system, so that no job uses more than 5% of the maximum storage available. Work with System Status (WRKSYSSTS) can show you how much storage is available. The bottom line regarding the temporary storage limit is that a large limit (but smaller than the total amount of storage available) is much better than *NOMAX, as the large limit will protect the system from a failure whereas *NOMAX does not. A maximum temporary storage limit of *NOMAX is dangerous, and I would advise everyone to move away from that default value.
A very quick way to get rid of *NOMAX for the temporary storage values is to do WRKCLS CLS(*ALL/*ALL), put option 2 on every line, and enter MAXTMPSTG(xxxxx) on the parameter line.
The nice thing about this new support is that you do not have to set these limits accurately at first. You can experiment with the settings; if jobs are hitting them because they are too low, you can increase the limit for the job and then release the job to allow it to continue to run. You can then modify the class parameters as you determine the best settings for your environment.
Once you set these limits, you should proactively monitor the QSYSOPR message queue for the new messages so you can react promptly if the system holds a job. You can automate this message monitoring with Management Central message monitors or message watches.
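For instance, a message watch can be started for just these two messages on QSYSOPR. The session ID and watch program named below are placeholders; the watch program is an exit program you would write to notify the operator or take whatever action fits your environment:

    STRWCH SSNID(CLSLIMITS) WCHPGM(MYLIB/LIMITWCH) +
           WCHMSG((CPI112D) (CPI112E)) WCHMSGQ((*SYSOPR))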
Dawn
Posted by: Dawn | April 22, 2011 at 10:43 AM
A case: a job has MAXTMPSTG set to *NOMAX.
1. Is there a chance that the job still ends when it hits a maximum limit?
2. How do you determine a value to use instead of *NOMAX for a system's temporary storage?
3. Do the temporary storage for a job and heap storage have any relation?
Posted by: Anoop Dashora | January 01, 2012 at 02:34 AM
Hi Anoop, here are the answers to your questions:
1. If the MAXTMPSTG setting for a job is *NOMAX, there is no maximum limit to hit, and the system would never end the job based upon temporary storage consumption (unless all of the system's storage is consumed, but that would affect the whole partition, not just a single job).
2. With the PTFs that have been released for 7.1, a best practice would be to never use *NOMAX, but rather identify an upper limit for the MAXTMPSTG value. As the article explains, what this limit is will depend upon your system's storage - a large value that is smaller than the total amount of storage available.
3. Heap storage that can be allocated and deallocated by programs is temporary storage that is tracked on a job basis.
Dawn
Posted by: Dawn | January 05, 2012 at 04:53 PM
Hi Dawn,
Is there a PTF equivalent of PTF SI42845 for IBM i 6.1?
Thanks
Posted by: Argel | January 16, 2012 at 01:22 AM
Hi Argel,
No, this PTF is only available for 7.1. I don't expect to see this same functionality back on 6.1. Sorry.
Dawn
Posted by: Dawn | January 18, 2012 at 04:58 PM
Hi,
What if we WANT the job to end if it exceeded the max CPU? Can this be specified in the class?
Posted by: Greg Stahl | May 02, 2014 at 08:58 AM
If you want the old behavior, where the job is ended when the maximum CPU or temporary storage limit is exceeded, read this blog:
http://ibmsystemsmag.blogs.com/i_can/2014/07/how-to-end-jobs-that-are-now-held-for-maximum-cpu-or-temporary-storage-usage.html
Posted by: Dawn May | September 02, 2014 at 06:15 PM