However, workstation devices are still widely used today; telnet sessions all use workstation devices.
The subsystem job is central to workstation-device management. It handles putting up the sign-on display, as well as error-recovery processing if the session ends unexpectedly. Prior to 5.4, subsystem jobs were single-threaded and could only process one device at a time. Thus, if you had a situation where many devices were affected at one time (a network outage, for example), the subsystem job could become a bottleneck for the device error recovery processing. As a circumvention for this issue, IBM made a general recommendation that no more than 250-300 devices be handled by a single subsystem. To implement this recommendation, you had to define multiple subsystem descriptions and set up the necessary workstation entries to spread the devices across those subsystems. We wrote the Interactive Subsystem Configuration experience report to describe in detail how to perform this configuration.
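To make the arithmetic behind that old recommendation concrete, here is a minimal sketch of how a pool of devices might be split into groups of no more than 300. The function name, the QINTERnn subsystem names, and the 1,000 generated device names are all hypothetical and chosen only for illustration. On a real system the assignment is done with workstation entries in each subsystem description, which this sketch does not create.

```python
import math


def plan_subsystems(device_names, max_per_subsystem=300, prefix="QINTER"):
    """Split device names into groups of at most max_per_subsystem.

    Returns a dict mapping a hypothetical subsystem name to its devices.
    """
    if max_per_subsystem < 1:
        raise ValueError("max_per_subsystem must be at least 1")
    # Number of subsystems needed to stay at or under the per-subsystem limit.
    count = math.ceil(len(device_names) / max_per_subsystem)
    plan = {}
    for i in range(count):
        start = i * max_per_subsystem
        plan[f"{prefix}{i + 1:02d}"] = device_names[start:start + max_per_subsystem]
    return plan


# 1,000 made-up device names, for illustration only.
devices = [f"QPADEV{n:04d}" for n in range(1, 1001)]
plan = plan_subsystems(devices)
for name, group in plan.items():
    print(name, len(group), group[0], group[-1])
```

With 1,000 devices and a limit of 300, the plan needs four subsystems: three with 300 devices each and one with the remaining 100.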
In the 5.4 release, the subsystem job architecture was changed to be multithreaded; a single subsystem job can now handle device-error processing for up to 20 devices at a time, in 20 different threads. This architecture change had benefits beyond parallel device error recovery; it also made subsystem start-up faster.
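As a rough back-of-the-envelope illustration of why 20-way parallelism matters during a large outage, the sketch below compares a one-at-a-time subsystem job with one that handles 20 devices at once. It is a simplified model, not a description of the actual implementation: it assumes every device takes the same time to recover and ignores contention. The 2-second figure and the count of 1,000 affected devices are invented for the example, not measurements.

```python
import math


def recovery_time(devices, seconds_per_device, workers):
    """Estimate elapsed time when `workers` devices are processed at once.

    Simplified model: identical per-device cost, no contention.
    """
    return math.ceil(devices / workers) * seconds_per_device


affected = 1000    # assumed number of devices hit by a network outage
per_device = 2.0   # assumed seconds of recovery work per device

serial = recovery_time(affected, per_device, workers=1)
threaded = recovery_time(affected, per_device, workers=20)
print(f"one at a time: {serial:.0f}s, 20 at a time: {threaded:.0f}s")
```

Under these assumptions the single-threaded job needs 2,000 seconds to work through the backlog, while the 20-thread job needs 100.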
I want to note, though, that the controlling subsystem job isn’t multithreaded. If you leave the controlling subsystem system value (QCTLSBSD) at its default of QBASE, your interactive users will run in the controlling subsystem and you won’t get the benefits of multithreaded subsystem jobs. If you have many interactive users, you should change QCTLSBSD to QCTL (or equivalent).
Although multithreaded subsystem jobs remove the need to set up and manage multiple subsystems for interactive users, you may still want to consider doing so, for different reasons. Multiple subsystems can make it easier for you to manage the users on your system and offer additional options for performance tuning.
So why write this blog about something so old? I had a coworker ask me just last week if we'd yet removed the restriction of 250-300 devices per subsystem. That took me by surprise, but since this was an internal design change that did not affect externals, we never documented it. We just removed the old recommendation from the Information Center. The only external documentation of this change that I know of was a small update to the support technote describing the circumvention, stating that the recommendation is no longer needed for releases 5.4 and higher.
Finally, I'll share a bit of my history with subsystems and device-recovery processing. In 1995 or so, I was asked to investigate SNA error-recovery processing in general. It was during this job assignment that I learned of the tight integration between SNA communications and IBM i work management, and discovered the lack of scalability in subsystem jobs due to their single-threaded design. At that time we didn’t have support for threads; they were added to the operating system in the V4R2 release. We looked at various ways to address this scalability limit, but all of the ideas at the time were too expensive and risky to implement. So we published the recommended limit of 250-300 devices per subsystem as the circumvention and lived with the issue. In 2002, my work assignment changed and I joined the work-management team. While in that job, I had the opportunity to focus on our subsystem architecture. By this time, threads were old news, and the recommendation to change our subsystem architecture to be multithreaded was accepted and delivered in the 5.4 release. It was an amazingly talented group of software engineers who worked on this project, and the quality of their work was outstanding!




Good information, thanks for sharing. What was the last release that the work management manuals were updated for page fault ranges and recommended memory settings?
Posted by: Travis Frink | September 29, 2010 at 11:47 AM
Hi Travis,
We tend to not document specific performance recommendations any more because there is such a wide variety of applications and workloads that the answer is always "it depends".
General performance tuning information is now in the Information Center under the Performance section (Systems Management -> Performance -> Managing System Performance -> Tuning Performance).
We published a Redbook in November of 2009 that included information on tuning as well as some basic guidelines. The name of this Redbook is "End to End Performance Management on IBM i" and can be found here:
http://www.redbooks.ibm.com/Redbooks.nsf/RedbookAbstracts/sg247808.html?Open
Posted by: Dawn | September 30, 2010 at 08:22 AM
Hi May, the large number of devices in a single sbs is interesting; what about the opposite, for example having 300 active sbs? I have an ISV application that uses an SBSD for a "logical" group of users, and now, in a process of consolidation, there can be 300 of these groups. Do you know of situations where hundreds of sbs are used successfully?
Posted by: Stefano Martinelli | January 17, 2011 at 04:42 AM
Hi Stefano,
As documented in the Maximum capacities section of the Information Center under the work management limits, http://publib.boulder.ibm.com/infocenter/iseries/v7r1m0/topic/rzamp/rzampwrkmgmt.htm, the maximum number of active subsystems the system supports is 32,767. Since a subsystem job is just another job, it counts toward the total maximum number of jobs in the system as well. Using subsystems as a way to consolidate workloads onto a single partition is common.
However, as the number of active subsystems increases, some operations, such as starting and ending subsystems, may take longer to run because the system has to inspect information about every subsystem.
Of course, if you start or end a lot of subsystems all at the same time, you can consume a lot of system resources.
Posted by: Dawn | January 19, 2011 at 03:21 PM