For a variety of reasons, we haven't been that heavily involved with system APIs in the last year or three, but that has all changed in the last couple of months. In the process we've been reminded of one 15-year-old lesson, and learned a couple of new ones along the way.
It all started when we were asked to help debug some programs that made extensive use of APIs to retrieve lists of objects, etc., as some objects seemed to be missing from the reports. But no error messages were being issued by the programs. In fact they were quite content that all was right with the world.
Our first thought was that at some point the programmer had simply failed to test the bytes-available field in the API error structure. So we started working our way through the code looking for any such situations. No sign: every API call seemed to be covered, either by having bytes-provided set to zero so that an escape message would be triggered, or by specific coding that appeared to correctly test the number of error bytes returned. Hmmm.
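For readers who haven't looked at the API error structure lately, here is a minimal sketch of the test we were hunting for. It is written in Python purely so that it can be run anywhere, and it models the structure as a byte buffer rather than calling a real API. We are assuming the ERRC0100 layout as we understand it (bytes provided, bytes available, a 7-character exception ID and one reserved byte) and EBCDIC (CCSID 37) character data. The function name `api_error` and the sample message ID are ours, for illustration only.

```python
import struct

def api_error(error_code: bytes):
    """Return the exception ID if the API reported an error, else None."""
    # Assumed ERRC0100 layout: bytes provided BINARY(4), bytes available
    # BINARY(4), exception ID CHAR(7), reserved CHAR(1).
    bytes_provided, bytes_available, exception_id = struct.unpack_from(
        ">ii7s", error_code, 0
    )
    if bytes_provided == 0:
        # With zero bytes provided nothing is returned in the structure;
        # the API signals failures as escape messages instead.
        raise ValueError("bytes provided is 0: errors arrive as escape messages")
    if bytes_available > 0:
        # Non-zero bytes available means the API returned error data.
        return exception_id.decode("cp037").rstrip()
    return None

# Two hand-built buffers standing in for what an API might hand back.
ok = struct.pack(">ii7sx", 16, 0, b"")
failed = struct.pack(">ii7sx", 16, 16, "CPF9801".encode("cp037"))

print(api_error(ok))
print(api_error(failed))
```

The point is simply that one of two things must be true for every call: either bytes provided is zero and you are prepared for an escape message, or bytes available is tested after the call.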
Further investigation revealed that the programs were actually being run on a system that had a large number of objects of all flavors and varieties. In fact, when we did the math it appeared quite possible that the information being retrieved for some object types could exceed the 16 MB limit on the User Space. But if that was indeed the case, why was no error flagged or message issued? Susan recalled having discussed with Bruce Vining the fact that some of the older list APIs don't consider "filling up" the User Space to be an error as such; on encountering such a situation they set a flag in the header to indicate that the data isn't complete. In fact three possible values can be set: "C" for Complete, "P" for Partial (meaning that the data is accurate but the API probably ran out of space) and "I" for Incomplete (meaning that the data cannot be trusted). Was that flag being tested by these programs? No, it wasn't. In fact the flag in question hadn't even been defined in the programs. The programmer had clearly worked on the basis that filling up the User Space would be considered an error and would therefore be indicated via the "bytes available" value in the API-error feedback structure.
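To make the missing test concrete, here is a rough sketch of what checking that flag amounts to. Again it is Python operating on a byte buffer, not RPG against a real User Space. We are assuming the generic header layout as we understand it, with the one-character information status at offset 103 and the number of list entries at offset 132, and EBCDIC (CCSID 37) data; please verify the offsets against the API documentation before relying on them. The names `check_list_header` and `fake_header` are hypothetical.

```python
import struct

INFO_STATUS_OFFSET = 103  # assumed: CHAR(1) holding 'C', 'P' or 'I'
ENTRY_COUNT_OFFSET = 132  # assumed: BINARY(4) number of list entries

def check_list_header(header: bytes) -> int:
    """Return the entry count, or raise if the list cannot be used as-is."""
    status = header[INFO_STATUS_OFFSET:INFO_STATUS_OFFSET + 1].decode("cp037")
    (entries,) = struct.unpack_from(">i", header, ENTRY_COUNT_OFFSET)
    if status == "C":
        return entries
    if status == "P":
        # Accurate as far as it goes, but the API ran out of room.
        raise RuntimeError(f"list is partial: only {entries} entries returned")
    raise RuntimeError(f"list is incomplete (status {status!r}): do not trust it")

def fake_header(status: str, entries: int) -> bytes:
    """Build a stand-in header for demonstration; not a real User Space."""
    buf = bytearray(140)
    buf[INFO_STATUS_OFFSET:INFO_STATUS_OFFSET + 1] = status.encode("cp037")
    struct.pack_into(">i", buf, ENTRY_COUNT_OFFSET, entries)
    return bytes(buf)

print(check_list_header(fake_header("C", 42)))
```

Raising on "P" is a design choice for a report that must be complete; a program that can live with a truncated list might log a warning and carry on instead.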
So it was at least a possibility that the User Space had filled, but surely there would have been some indication in the log? A more detailed examination of the logs revealed that an escape message had indeed been issued indicating that the User Space was full. But it hadn't stopped the program. Back to the RPG code again and this time we spotted it. A problem we have known about for more than 15 years had come back to bite us and we had simply missed it. What was this ancient troublemaker? The old RPG/400 habit of coding an error indicator on a CALL statement and then ignoring it. The programs in question, while written in the last few years, had a decidedly RPG/400 feel to them. In fact the code even had a few left-hand conditioning indicators here and there, so we should have been more wary. In our defense we've been coding /Free indicator-less RPG IV for so long now that we can only assume that our brains just didn't even register the use of the indicator beyond a simple "oh good grief not more numbered indicators" type of feeling anyway.
For those of you who have been lucky enough never to witness this problem, a few words of explanation might be in order. One of the few operational differences between RPG IV and RPG/400 concerns how errors are percolated. In ILE RPG an indicator coded on a CALL is taken as a statement by the programmer that they intend to deal with the error, and so the error is considered to have been handled. In RPG/400 the error would still have resulted in the green-screen-of-death, but coding an indicator prevented the error from being percolated up the call stack. In other words, programmers would often code the indicator not because they intended to do anything with it, but simply to prevent the error from causing a halt in every program all the way back up the call stack. The result is that if this RPG/400 "style" is used in an ILE RPG program, it can cause severe errors to be logged but otherwise completely ignored, as they were in this case.
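The trap isn't unique to RPG. As a loose analogy only (Python exceptions are not RPG indicators, and every name here is invented for the illustration), the first function below sets an "indicator" that nobody ever tests, while the second actually deals with the failure.

```python
class ApiError(Exception):
    """Stand-in for an escape message sent by a called program."""

def list_objects(fail: bool):
    # Hypothetical API call: either returns a list or signals an error.
    if fail:
        raise ApiError("user space full")
    return ["OBJ1", "OBJ2"]

def report_ignoring_errors(fail: bool) -> int:
    error_indicator = False
    objects = []
    try:
        objects = list_objects(fail)
    except ApiError:
        # The error now counts as handled, yet the flag is never
        # looked at again, so the report runs on with missing data.
        error_indicator = True
    return len(objects)

def report_checking_errors(fail: bool) -> int:
    try:
        objects = list_objects(fail)
    except ApiError as err:
        # Stop in a meaningful way instead of carrying on regardless.
        raise RuntimeError(f"object list failed: {err}") from err
    return len(objects)

print(report_ignoring_errors(True))
```

The first version happily reports zero objects and ends normally, which is exactly the "all is right with the world" behavior we were asked to debug.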
We've learned a number of lessons from this little exercise.
- The size of our systems and the number of objects they contain are growing daily, particularly as systems are consolidated. Some of our assumptions about the size of results returned from list APIs may no longer be valid.
- Just because a program is written in RPG IV and compiled as an ILE RPG program doesn't mean it always uses RPG IV programming techniques. Be cautious if you spot the signs of a program having been written in RPG/400 style. There may be some old bad habits lurking there--and we don't just mean GOTOs!
- We need to make a greater effort to keep up to date with advances and changes in the API set, even if we don't think we'll need them today. IBM adds these extra flags, continuation handles and similar features for a reason. Sometimes IBM adds APIs that appear to provide similar functionality to existing ones. In the particular case we encountered, the existing calls to QUSLOBJ should almost certainly be replaced by the more recent QGYOLOBJ API. This is one of a new breed of APIs that build the list "under the covers" and then allow you to read the entries as you wish. Because no User Space is used, they don't suffer the size limits of the old APIs. They are also more flexible than APIs such as QSYLOBJP, which still use the User Space approach but provide a link (or handle) to allow the results to be continued across multiple User Spaces.
- Even in situations where there appears to be little chance of existing limits ever being reached, all available status flags and the like should still be tested, if only to ensure that the application halts in a meaningful manner should an error ever occur.
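On the first of those lessons, the "math" is nothing more than a division, and it is worth redoing whenever systems are consolidated. The sketch below is in Python and rests on assumptions: we take the 16 MB limit as 16 * 1024 * 1024 bytes (the exact maximum may differ slightly), and the 100-byte entry size is a made-up figure, since the real size depends on the API and list format requested.

```python
USER_SPACE_LIMIT = 16 * 1024 * 1024  # assumed: "16 MB" read as 16 MiB

def max_entries(entry_size: int, overhead: int = 0,
                limit: int = USER_SPACE_LIMIT) -> int:
    """How many fixed-size entries fit once header overhead is deducted."""
    return (limit - overhead) // entry_size

def fits(object_count: int, entry_size: int, overhead: int = 0) -> bool:
    """True if a list of object_count entries should fit in one User Space."""
    return object_count <= max_entries(entry_size, overhead)

# Hypothetical 100-byte entries: is the list still safe at these volumes?
print(max_entries(100))
print(fits(150_000, 100), fits(200_000, 100))
```

If the answer is anywhere near the limit, that is a strong hint to test the status flag and to look at the newer list APIs.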
We've said before that one of the strengths of our beloved IBM i is that it can still run programs written 20+ years ago. We've also noted that this is one of the weaknesses of the system, in this case because it can lead us to delude ourselves that things will remain the same. They won't. It also led the original programmer to assume that yesterday's programming techniques were still valid today simply because they could still be coded and appeared to run. That's OK if the original technique was a good idea, but in this case it was one of those "isn't that cool" programming tricks that tend to be perpetuated without any great thought as to whether they were ever a good idea. This one wasn't.