This blog is written by Bill Schmidt, the team leader of the IBM i Optimizing Translator team.
Here’s another tip for improving the performance of your ILE applications. In my last blog entry, I explained when argument optimization may speed up your programs and service programs. Today, I’d like to talk about another technique you may not be familiar with: program profiling. As with argument optimization, a full description of program profiling can be found in chapter 13 of IBM i ILE Concepts (SC41-5606-09). For simplicity, I’ll just call it “profiling” in the rest of this post.
The idea behind profiling is to find the hot spots in your program: the most frequently executed procedures and the most frequently executed paths through those procedures. Once the hot spots are identified, the optimizing translator performs extra optimizations on those portions of your code to speed them up. Normally, the translator doesn’t know which paths are most important, so it tries to optimize all paths equally. With profiling, the translator can do a much better job.
At a high level, profiling consists of three phases:
- Instrumentation
- Data gathering
- Optimization
In the instrumentation phase, code is compiled with a special option, asking the translator to insert extra “profiling hooks” into your code. These hooks are small snippets of code, designed to count how often various events occur within your programs, such as how often a procedure is entered or how often a loop is executed.
In the data-gathering phase, instrumented programs are run for a while, using input data that’s representative of what you expect your programs to normally see. The hooks in the instrumented programs will maintain a table of counters that are stored away as part of the instrumented program or service program objects.
In the optimization phase, you compile your code a second time, this time with a different option that asks the translator to use the gathered profile data. The translator analyzes the counters and determines the hot spots in your code. This information is used to optimize each procedure for better performance and to package the hottest procedures together for more efficient use of computer memory. This is called applying the profile data.
When to Use Profiling
It’s important to note that profiling improves only the performance of your application-level code. It doesn’t improve other aspects of the system that affect performance. For example, if your application spends 90 percent of its time in system calls to access the database, profiling can only improve the performance of the other 10 percent. In such an application, the benefit from profiling may be minimal. If, however, your ILE application spends at least 25 percent of its time in the application code, then you may see a noticeable improvement from profiling.
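To put rough numbers on this, here is a back-of-the-envelope estimate using Amdahl’s law. The 1.25x speedup of the application code is a purely hypothetical figure chosen for illustration; it is not a measured or promised result, and your own numbers will differ.

```latex
% p = fraction of run time spent in application code
% s = speedup of that code after profiling (hypothetical: s = 1.25)
\[
S_{\text{overall}} = \frac{1}{(1 - p) + \dfrac{p}{s}}
\]
\[
p = 0.10:\quad S_{\text{overall}} = \frac{1}{0.90 + 0.08} \approx 1.02
\qquad
p = 0.25:\quad S_{\text{overall}} = \frac{1}{0.75 + 0.20} \approx 1.05
\]
```

Under that assumption, the database-bound application gains about 2 percent overall, while the application that spends a quarter of its time in its own code gains about 5 percent.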
Programs and service programs consisting of many procedures with frequent procedure calls will generally see more improvement from profiling than those having only a few procedures. This is because profiling is used to improve automatic inlining of one procedure into another, and the optimization to package hot procedures together is more effective when there are many procedures. However, programs with only a few procedures can still see significant benefit from profiling.
Since profiling takes some additional effort, it should be done late in your application development cycle so it isn’t done repeatedly. Typically your application should be complete and well-tested before applying profiling to improve performance.
Instrumentation may be added when creating or changing modules, or when changing programs or service programs. Use the PRFDTA(*COL) parameter to specify that an object is to be instrumented to collect profile data. Examples:
(1) CRTCPPMOD MODULE(MYLIB/MYMOD) OPTIMIZE(40) PRFDTA(*COL)
(2) CHGPGM PGM(MYLIB/MYPGM) PRFDTA(*COL)
Note the use of optimization level 40 on the CRTCPPMOD example. You must use at least optimization level 30 (or *FULL) when instrumenting code for profile data collection.
The second example is the most common way of instrumenting code. Once your program is complete and tested, use CHGPGM or CHGSRVPGM with PRFDTA(*COL) to instrument the entire program at once.
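For completeness, here is a sketch of the same step for a service program and for an existing module, followed by a way to check the result. MYSRVPGM is a placeholder name, and the sketch assumes the objects were already created at optimization level 30 or higher.

```cl
/* Instrument an entire service program (MYSRVPGM is a placeholder) */
CHGSRVPGM SRVPGM(MYLIB/MYSRVPGM) PRFDTA(*COL)

/* Instrument an existing module without recompiling it. The changed  */
/* module must still be bound into a program or service program.      */
CHGMOD MODULE(MYLIB/MYMOD) PRFDTA(*COL)

/* Check the result: the basic display should include a               */
/* "Profiling data" attribute for the object                          */
DSPPGM PGM(MYLIB/MYPGM) DETAIL(*BASIC)
DSPSRVPGM SRVPGM(MYLIB/MYSRVPGM) DETAIL(*BASIC)
```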
The next step is to collect profile data by running your instrumented code. Select input data that will exercise the same paths during data collection that the application will use in production. Also, be sure to run your code long enough that the most important parts are exercised a significant number of times, so the frequently executed paths can be determined. Note that the instrumented code will run approximately 30 percent slower than code without instrumentation, so plan accordingly!
Before you run your code with the sample input data, run the Start Program Profiling command:
STRPGMPRF
This tells the system you want to collect profile data. If you don’t run this command, the instrumentation hooks in the code you just compiled won’t do anything. Each hook contains a fast lookup to see whether profiling is currently enabled on the system.
Why is this necessary? Consider a complex application that spends much time in start-up activities before reaching a normal steady state. If you have such an application, you may decide you only want to collect profile data after the steady state has been achieved. In such a case, you’d start your application first, and only after start-up is complete would you issue the STRPGMPRF command.
Note also that STRPGMPRF is a system-wide command. It enables profile data collection for all instrumented programs running on the system. This is important since many applications these days run in multiple cooperating tasks. It means you must be careful to coordinate profiling sessions with all users on your system.
When you’ve finished gathering enough profile data, run the End Program Profiling command:
ENDPGMPRF
This turns off profile data collection on the system. You don’t need to stop your application before entering this command.
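The steady-state scenario described above might look something like the following sequence. This is only a sketch: the job name MYAPP and both delay values are arbitrary placeholders, and in practice you would probably decide by observation when start-up is finished rather than by a fixed delay.

```cl
/* Start the instrumented application in batch (MYAPP is a placeholder) */
SBMJOB CMD(CALL PGM(MYLIB/MYPGM)) JOB(MYAPP)

/* Wait for start-up to finish; 600 seconds is an arbitrary guess */
DLYJOB DLY(600)

/* Steady state reached: turn on system-wide profile data collection */
STRPGMPRF

/* Let the representative workload run for an hour (arbitrary) */
DLYJOB DLY(3600)

/* Turn collection off; the application keeps running */
ENDPGMPRF
```

Because collection is system-wide, any other instrumented program running during that window will accumulate counts too.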
Profile data gathering is additive. That is, if you collect data for an instrumented program once, and then collect more data for it later, the profile data counters will contain the data from both sessions. Sometimes this is what you want, but sometimes you might feel that the data you gathered wasn’t useful, and you want to start over. You can clear all the profile data counters to zero using the following command:
CHGPGM PGM(MYLIB/MYPGM) PRFDTA(*CLR)
Once you have gathered all the necessary data, it’s time to apply the profile data using CHGPGM:
CHGPGM PGM(MYLIB/MYPGM) PRFDTA(*APYALL)
This command causes the optimizing translator to read the profile data that was gathered in the previous step, and use it to optimize the code based on the frequently executed paths.
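Putting the steps together, a small CL program for the whole cycle could look like the sketch below. It assumes MYLIB/MYPGM takes no parameters, that a single call is a representative workload, and that the job is authorized to run the profiling commands; adjust all three to your situation.

```cl
PGM
  /* 1. Instrument the completed, tested program */
  CHGPGM PGM(MYLIB/MYPGM) PRFDTA(*COL)

  /* 2. Enable collection and run a representative workload */
  STRPGMPRF
  CALL PGM(MYLIB/MYPGM)
  MONMSG MSGID(CPF0000) /* Keep going so collection is always ended */
  ENDPGMPRF

  /* 3. Apply the gathered profile data */
  CHGPGM PGM(MYLIB/MYPGM) PRFDTA(*APYALL)
ENDPGM
```

The MONMSG is there only so that a failing call doesn’t leave profiling enabled on the system. If the call does fail, the counters may not reflect a normal run, so you may prefer to clear them with PRFDTA(*CLR) and collect again rather than apply them.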
There are other keywords that can be used when applying profile data, but we recommend that you always use PRFDTA(*APYALL). You can refer to IBM i ILE Concepts if you’re curious about other options.
That’s all there is to it! Your application is now ready to run, and hopefully performs faster than ever.
The Advanced Optimization Techniques chapter of IBM i ILE Concepts contains a great deal of detailed information about Program Profiling. I highly recommend that you read it if this brief overview has whetted your interest.