The following PowerUp blog entry was written by Chandler Wilkerson, one of Rice University’s Linux cluster administrators in the Research Computing Support Group, where he administers the IBM POWER7 Blue BioU cluster and the university's campus Linux deployment infrastructure.
When last I wrote, I detailed the user education challenges in optimizing the running of HPC codes on our small POWER7 cluster, Blue BioU. Quite a bit has changed since then. This May, we expanded Blue BioU by 30 nodes, increasing the compute node count from 18 to 48. At the same time, we upgraded the operating system from Red Hat Enterprise Linux (RHEL) version 5 to version 6, which brought much better support for the POWER7 hardware.
The most notable improvement has been a set of patches, recently merged into the upstream Linux kernel and carried into RHEL6 kernels, that added asymmetric SMT packing for multithreaded POWER CPUs. By improving the default performance of Linux applications on POWER7 hardware, these patches have greatly simplified our handling of basic codes.
Before these patches, application processes were distributed across hardware threads essentially at random, sometimes placing several on a single core while leaving other cores idle. The overloaded cores were throttled and their processes contended with one another for resources. Together, this produced horrible performance in HPC codes, where synchronization means the whole job cannot run any faster than its slowest process.
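The slowest-process effect can be made concrete with a toy model. The sketch below assumes, purely for illustration, that every core delivers the same throughput and splits it evenly among the processes placed on it; real SMT resource sharing on POWER7 is more complicated. The function name step_time is hypothetical.

```python
# Toy model (an assumption for illustration, not real POWER7 SMT behavior):
# each core's throughput is split evenly among the processes running on it.

def step_time(placement, n_cores, work=1.0):
    """Time for one barrier-synchronized step.

    placement[i] is the core that process i runs on. Every process must
    finish `work` units before the barrier releases, so the step lasts
    as long as the slowest process.
    """
    load = [0] * n_cores
    for core in placement:
        load[core] += 1
    # A process sharing its core with k-1 others runs at 1/k speed,
    # so it needs work * k time units to finish.
    return max(work * load[core] for core in placement)


spread = [0, 1, 2, 3]   # one process per core
stacked = [0, 0, 1, 2]  # two processes share core 0 while core 3 sits idle
print(step_time(spread, 4))   # 1.0
print(step_time(stacked, 4))  # 2.0
```

Even though three of the four processes in the stacked placement run at full speed, the whole step takes twice as long because everyone waits at the barrier for the two processes sharing core 0.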
The asymmetric scheduler now migrates CPU-intensive tasks to the fastest thread on each core and spreads the load across all the cores. We still recommend CPU pinning for optimal performance, but the gain from pinning is now closer to 5 to 10 percent rather than multiple orders of magnitude.
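One common way to pin processes on Linux is the sched_setaffinity system call. The following is a minimal Python sketch, not the tooling we actually use on Blue BioU: it reads each CPU's topology/thread_siblings_list file from sysfs, keeps one hardware thread per core, and pins the calling process according to a rank number. The helper names (parse_cpu_list, one_cpu_per_core, read_sibling_lists, pin_rank) are hypothetical, os.sched_setaffinity is Linux-only, and where the rank comes from (for example, an MPI launcher) is left as an assumption.

```python
import os


def parse_cpu_list(text):
    """Parse a Linux CPU list such as "0-3" or "0,4,8-9" into a sorted list."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def one_cpu_per_core(sibling_lists):
    """Pick the lowest-numbered usable hardware thread of each core.

    sibling_lists maps each CPU this process may use to the contents of
    its topology/thread_siblings_list file. Only CPUs present as keys are
    considered usable, so the result respects an inherited affinity mask.
    """
    chosen = set()
    for text in sibling_lists.values():
        usable = [c for c in parse_cpu_list(text) if c in sibling_lists]
        chosen.add(min(usable))
    return sorted(chosen)


def read_sibling_lists(sysfs="/sys/devices/system/cpu"):
    """Read thread_siblings_list for every CPU this process may run on."""
    lists = {}
    for cpu in os.sched_getaffinity(0):
        path = f"{sysfs}/cpu{cpu}/topology/thread_siblings_list"
        with open(path) as f:
            lists[cpu] = f.read()
    return lists


def pin_rank(rank):
    """Pin the calling process to one core's thread, chosen by rank."""
    cpus = one_cpu_per_core(read_sibling_lists())
    cpu = cpus[rank % len(cpus)]
    os.sched_setaffinity(0, {cpu})  # 0 means "the calling process"
    return cpu
```

If sysfs reports sibling lists of 0-3, 4-7, and so on, ranks 0 through 3 would land on CPUs 0, 4, 8, and 12, one per core. Command-line tools such as taskset and numactl achieve the same effect without any code.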
RHEL6 makes another important change: the entire OS is now built as 64-bit, rather than pairing a 64-bit kernel with a mostly 32-bit userland. Perhaps more importantly, GCC now generates 64-bit code by default, so we see far fewer library architecture conflicts than we did when everything defaulted to 32-bit.
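When a build fails because of a library architecture conflict, a useful first check is whether every object involved has the same word size. The sketch below is a hypothetical helper rather than a standard tool: it reads the ELF identification bytes, where byte 4 (EI_CLASS) is 1 for 32-bit objects and 2 for 64-bit objects. The `file` command reports the same information.

```python
ELF_MAGIC = b"\x7fELF"
EI_CLASS = 4                  # offset of the class byte in the ELF header
ELF_CLASSES = {1: 32, 2: 64}  # ELFCLASS32, ELFCLASS64


def elf_bits(header):
    """Return 32 or 64 for the given ELF header bytes, or None if not ELF."""
    if len(header) <= EI_CLASS or header[:4] != ELF_MAGIC:
        return None
    return ELF_CLASSES.get(header[EI_CLASS])


def file_bits(path):
    """Report the word size of an executable or shared library on disk."""
    with open(path, "rb") as f:
        return elf_bits(f.read(EI_CLASS + 1))
```

Running file_bits over an executable and each library it links against makes a 32-bit object among 64-bit ones stand out immediately.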
Before the upgrade, even moderately experienced users needed special attention to get their codes running reasonably on Blue BioU. After the upgrade, compiling and running jobs on the cluster is much simpler. The proof is in our higher, more efficient utilization statistics and a lighter problem-ticket load for us admins.