This information has been circulating for a while, and Anthony English covers the topic here and here. But I want to make sure HMC users are aware of this important update and have the fix loaded if they're at V7R7.3.0.
A problem is known to exist when using dual HMCs in one of two environments: either one HMC is at a different level than the other, or both HMCs are at the base HMC V7R7.3.0 level without fixes.
The problem is possible exposure to corruption that could cause you to lose partition profiles.
A fix is available and should be installed immediately on any HMC that might be affected by this problem.
If you're using an HMC and an SDMC, be sure to get the fix for the SDMC as well.
From the IBM technical bulletin:
"This PTF was released July 18, 2011, to correct an issue that may result in partition configuration and partition activation profiles becoming unusable. This is more likely to occur on HMCs that are managing multiple systems. A symptom of this problem is the system may display Recovery and some or all profiles for partitions will disappear. If you are already running HMC V7R7.3.x, IBM strongly recommends installing PTF MH01263 to avoid this issue. If you are planning to upgrade your HMC to the V7R7.3.x code level, IBM strongly recommends that you install PTF MH01263 during the same maintenance window to avoid this issue."
The efix can be found here. This package includes these fixes:
- Fixed a problem where managed systems lose profiles and profiles get corrupted, resulting in a Recovery state that prevents DLPAR/LPM operations.
- Fixed a security vulnerability with the HMC help content.
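To see whether a given HMC is exposed, you can check its level and installed fixes. The sketch below is only an illustration: `check_mh01263` is a hypothetical helper name, and it assumes that `lshmc -V` output contains either `Release: 7.3.` or `V7R7.3.` at the affected level and lists installed PTFs by number. Verify both against your own HMC's output before relying on it.

```shell
# Hypothetical helper: reads `lshmc -V` output on stdin and reports exposure.
# Assumption: the output contains "Release: 7.3." or "V7R7.3." at the affected
# level, and lists installed fixes by PTF number (for example "MH01263: ...").
# Returns 1 only when the level is V7R7.3.x and MH01263 is not listed.
check_mh01263() {
    info=$(cat)
    case "$info" in
        *"Release: 7.3."*|*V7R7.3.*) ;;
        *) echo "not at V7R7.3.x"; return 0 ;;
    esac
    case "$info" in
        *MH01263*) echo "V7R7.3.x with MH01263 installed"; return 0 ;;
        *) echo "V7R7.3.x WITHOUT MH01263: install the fix"; return 1 ;;
    esac
}
```

On the HMC command line this would be used as `lshmc -V | check_mh01263`. With dual HMCs, run it against both consoles, since mismatched levels are one of the two problem environments.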
The fix, PTF MH01263, is now out, so be sure to install it. For completeness, here is the statement IBM released in July, before the fix became available.
Again, from IBM:
"Abstract: HMC / SDMC Save Corruption Exposure
Systems Affected: All 7042s
Communicable to Clients: Yes
IBM has learned that HMCs running V7R7.3.0 or SDMC running V6R7.3.0 could potentially be exposed to save area corruption (where partition profile data is stored).
"Symptoms include loss of profiles and/or recovery state due to a checksum failure against the profiles in the save area. In addition, shared processor pool names can be affected (processor pool number and configuration are not lost), system profiles can be lost, the virtual Ethernet MAC address base may change (causing the next partition activation to fail or to use different virtual Ethernet MAC addresses), and the default profile may be lost for all or some of the partitions.
"Partitions will continue to run, but reactivation via profile will fail if the profile is missing or corrupted. All mobility operations and some DLPAR operations will fail if a partition has missing or corrupted profiles.
"Environments using HMCs or SDMCs to control multiple managed systems have the greatest exposure. Triggers for exposure include any of the following operations performed in parallel to any managed system: Live Partition Mobility (LPM), Dynamic LPAR (DLPAR), profile changes, partition activation, rebuild of the managed system, rebooting with multiple servers attached, disconnecting or reconnecting a server, hibernate or resume, or establishing a new RMC connection.
"Recommended Service Actions:
There is no real work-around other than limiting the configurations to a single HMC managing a single managed system.
"Customers who have not yet upgraded or installed HMC 7.7.3 should delay the upgrade/install if at all possible until a fix is available.
"Customers who have not yet installed and deployed SDMC 6.7.3.0 should avoid discovering production servers until a fix is available.
"Customers that have 7.7.3 or SDMC 6.7.3.0 deployed should:
- Immediately do a profile backup operation for all managed servers:
bkprofdata -m <managed system name> -f <filename>
- Minimize the risk of encountering the problem by using only a single HMC or SDMC to manage a single server via the following options:
  - Power off dual HMC/SDMC or remove the connection from any dual HMC/SDMC.
  - Use one HMC per server (remove/add connections as needed if necessary).
  - A single HMC/SDMC managing multiple servers might be done relatively safely if the operations listed under triggers above are NOT done to two different servers concurrently.
NOTE: Recovery will be easiest with a valid backup of the profile data, so it is extremely important to back up profile data prior to an HMC upgrade or after any configuration changes to the save area. If a profile data backup exists, this problem can be rectified by restoring using:
rstprofdata -m <managedsysname> -l 3 -f <backupfilename>
"In addition to user backups, profile backups can be extracted from the previous save upgrade data (DVD or disk); a backup console data (if available); or pedbg.
"If a good backup does not exist, call your HMC/SDMC support to determine if recovery is possible.
A fix to prevent this from occurring is due out by the end of July (Editor's note: We realize this is now available but wanted to include the verbiage for completeness), but the PTF will not fix an already corrupted save area. A follow-up notification will be sent as soon as it is available."
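IBM's advice above to back up profile data for all managed servers can be scripted rather than typed once per system. The following is a minimal sketch, not an IBM-supplied procedure. The function name `backup_all_profiles` and the backup file naming are my own assumptions. It also assumes `lssyscfg` and `bkprofdata` resolve in the shell where it runs. If the HMC's restricted shell rejects parts of it, the same two commands can be issued from a workstation over SSH.

```shell
# Hypothetical helper: back up profile data for every managed system.
# Assumes lssyscfg and bkprofdata are available in the current shell.
# Optional argument: a date stamp for the file names (defaults to today).
backup_all_profiles() {
    stamp="${1:-$(date +%Y%m%d)}"
    rc=0
    # lssyscfg prints one managed system name per line.
    names=$(lssyscfg -r sys -F name) || return 1
    while IFS= read -r sys; do
        [ -n "$sys" ] || continue
        # Build a file name from the system name, replacing unusual characters.
        safe=$(printf '%s' "$sys" | tr -c 'A-Za-z0-9_.-' '_')
        file="${safe}_${stamp}.bak"
        if bkprofdata -m "$sys" -f "$file" </dev/null; then
            echo "OK $sys -> $file"
        else
            echo "FAILED $sys" >&2
            rc=1
        fi
    done <<EOF
$names
EOF
    return $rc
}
```

Calling `backup_all_profiles` with no arguments attempts one backup per managed system and returns nonzero if any of them fails, so a failed backup doesn't go unnoticed before an upgrade.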
Please heed the warnings and load this fix as soon as possible if you're running V7R7.3.0. And don't run any HMCs at V7R7.3.0 while running others at a lower level.