A friend living overseas recently emailed me. He was having issues with an older HACMP cluster and wanted another set of eyeballs to check it. At the time I happened to be talking with a PowerHA guru, so I invited him to take a look as well.
Our small troubleshooting group reminded me of the people who work on their cars in their driveway. At least in my formative years, the sight of someone tinkering with a car would inevitably draw curious neighbors eager to see the mechanic do his thing. In this case, the attraction was an old HACMP cluster that -- via a WebEx session -- my guru friend and I could examine from several time zones away.
I'm still amazed at the relative ease with which it is now possible to communicate with anyone, anywhere. I have family members in South Africa. Years ago they actually sent a telegram to my door because they couldn't reach me on the phone. (Not that transnational phone service was inherently unreliable in those days, but occasionally calls didn't get through.) Surprised as I was to discover that telegrams still existed, I had to admit it was the best alternative for delivering time-sensitive information at that time.
A while back, I sent them a magicJack VOIP system so they could have a local U.S. number. This means that any time I want, I can pick up the phone and make what's essentially a free phone call to the other side of the world.
Admittedly, VOIP technologies aren't yet completely reliable. My friend with the HACMP cluster experienced issues with his VOIP solution. We tried IM, but weren't satisfied waiting for each side to type out messages. Ultimately, he opted to call me on his cell phone. Of course that wasn't free, but calling internationally is much cheaper than it was even a few years ago.
As for the HACMP issue, it was fairly straightforward. A change had been made in the environment: someone added NFS to the cluster nodes, but not to the HACMP resource groups. The admin then decided to remove NFS, but didn't remove it completely. As a result, the cluster was out of sync, and HACMP wouldn't start at the next failover:
ERROR: The nodes in resource group HA_RG are configured with more than one NFS domain. All nodes in a resource group must use the same NFS domain.
Use the command 'chnfsdom <domain name>' to set the domain name.
With this error message pointing us in the right direction, the issue was quickly resolved.
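For anyone who wants to catch this kind of mismatch before a failover does, here is a minimal sketch of the comparison the error message implies. It is not how HACMP performs its own verification. It assumes you have already collected each node's NFS domain yourself (on AIX, running chnfsdom with no arguments should display the current local domain), and the node names, domain values, and function names are all hypothetical.

```python
from collections import defaultdict


def group_nodes_by_nfs_domain(node_domains):
    """Map each reported NFS domain to the sorted list of nodes using it.

    node_domains: dict of node name -> domain string, gathered by hand
    or by a script (hypothetical input, not an HACMP API).
    """
    groups = defaultdict(list)
    for node, domain in node_domains.items():
        # Strip stray whitespace so "example.com\n" matches "example.com".
        groups[domain.strip()].append(node)
    return {domain: sorted(nodes) for domain, nodes in groups.items()}


def nfs_domains_consistent(node_domains):
    """Return True when every node reports the same NFS domain."""
    return len(group_nodes_by_nfs_domain(node_domains)) <= 1


# Hypothetical values: nodeB was left with an empty domain after a
# partial NFS removal, so the two nodes disagree.
reported = {"nodeA": "example.com", "nodeB": ""}

if not nfs_domains_consistent(reported):
    for domain, nodes in group_nodes_by_nfs_domain(reported).items():
        print(f"domain {domain!r}: {', '.join(nodes)}")
```

Grouping the nodes by domain, rather than just returning a yes or no, shows at a glance which node is the odd one out and therefore where to run chnfsdom.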
We're fortunate enough to work with some impressive technology, and that includes the older systems that continue to function effectively. But do you ever stop and really think about the amazing communication capabilities we have these days? Or do you just take it for granted that the devices that fit in our pockets and purses let us interact in real time with people around the world, at relatively low cost and with very little effort?