This blog is written by Bill Hirsch – Manager, Systems Support for Power/AIX – University of Pittsburgh Medical Center.
When I first began configuring and administering a Power VIO environment, it quickly became clear that I could not keep up without automation. Initially, my scripts were simple: mapping virtual disks to LPARs en masse or correlating VIO PVIDs with LPAR PVIDs was about the extent of it. But not long after that, my colleagues and I began writing increasingly complex scripts that did everything from mapping virtual disks “auto-magically” to validating and alerting on SEA high availability across the enterprise.
For anyone managing a large virtualized infrastructure, the reason to automate is probably already apparent. But for those who are just dipping their feet into the water, the necessity may not be so obvious. Pre-virtualization, an adapter was just an adapter. It had a name, a few attributes, and a cable hanging out of it. But now, with PowerVM, we have adapters that are virtualized. Those physical devices have children... a lot of children... virtual children. And those children have names and attributes. But they don't always wear nametags, so it's not readily apparent who their parents are. However, they do have a few unique characteristics that will allow you to pair up little Jimmy and mommy. For example, you can match an LPAR's hdisk to its parent VIO hdisk using PVIDs or LUN IDs. You can find a virtual Ethernet adapter's VIO parent by matching slot IDs.
You may be thinking this is all good information, but what does any of that have to do with automation? Well, I’ll answer that question with a scenario and a short test.
A production LPAR for one of your most business-critical applications is having issues. It's not terminal yet, but you've got an error log full of disk and network errors, a file system that’s gone read-only, and an application that’s no longer functioning. You don’t know what went wrong and the pressure is high to bring it all back online. In the next 10 minutes, you need to bring information to a meeting that’s been called to troubleshoot the issue. Could you answer these questions?
- Which VIO servers are hosting this LPAR?
- Which network VLAN is used by this LPAR?
- Which ent devices (adapters) make up the SEA (for this VLAN) on each VIO server in the pair?
- Which VIO server was hosting the primary channel of the Network Interface Backup (NIB)?
- Was the primary NIB channel the active NIB channel?
- How many hdisks are provisioned to this LPAR?
- What are the serial numbers of these hdisks?
- Which hdisk belongs to rootvg?
- Are there any errors for any of these disks on either VIO server?
- Do all of these disks have redundant paths to the SAN? Are any paths offline?
So how did you do? Did you have the answers to all of these questions within the allotted 10 minutes? Unless you have some automated data collection already in place, chances are you could not. So now we need to talk about fixing that problem so the next time you’re asked, you can answer, “YES.”
All of the information listed in the test is readily available if you know where to look and how to collect it. For example, a few basic Hardware Management Console (HMC) commands will collect a lot of it for you. lssyscfg and lshwres gather LPAR profile data, including virtual adapter slot IDs, which lets you match those adapters with their VIO parents and find out which SEA, and which backing device, your LPAR is actually using.
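As a rough illustration of the slot-matching idea, here is a minimal Python sketch that pairs virtual SCSI client adapters with their VIO server adapters. It assumes you have already saved comma-separated HMC output with the fields lpar_name, slot_num, adapter_type, remote_lpar_name, and remote_slot_num (for example from an lshwres virtual I/O query using a -F field list; check the exact attribute names against your HMC level). The sample rows, the LPAR names, and the function names are made up for illustration.

```python
import csv
import io

# Assumed field order of the saved HMC output (an assumption, not a fixed format).
FIELDS = ["lpar_name", "slot_num", "adapter_type", "remote_lpar_name", "remote_slot_num"]

# Invented sample rows: two VIO servers and one client LPAR.
SAMPLE = """\
vios1,11,server,applpar1,2
vios2,11,server,applpar1,3
applpar1,2,client,vios1,11
applpar1,3,client,vios2,11
applpar1,4,client,vios1,99
"""


def parse_adapters(text):
    """Turn comma-separated adapter rows into a list of dicts."""
    reader = csv.reader(io.StringIO(text))
    # Rows with an unexpected number of fields are skipped.
    return [dict(zip(FIELDS, row)) for row in reader if len(row) == len(FIELDS)]


def pair_client_to_server(adapters):
    """Map (client LPAR, client slot) -> (VIO server, server slot) or None."""
    servers = {
        (a["lpar_name"], a["slot_num"])
        for a in adapters
        if a["adapter_type"] == "server"
    }
    pairs = {}
    for a in adapters:
        if a["adapter_type"] != "client":
            continue
        parent = (a["remote_lpar_name"], a["remote_slot_num"])
        # Record None when the server side named by the client does not exist.
        pairs[(a["lpar_name"], a["slot_num"])] = parent if parent in servers else None
    return pairs


if __name__ == "__main__":
    for client, parent in pair_client_to_server(parse_adapters(SAMPLE)).items():
        print(client, "->", parent)
```

A client adapter that maps to None (slot 4 in the sample) points at a server slot that is not defined, which is exactly the kind of mismatch worth catching before an outage.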
Running the lspv, lscfg, and lsattr commands will capture the details you need to know about disks. In short, by using a little Perl (or the language of your choice) to match up this HMC and VIO data with data from your LPAR, you can create all sorts of useful scripts and maps. Here are just a few of many examples:
- LPAR network --> VIO servers --> SEA adapter --> VLAN ID
- LPAR hdisk --> VIO hdisk --> disk serial number
- LPAR hdisk --> VIO vhost and VTD
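The LPAR hdisk --> VIO hdisk map above could be sketched along the following lines. This is a minimal Python example, not a finished tool: it assumes you have captured lspv-style text (disk name, PVID, volume group, state) from the LPAR and from each VIO server, and it joins the two on PVID. The disk names, PVIDs, host names, and function names are invented for illustration.

```python
# Invented lspv-style captures; real output will differ in names and values.
LPAR_LSPV = """\
hdisk0          00c1234500000001                    rootvg          active
hdisk1          00c1234500000002                    datavg          active
hdisk2          none                                None
"""

VIO1_LSPV = """\
NAME             PVID                                VG               STATUS
hdisk4           00c1234500000001                    None
hdisk5           00c1234500000002                    None
"""

VIO2_LSPV = """\
NAME             PVID                                VG               STATUS
hdisk7           00c1234500000001                    None
"""


def parse_lspv(text):
    """Return {pvid: hdisk} from lspv-style text."""
    disks = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip blank lines, a header row, and disks with no PVID assigned.
        if len(parts) < 2 or parts[0] == "NAME" or parts[1] == "none":
            continue
        disks[parts[1]] = parts[0]
    return disks


def map_lpar_to_vio(lpar_lspv, vio_lspv_by_host):
    """Map each LPAR hdisk to the (VIO server, VIO hdisk) pairs sharing its PVID."""
    vio_tables = {host: parse_lspv(text) for host, text in vio_lspv_by_host.items()}
    mapping = {}
    for pvid, lpar_disk in parse_lspv(lpar_lspv).items():
        mapping[lpar_disk] = [
            (host, table[pvid])
            for host, table in sorted(vio_tables.items())
            if pvid in table
        ]
    return mapping


if __name__ == "__main__":
    result = map_lpar_to_vio(LPAR_LSPV, {"vios1": VIO1_LSPV, "vios2": VIO2_LSPV})
    for lpar_disk, parents in result.items():
        print(lpar_disk, "->", parents)
```

In the sample data, hdisk1 is found on only one VIO server, so a map like this also hints at which disks may lack a redundant path.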
Although the information is out there, it is not trivial to collect it manually and correlate it quickly. This is why automation must become a critical component of your virtualized infrastructure if you expect to manage it effectively. So write a few scripts that run each day to capture, store, and correlate this data while you aren't having a system issue. This way, when a problem does occur, you’ll be well equipped to solve it.
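One simple way to handle the "capture and store" part is to write each day's correlated map to a dated file, so the most recent good copy is on hand when something breaks. The Python sketch below assumes the map is a plain dictionary; the vio-map- file prefix, the directory layout, and the function names are hypothetical choices, and scheduling (cron, for instance) is left out.

```python
import datetime
import json
import pathlib


def save_snapshot(data, directory, day=None):
    """Write one day's map to <directory>/vio-map-YYYY-MM-DD.json."""
    day = day or datetime.date.today()
    directory = pathlib.Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"vio-map-{day.isoformat()}.json"
    path.write_text(json.dumps(data, indent=2, sort_keys=True))
    return path


def load_latest_snapshot(directory):
    """Return the newest stored map, or None if nothing has been saved yet."""
    # ISO dates sort correctly as plain strings, so the last file is the newest.
    files = sorted(pathlib.Path(directory).glob("vio-map-*.json"))
    if not files:
        return None
    return json.loads(files[-1].read_text())
```

Keeping one file per day, instead of overwriting a single file, also lets you compare today's map with yesterday's to see what changed.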