September 09, 2014

Locating a Problematic Filesystem

It was an ordinary day. I needed to take a mksysb. Only this time, I was getting an error.


            /usr/bin/mkszfile[1266]: FS_MIN_LOG = FS_MIN_LOG *

            20480 : 0403-009 The specified number is not valid for this command.


            0512-008 mksysb: The mkszfile command failed. Backup canceled.


I checked ps -ef | grep mkszfile and saw that it was still trying to run, but it wasn't doing anything. I went ahead and killed the process.

The error message didn't tell me much, but fortunately a quick web search yielded a few different ideas and suggestions, including this and this. Then I found an entry from this blog (that advertises "Unix tips, food reviews and astronomy"):

"A google search revealed it was probably a bad FS causing the problem. To identify which one(s), I ran the following: sh -x /usr/bin/mkszfile

"This gave the full output and I could see which file system it was processing when it crashed. I then unmounted the file system, [ran fsck] and remounted it before re-running the mkszfile.

"In this case there were four file systems it complained about and after [running fsck] them the mkszfile ran through ok. A re-run of the mksysb then worked ok."

That seemed simple enough, so I gave it a try. It went exactly as described in the blog post. After running the command, the filesystem that was causing my issue was the last one processed before the error occurred. Luckily for me that filesystem wasn't in use at the time, so I just unmounted it, ran fsck -y /filesystem and then remounted it. Then the mksysb worked as expected.
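For reference, here's a sketch of that workflow as a command sequence. The filesystem name /data is purely illustrative; substitute whichever filesystem appears last in the trace output on your system.

```shell
# Trace mkszfile so each step is echoed; the last filesystem printed
# before the 0403-009 error is the likely culprit:
sh -x /usr/bin/mkszfile 2>&1 | tail -20

# Suppose the trace stopped while processing /data (hypothetical name).
# Unmount it, repair it, and remount it:
umount /data
fsck -y /data
mount /data

# Then re-run the backup (device name will vary):
mksysb -i /dev/rmt0
```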

Now when the next person does a web search on this error code, there will be two sources confirming that running sh -x /usr/bin/mkszfile is the way to locate the filesystem that's causing you problems.

September 02, 2014


Mixing vSCSI and NPIV

The IBM Redbook, "PowerVM Best Practices," has a detailed look at mixing vSCSI and NPIV on VIO client LPARs.

From Section 5.1.3:

"It is possible to mix a virtual Small Computer System Interface (SCSI) and N-Port ID Virtualization (NPIV) within the same virtual I/O client. You can have rootvg or booting devices that are mapped via virtual SCSI adapters, and data volumes that are mapped via NPIV.

"Mixing NPIV and a virtual SCSI has advantages and disadvantages, as shown in Table 5-1.


Advantages:

* It makes multipathing software updates easier for data disks.

* You can use the Virtual I/O Server to perform problem determination when virtual I/O clients have booting issues.

Disadvantages:

* Requires extra management at the Virtual I/O Server level.

* Live Partition Mobility (LPM) is easier with NPIV."

What's your preference? Do you want your SAN guys to provide all your LUNs via NPIV and manage the same multipath drivers on the client for both rootvg and datavg? Or would you rather manage your rootvg multipath drivers on your VIO server, map up the rootvg disks to the clients via vSCSI and use NPIV for your data LUNs?

I prefer to use vSCSI for rootvg. I want to boot my VIO server from my internal disks, map some LUNs to my VIO server to use for my client LPARs rootvg, and then map my data disks via NPIV to my client LPARs. This allows me to troubleshoot by booting my VIO servers locally, and boot my LPARs "locally" via vSCSI.
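On the VIO server, that layout comes down to two kinds of mappings. Here's a hedged sketch from the padmin command line; every device name below (hdisk4, vhost0, vfchost0, fcs0, client1_rootvg) is illustrative and will differ on your system.

```shell
# Map a SAN LUN to the client LPAR for rootvg via vSCSI:
mkvdev -vdev hdisk4 -vadapter vhost0 -dev client1_rootvg

# Bind the client's virtual FC adapter to a physical FC port
# so data LUNs can be zoned to the client via NPIV:
vfcmap -vadapter vfchost0 -fcp fcs0

# Verify both mappings:
lsmap -vadapter vhost0
lsmap -npiv -vadapter vfchost0
```

With this split, the multipath driver for rootvg lives on the VIO server, while the client LPAR only manages multipathing for its NPIV data disks.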

When I need to update multipath software on the client LPARs, I'm not dealing with a chicken-and-egg dilemma where I'm booting my machine using the same multipath software I now need to update.

When I need to update my client rootvg multipath software, I'm updating my VIO server, which also booted locally. At no time am I "changing the tire while the car is speeding down the road," as might be necessary if I updated drivers when booting my client using NPIV.

Yes, doing it this way requires more effort compared to simply having your SAN team map everything to your clients. In the end though, I believe the benefits outweigh the burdens.

If you disagree, feel free to make your case for NPIV in comments. I'll also accept input from anyone who wants to back me up on vSCSI.

August 26, 2014

Useful Storage Links

Here's an assortment of really good storage-related articles -- the majority of which are found on IBM developerWorks -- that are worth your time. While some of them are a few years old, they still provide relevant information.

* "Guide to selecting a multipathing path control module for AIX or VIOS."

* "Using the AIX Logical Volume Manager to perform SAN storage migrations."

* "IBM AIX SAN Volume Controller update and migration."

* "IBM AIX MPIO: Best practices and considerations."

* "Tracing IBM AIX hdisks back to IBM System Storage SAN Volume Controller (SVC) volumes."

* "Shuffling disk data around."

* "AIX and VIOS Disk And Fibre Channel Adapter Queue Tuning."

* "Move data quickly between AIX LPARs using Logical Volume Manager."

* "Tip: Online migration of a file system to a smaller physical volume."

If you know of other useful storage-related articles, please cite them in comments.


August 19, 2014

More Resources for AIX Newbies

As I've noted previously, there are more newcomers to the AIX platform than you might imagine. A company may acquire an AIX system through a merger or replace an old Solaris or HP-UX box with a current IBM Power Systems model. As a result, one of their IT pros suddenly becomes the AIX guy. So, now what? How does an AIX newbie get up to speed with virtualization and AIX?

I've mentioned the QuickSheets and QuickStarts from William Favorite. I've also highlighted conferences, classes and free monthly user group meetings that you can look into. Recently though, I was pointed to this old IBM web page featuring various AIX learning resources. I call it old because some of the links no longer work, but what's still available is surprisingly useful.

Some of the material covers concepts from AIX 5.3, but much of this information remains valid today. It's also nice that some of the links take you to current Redbook offerings and IBM training courses.

The working links cover:

* AIX security and migration (this is AIX 5.3 material)

* Virtualization introduction

* Systems Director

* Power Systems Redbooks (updated here)

* IT technical training

* IBM business partner training

* IBM professional certification

On a related note, I've always believed that the simplest thing employers can do to help their IT staff members get started with AIX or any operating system that's new to them is to invest in a small lab/sandbox machine and HMC.

I'm continually amazed to see companies spend big bucks on the latest hardware and software, but then neglect to foot the bill for additional test systems. It's great that some companies devote an LPAR or two to testing, but you can only do so much in that environment. (In addition, there can be pressure to repurpose virtual test labs into running other production workloads. Then before you know it, the production needs grow so critical that these LPARs are made off-limits to reboots and testing.)

With Windows and x86 Linux servers especially, it's relatively easy and cheap to get access to test machines. I also know of people who've purchased old Power hardware on eBay just to have something that they can run AIX on.

With actual test boxes, you can safely reboot servers, install firmware and upgrade operating systems without touching production. If you make a mistake on a test system, not only haven't you hurt anything, you've learned a valuable lesson.

How do you learn, and keep learning? How do you stay current with your skills? If your machine is happily running along and you have little need to touch it, how can you ever expect to be able to support the machine when an issue hits?

August 12, 2014

Connecting Your HMC to IBM Support

You've been asked to connect your HMC to IBM Support. The network team wants to know about the different connectivity options. They need to know which IP addresses must be opened across the firewall.

What do you do? First, read this:

 "This document describes data that is exchanged between the Hardware Management Console (HMC) and the IBM Service Delivery Center (SDC) and the methods and protocols for this exchange. This includes the configuration of Call Home (Electronic Service Agent) on the HMC for automatic hardware error reporting. All the functionality that is described herein refers to Power Systems HMC version V6.1.0 and later as well as the HMC used for the IBM Storage System DS8000.

"Outbound configurations are used to configure the HMC to connect back to IBM. The HMC uses the IBM Electronic Service Agent tool to connect to IBM for various situations including reporting problems, reporting inventory, transmitting error data, and retrieving system fixes. The types of data the HMC sends to IBM are covered in more detail in Section 4."

Included are diagrams that show different scenarios for sending data to IBM, including with or without a proxy server, using a VPN, or even using a modem (though IBM does recommend Internet connectivity). Specific options include pass-through server connectivity, multi-hop VPN, and remote modem. IBM states that there are no inbound communications; all communications are outbound only.

Further, IBM explains why your machine may need to "call home":

            * To report to IBM a problem with the HMC or one of the systems it's managing.

            * To download fixes for systems managed by the HMC.

            * To report to IBM inventory and system configuration information.

            * To send extended error data for analysis by IBM.

            * To close an open problem.

            * To report heartbeat and status of monitored systems.

            * To send performance and utilization data for system I/O, network, memory, and processors.

There's also a list of the files that are sent to IBM, and the authors point out that no client data is sent to IBM.

On that note, here's IBM's statement on data retention:

"When Electronic Service Agent on the HMC opens up a problem report for itself, or one [of] the systems that it manages, that report will be called home to IBM. All the information in that report will be stored for up to 60 days after the problem has been closed. Problem data that is associated with that problem report will also be called home and stored. That information and any other associated packages will be stored for up to three days and then deleted automatically. Support Engineers that are actively working on a problem may offload the data for debugging purposes and then delete it when finished. Hardware inventory reports and other various performance and utilization data may be stored for many years.

"When the HMC sends data to IBM for a problem, the HMC will receive back a problem management hardware number. This number will be associated with the serviceable event that was opened. The HMC may also receive a filter table that is used to prevent duplicate problems from being reported over and over again."

Finally, there's this list of the IP addresses that need to be allowed across any firewalls. All connections use port 443 TCP:

IBM adds that when an inbound remote service connection to the HMC is active, only these ports are allowed through the firewall for TCP and UDP:

            * 22, 23, 2125, 2300 -- These ports are used for access to the HMC.

            * 9090, 9735, 9940, 30000-30009 -- These ports are used for Web-based System Manager (POWER5).

            * 443, 8443 -- These ports are used for Web-based user interface (POWER6).

            * 80 -- This port is used for code downloads.
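Once the firewall rules are in place, it's worth confirming that outbound port 443 actually works before scheduling a call-home test. Here's a hedged sketch using nc from a host on the HMC's network segment; the hostname below is illustrative, so substitute the actual addresses from the IBM document for your HMC version.

```shell
# Test that an outbound TCP connection on port 443 can be established
# to an IBM call-home endpoint (hypothetical example hostname),
# timing out after 5 seconds if the firewall blocks it:
nc -zv -w 5 esupport.ibm.com 443
```

The HMC itself also offers a "Test Network Connectivity" function in its service settings, which exercises the same outbound path from the console's own perspective.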

Take a few moments to read this document. Or, even better, send it to your network team so they can read it for themselves.