Blog
AIXchange


Rob McNelly


May 14, 2013

A Big Step Forward in Storage

As a consultant I get to play with some cool, cutting-edge technologies. However, I have yet to get my hands on a half-petabyte storage array consisting only of flash drives:

            "On the 12-hour flight from Zurich to San Francisco, the two scientists plotted out the fastest way to install and setup the two racks -- each filled with 240 terabytes of Flash provided by Texas Memory Systems (an acquisition IBM completed in October 2012), as well as 10 IBM Power 730 Express servers.

            "'This demonstration marks a tipping point for transactional workloads. It's the first time Flash storage has outperformed hard disks in all aspects, including capacity and performance density, and cost per Input/Output Operations Per Second (IOPS) and energy efficiency,' Ioannis said.

            "By the numbers, the two achieved a remarkable feat: the IBM Flash System 820 achieved more than 6 million IOPS running an IBM DB2 workload on IBM Power servers.

            "'In terms of energy our system runs on 19 kilowatts compared to 4.5 megawatts with high capacity hard disks, a 236 fold improvement,' Nikolas said."


This article points to IBM's claim that flash "can speed the response times of information gathering in servers and storage systems from milliseconds to microseconds – orders of magnitude faster. Because it contains no moving parts, the technology is also more reliable, durable and more energy efficient than spinning hard drives." According to the article, by year end IBM will open 12 "flash competency centers" worldwide for the purpose of introducing its customers to the technology.

A solution that uses less energy while providing massively superior performance? Sign me up. Seriously, I'm hoping I can visit one of those flash competency centers soon.

One more thing from this article:

            "A deal has been announced between IBM and Sprint Nextel involving the installation of nine flash storage systems in Sprint's data centre, amounting to 150TB of flash capacity. Flash is used to accelerate Sprint Nextel's phone activation application and the company is expanding its use of the technology to other parts of the data centre. Sprint has a strategy to move its most active data to all-flash storage systems."

Even on home systems, I've seen huge performance gains when going with solid-state drives (SSD) compared to hard disk drives (HDD). Although SSD costs are still higher, they seem to be dropping, and (knock on wood) I have yet to experience a failure with my drives.

Perhaps you can get your toes wet with something like this:

            "Storwize V7000 includes IBM System Storage Easy Tier, a function that responds to the presence of [SSDs] in a storage pool that also contains [HDDs]. The system automatically and non-disruptively moves frequently accessed data from HDD MDisks to SSD MDisks, thus placing such data in a faster tier of storage.

            "Easy Tier eliminates manual intervention when assigning highly active data on volumes to faster responding storage. In this dynamically tiered environment, data movement is seamless to the host application regardless of the storage tier in which the data resides. Manual controls exist so that you can change the default behavior, for example, such as turning off Easy Tier on storage pools that have both types of MDisks."

Some people use external HDDs to store lots of media files, but rely on SSDs for their main system. Manually moving the larger, less frequently accessed files to another storage medium is something I like to call "poor man’s tiering."
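For the curious, here is a minimal Python sketch of what that manual tiering could look like on a home system. Everything in it is an assumption for illustration: the function names, the size and idle-time thresholds, and the reliance on file access times (which some filesystems mount options, such as noatime, do not keep current). It is not how Easy Tier works internally.

```python
import shutil
import time
from pathlib import Path


def find_cold_files(root, min_size_bytes, min_idle_days, now=None):
    """Return files under root that are both large and not recently accessed."""
    now = time.time() if now is None else now
    cutoff = now - min_idle_days * 86400  # seconds per day
    cold = []
    for path in Path(root).rglob("*"):
        if path.is_symlink() or not path.is_file():
            continue
        info = path.stat()
        # A file is "cold" when it is big enough AND its last access is old enough.
        if info.st_size >= min_size_bytes and info.st_atime <= cutoff:
            cold.append(path)
    return sorted(cold)


def tier_down(root, archive_root, min_size_bytes, min_idle_days, dry_run=True, now=None):
    """Move cold files from root to archive_root, keeping relative paths.

    With dry_run=True (the default) nothing is moved; the planned
    (source, destination) pairs are only returned for review.
    """
    moves = []
    for src in find_cold_files(root, min_size_bytes, min_idle_days, now):
        dest = Path(archive_root) / src.relative_to(root)
        moves.append((src, dest))
        if not dry_run:
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(src), str(dest))
    return moves
```

Calling tier_down with dry_run=True first gives you a list to eyeball before anything leaves the SSD, which seems like the safer habit for a script that relocates files.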

Is SSD indeed the future of storage? Is there something else I should be watching for?

May 07, 2013

Verifying Firmware

Hopefully you've seen Nigel’s post about verifying firmware before installing:

Be sure to check the comments for more information from the developers. For instance:

 

            Prevention

            Before installing Power firmware, verify through the firmware release notes/readme information that the selected level is supported on the targeted server MTM.

 

            Example of 01AL770_032_032.readme.txt:

            System firmware level 01AL770_032_032

            System Firmware Release for the General Availability of the POWER7 System p Servers 8231-E1D, 8231-E2D, 8246-L1D, 8246-L2D, 8246-L1T, 8246-L2T, 8202-E4D, 8205-E6D

 

            Recovery

            Set Boot Side to P. From ASMI:

            - Expand the "Power/Restart Control" menu.

            - Select "Power On/Off System."

            - Under the "Firmware boot side for the next boot" option, select "Permanent."

            - Click "Save settings" (NOT "Save settings and power on").

            - Reboot the system. From ASMI:

            - Expand the "System Service Aids" menu.

            - Select "Reset Service Processor."

            - Click "Continue."

            - Wait for the system to reconnect and show stable state in HMC GUI.

 

            Perform a Reject Fix operation. From HMC:

            - Select the applicable server.

            - Select the "Updates" menu.

            - Select "Change Licensed Internal Code for the current release."

            - Select "Advanced features."

            - Select "Reject Fix - Copy Permanent to Temporary."

            - Click "OK."

 

            After the Reject Fix is completed successfully, revert the system to the T side to enable concurrent updates. From ASMI:

            - Expand the "Power/Restart Control" menu.

            - Select "Power On/Off System."

            - Under the "Firmware boot side for the next boot" option, select "Temporary."

            - Click "Save settings."

            - Expand the "System Service Aids" menu.

            - Select "Reset Service Processor."

            - Click "Continue."

            Power on the server.
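The "Prevention" step in that comment, checking the readme for your machine type-model before installing, is easy to script. Here is a rough Python sketch. It assumes the readme lists supported servers as four digits, a hyphen and three characters (as in the 01AL770_032_032 example above); the function names are hypothetical, and real readme files may be laid out differently.

```python
import re

# Assumed MTM shape: four digits, a hyphen, three letters/digits (e.g. 8231-E1D).
MTM_PATTERN = re.compile(r"\b\d{4}-[A-Z0-9]{3}\b")


def supported_mtms(readme_text):
    """Return the set of machine type-models mentioned in the readme text."""
    return set(MTM_PATTERN.findall(readme_text.upper()))


def is_supported(readme_text, mtm):
    """True if the given MTM (e.g. '8231-E1D') appears in the readme text."""
    return mtm.strip().upper() in supported_mtms(readme_text)


README_EXAMPLE = (
    "System firmware level 01AL770_032_032\n"
    "System Firmware Release for the General Availability of the POWER7 "
    "System p Servers 8231-E1D, 8231-E2D, 8246-L1D, 8246-L2D, 8246-L1T, "
    "8246-L2T, 8202-E4D, 8205-E6D\n"
)
```

With the example text, is_supported(README_EXAMPLE, "8231-E1D") is true, while an older model such as 8231-E2B is not listed. A check like this is only a convenience; reading the readme itself remains the real safeguard.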

 

And also this comment:

 

            Abstract: REMOVING UNSUPPORTED POWER SYSTEMS FIRMWARE, SRC B1813463

            SYMPTOM: After applying an unsupported system firmware level to the temporary side of the FSP the system stops at SRC B1813463. To resolve this problem follow the steps below to remove the unsupported system firmware. Follow the instructions specific to the method used to update the code.

            IMPORTANT: Always consult firmware readme files and verify supported levels before updating or upgrading system firmware. HMC levels v7r6.3 and v7r7.2 include an update to verify the system firmware level is supported before allowing a firmware update or upgrade to begin.

            PROBLEM ISOLATION AIDS:
            - The system may be any of the following IBM servers:

            IBM Power 710 Express Server, Type 8231, models E1C, E2B
            IBM Power 720 Express Server, Type 8202, models E4B, E4C
            IBM Power 730 Express Server, Type 8231, models E2B, E2C
            IBM Power 740 Express Server, Type 8205, models E6B, E6C
            IBM Power 750 Express Server, Type 8233, model E8B
            IBM Power 755 Express Server, Type 8236, model E8C
            IBM Power 770 Server, Type 9117, any model
            IBM Power 780 Server, Type 9179, any model
            IBM PowerLinux 7R1 server, Type 8246, models L1C, L1S
            IBM PowerLinux 7R2 Server, Type 8246, models L2C, L2S

            - This tip is not option specific.
            - This tip is not software specific.

            - The system has the symptom described above.

            FIX: User must follow the guidelines listed below to remove the unsupported code. Follow the instructions depending on the method used to update the code:

            -- HMC Managed Systems

            1) Using the ASMI, set Boot Side to Permanent.
               a) Expand the "Power/Restart Control" menu.
               b) Expand the "Power On/Off System" menu.
               c) Under the "Firmware boot side for the next boot" option, select "Permanent."
               d) Click the "Save settings" button. DO NOT click the "Save Settings and Power On" button. It will cause the server to power on running the unsupported firmware side and require that you restart the procedure.
               e) Expand the "System Service Aids" menu.
               f) Select "Reset Service Processor."
               g) Click the "Continue" button.

            Note: If this step is not completed the unsupported firmware will not be removed and SRC B1813463 will be displayed again.

            2) Using the HMC GUI, wait for the system to reconnect and show a state of "Power off."
            3) Using the HMC GUI, perform "Reject Fix - Copy Permanent to Temporary."
               a) Select the applicable server.
               b) Select the "Updates" menu.
               c) Select "Change Licensed Int. Code for current release."
               d) Select "Advanced features."
               e) Select "Reject Fix - Copy Permanent to Temporary."
               f) Click the "OK" button.
            4) Wait until "Reject Fix" has completed successfully.
            5) Using the ASMI, set the Boot Side back to Temporary and reset the service processor.
               a) Expand the "Power/Restart Control" menu.
               b) Select "Power On/Off System".
               c) Under the "Firmware boot side for the next boot" option, select "Temporary."
               d) Click the "Save settings" button.
               e) Expand the "System Service Aids" menu.
               f) Select "Reset Service Processor."
               g) Click the "Continue" button.

            -- Stand alone systems via USB
            -- Not available for 9117-MMx and 9179-MHx servers.

            Updating firmware via USB is independent of the operating system installed. The only restriction is that the server cannot be HMC managed.

            1) Remove all system firmware present in the USB drive's root directory.
            2) Download the RPM file for the latest supported firmware, then copy it into the USB drive's root directory. (Note: Only one level of code should be contained in the USB root directory.)
            3) Insert the USB drive to the top port of the FSP (left side port for tower systems).
            4) Change the FSP Boot Side from Temporary to Permanent using either method [A] ASMI, OR [B] Operator (control) Panel.
               [A] Using the ASMI:
                   1) Expand the "Power/Restart Control" menu.
                   2) Expand the "Power On/Off System" menu.
                   3) Under the "Firmware boot side for the next boot" option, select "Permanent."
                   4) Click the "Save settings" button.
               [B] Using the Operator (control) Panel.
                   1) Use the Increment or Decrement buttons to select Function 02.
                   2) Press the Enter button.
                   3) Press the Enter button until the field marker moves to the right of the character "T."
                   4) Use the Increment or Decrement button to change the "T" to a "P."
            5) Reset the FSP using either method [A] ASMI, or [B] Performing a pin-hole reset, or [C] Removing AC power.
               [A] Using ASMI:
                   1) Expand the "System Service Aids" menu.
                   2) Select "Reset Service Processor."

April 30, 2013

Power Systems Best Practices

Recently I received this set of slides from Fredrik Lundholm covering best practices for Power Systems with AIX. I'll cover a few highlights, though honestly, I could discuss every slide. The information here is that valuable. So I highly recommend taking the time to view the entire thing.

If you download his slides, be sure to look at the notes. For example, on page 7, where he discusses a virtualized system design, the notes contain a couple of links relating to Entitled Software Support, including this ESS how-to guide.

Page 8 lists guidelines for capacity planning. Fredrik points out the rational starting places for your CPU and LPAR weights if no information is provided. The fact that you can make reasonable guesses without a ton of workload information just reminds me how forgiving this platform is. If things change, CPU and memory settings can be easily adjusted. Whole physical adapters can even be added or removed if necessary.

Page 9 covers firmware and using Microcode Discovery service and FLRT.

Page 11 tells you where to get fixes for the VIO server. The notes cover items that have been fixed in each release.

Page 12 covers network best practices. The notes contain a link to a step-by-step network configuration guide.

Page 13 shows a nice diagram of a shared Ethernet adapter load sharing configuration that is available in VIOS 2.2.1+.

Page 14 shows the recommended architecture when more than one VLAN is used.

Page 15 features a reminder about SEA and virtual Ethernet interfaces. Be sure to select large send and large receive; it's not the default setting.

 

            For all SEA interfaces, chdev -l entX -a largesend=1   (survives reboot)

            For all SEA interfaces, chdev -l entX -a large_receive=1   (survives reboot)

 

Page 17 covers storage and the need to ensure that the correct multi-path drivers are installed.

Page 18 has a nice picture illustrating how the configured machines will look.

Page 19 covers setting up fc_err_recov and dyntrk, along with setting up no_reserve and round_robin.

From page 20: To allow graceful round robin load balancing over multiple paths, set timeout_policy to fail_path for all physical hdisks in the VIO server:

            # chdev -l hdisk0 -a timeout_policy=fail_path
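If you have more than a handful of disks, you may prefer to generate these commands rather than type them. Below is a small Python sketch that only builds the chdev command strings for review; it does not run anything. The attribute names reserve_policy and algorithm, and the values fast_fail and yes, are my assumptions about the usual settings behind pages 19 and 20, and the function names are made up for illustration. Check them against the slides and your multipath driver documentation. Also keep in mind that attributes on devices that are in use may need chdev's -P flag and a reboot to take effect.

```python
# Assumed settings for Fibre Channel protocol devices (page 19).
FSCSI_ATTRS = {"fc_err_recov": "fast_fail", "dyntrk": "yes"}

# Assumed settings for physical hdisks in the VIO server (pages 19 and 20).
HDISK_ATTRS = {
    "reserve_policy": "no_reserve",
    "algorithm": "round_robin",
    "timeout_policy": "fail_path",
}


def chdev_command(device, attributes):
    """Build one chdev command line that sets the given attributes on a device."""
    parts = ["chdev", "-l", device]
    for name, value in attributes.items():
        parts += ["-a", f"{name}={value}"]
    return " ".join(parts)


def storage_tuning_commands(fscsi_devices, hdisks):
    """Return the chdev commands for all listed fscsi devices, then all hdisks."""
    commands = [chdev_command(dev, FSCSI_ATTRS) for dev in fscsi_devices]
    commands += [chdev_command(disk, HDISK_ATTRS) for disk in hdisks]
    return commands
```

For example, chdev_command("hdisk0", {"timeout_policy": "fail_path"}) produces the same command shown on page 20: chdev -l hdisk0 -a timeout_policy=fail_path.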

Page 21 has links to documentation for installing AIX. Page 22 has a nice chart illustrating good choices for running AIX. The red, green and yellow color coding is intended to help you decide which TL to run.

Page 23 lists AIX tuning and values that should be changed.

Page 24 covers AIX 5.3 memory tuning.

Page 26 has a nice tip: Largesend increases virtual Ethernet throughput performance and reduces processor utilization. Starting with AIX 6.1 TL7 SP1 and AIX 7.1 SP1, operating systems that support the mtu_bypass attribute for the shared Ethernet adapter provide a persistent way to enable the largesend feature. To determine if the operating system supports the mtu_bypass attribute, run the following lsattr command [lsattr -El enX | grep mtu_bypass]. If the mtu_bypass attribute is supported, the... command will return:

 

            mtu_bypass off Enable/Disable largesend for virtual Ethernet True

            Enable largesend on all AIX en interfaces through:

            chdev -l enX -a mtu_bypass=on
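If you want to check many interfaces, the lsattr output is simple enough to parse. Here is a hedged Python sketch that reads text in the "attribute value description user-settable" layout shown above and reports the mtu_bypass state. The function names are hypothetical, and the parser assumes every attribute has a non-empty value, which will not hold for every device.

```python
def parse_lsattr(output):
    """Map attribute name to current value from `lsattr -El`-style text.

    Assumes whitespace-separated columns where the first field is the
    attribute name and the second is its value. Attributes with an empty
    value column would be misread, so treat this as a rough sketch.
    """
    attrs = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            attrs[fields[0]] = fields[1]
    return attrs


def mtu_bypass_state(output):
    """Return 'on' or 'off', or None if mtu_bypass is not listed at all."""
    return parse_lsattr(output).get("mtu_bypass")


def needs_largesend_change(output):
    """True only when mtu_bypass is supported and currently off."""
    return mtu_bypass_state(output) == "off"
```

A result of None suggests the AIX level does not support the attribute, in which case the chdev command above would not apply to that interface.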

 

Page 27 shows the recommended vSCSI parameters on each client partition. Page 28 covers vSCSI Queue Depth tuning for different disk subsystems.

There is also a section on PowerHA. It's recommended that new deployments go with PowerHA 7.1. Page 31 covers I/O pacing with PowerHA.

An FAQ starts on page 32. Here's a tip I like:

            Q: How do I run nmon to collect disk service times, top process cpu consumption, etc?

            A: STG Lab services recommends the following parameters for nmon data collection:

 

            /usr/bin/nmon -M -^ -f -d -T -A -s 60 -c 1435 -m /tmp/nmonlog

 

            This will invoke nmon every minute and continue for 24 hours capturing vital disk access time data along with top processes.

 

            -d includes the Disk Service Time section in the view

            -T includes the top processes in the output and saves the command line arguments into the UARG section

            -^ includes the Fibre Channel (FC) sections
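A quick note on the -s and -c values in that nmon command: -s is the number of seconds between snapshots and -c is the number of snapshots, so the run length is simply their product. The Python sketch below (function names are mine) does the arithmetic and shows that 1435 snapshots at 60 seconds comes to a little under the "24 hours" mentioned. My guess, and it is only a guess, is that the shortfall leaves room for one day's collection to finish before the next one starts.

```python
def nmon_duration(interval_seconds, snapshot_count):
    """Return the total run length as (hours, minutes, seconds)."""
    total = interval_seconds * snapshot_count
    hours, remainder = divmod(total, 3600)
    minutes, seconds = divmod(remainder, 60)
    return hours, minutes, seconds


def snapshots_for(hours, interval_seconds):
    """Return how many snapshots fit in the given number of whole hours."""
    return (hours * 3600) // interval_seconds
```

Here nmon_duration(60, 1435) gives (23, 55, 0), or 23 hours 55 minutes, while a full 24 hours at a 60-second interval would take snapshots_for(24, 60) = 1440 snapshots.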

 

            On the HMC, there is an "Allow performance information collection" checkbox on the processor configuration tab. Select this checkbox on the partition for which you want to collect this data. If you are using IVM... use the lssyscfg command, specifying the allow_perf_collection (permission for the partition to retrieve shared processor pool utilization) parameter. Valid values for the parameter are 0, do not allow authority (the default) and 1, allow authority.

 

Starting on page 36 there are reference documents to older information, which may still be helpful for certain environments.

 

This is a fantastic set of slides with current, real world information and suggestions.

April 23, 2013

IBM i Turns 25

Though the focus of this blog is AIX, there is value in discussing the other OSs that can run on IBM Power Systems: Linux, VIOS and IBM i. With that in mind, have you seen all the information and videos about IBM i turning 25?

While I primarily find myself on AIX these days, when I started in the late 1980s I worked on AS/400 systems, the predecessors to IBM i. Part of my job involved tending to a line printer that required us to change paper and forms. The most exciting part of the job was changing from green bar paper to white, and then back again (with an occasional run of custom forms thrown in).

The AS/400 was a great platform to work on as a computer operator. And compared to other operating systems of that era, OS/400 didn't require much care and feeding. Those machines just ran.

I recall our IBM CE coming on site. He'd log in, look at logs and ask us how we were doing, but the only thing we ever really needed from him was to repair or replace the green screen displays we had connected to the AS/400 via twinax. He never had to actually do anything with the AS/400 box itself. Basically, the guy was our version of the Maytag repairman.

Of course over the past 25 years the AS/400 has gone through a few rebrandings. And over time IBM has brought IBM i and AIX together architecturally. One important thing AIX and IBM i now share is the capability to virtualize adapters using the VIO server. However, as AIX pros we are generally more comfortable with VIOS. Sometimes I hear IBM i folks complain about how complicated it is -- and IBM is working to make VIOS more user friendly. But this is where, as an AIX/VIOS person, you can help your IBM i friends by configuring VIOS for them. Although you can certainly dedicate your adapters and direct connect to SAN storage, VIOS allows everyone to connect to the same SAN. That's a nice advantage.

Speaking of the coming together of AIX and IBM i, you should know that COMMON, the conference that for years has centered on AS/400, iSeries, System i and IBM i technologies, continues to add more AIX content to its user group meetings. The one that took place in Austin, Texas, earlier this month had AIX courses covering application development, high availability, networking, systems management and web applications.

So did you know that IBM i is celebrating 25 years? Do you still make the mistake of calling it an AS/400?

If, like me, you worked on the AS/400 in the beginning, that's one thing. But it's neither technically correct -- nor positive for the platform -- to refer to today's IBM Power Systems running IBM i as an AS/400. While it demonstrates the loyalty that users have always had to AS/400 systems, IBM Champion Trevor Perry points out that it needs to change. As he states: “Conflicted people called it AS/400. Confused people called it iSeries. Confident people called it IBM i.”

I think AIX users can see his point. I mean, we love our systems, but I don't know of anyone who still uses the name RS/6000. So what do you think? Does the name matter? Do you plan to step up and call it by its name, or are you going to remain conflicted and call it an AS/400?

April 16, 2013

The Search for Answers, the Need for Help

Sure, you work in the field of technology, but that doesn't automatically make you a creature of social media. So really, how plugged in are you? From Facebook to Twitter to Google+ to news.google.com to plain old email, do you often see the jokes and memes and viral videos that go around the Internet? Or are you so insulated you not only don't know that planking or the Harlem Shake fad is over, you never knew it was a thing to begin with?

Of course compared to 30 or even 20 years ago, we as a society have fewer and fewer shared experiences. Not that long ago there were four television channels (the three major networks and your local UHF station). People talked about the big TV events because everyone was watching the same things at the same times. You got to see Christmas specials once a year. The Grinch? Once. Rudolph? Once. There were no videos to rent, buy or download. Most households didn't even have remotes, much less cable television and VCRs.

These days, someone might recommend a long discontinued show (Arrested Development, Firefly, Freaks and Geeks, IT Crowd, etc.) and -- thanks to online services like Netflix or Hulu -- you might binge on the entire series over one weekend.

To be sure, the way we consume mass media is changing. Even the most-watched programs now, like the Academy Awards or major sporting events, have significantly fewer viewers than what they enjoyed a generation ago. We're at least as likely to find new music we like on YouTube or Internet radio or even in TV commercials as we are on what is now known as "terrestrial" radio.

If there's a single vehicle for shared experiences today, it might be YouTube. Consider this presentation that's generated more than 2 million views between YouTube and TED.com: It's called "The Art of Asking," and the presenter is a woman named Amanda Palmer.

I encourage you to watch the whole thing, but I'll give you some highlights. Around the 9-minute mark she talks about how she got nearly $1.2 million from her Kickstarter fundraising project, and how "crowd-funding" worked for her. She talks about how her record label considered her a failure when she sold only 25,000 recordings. But it turns out that the same number of fans and supporters, around 25,000, created a successful Kickstarter project, and ultimately helped her raise $1.2 million. Selling 25,000 recordings may make you a "failure," but getting 25,000 people to support you can make you a big success.

Around the 9:30 mark, Palmer mentions how she didn't make anyone pay for her music; she only asked them to. By asking her audience, she connected with them. And she says when you connect with people, people want to help you.

Palmer concludes by saying we need to change from "how do we make people pay for music?" to "how do we let people pay for music?"

I think this phenomenon has always been a part of our world as IT pros. Because what we do is complex, and no one person has all the answers, we rely on one another. Many people -- readers, clients, friends, what have you -- ask me for help. And I can assure you that I get help from countless people. Sure, we give each other a hard time. We joke and fool around and say just RTFM. But over the years I've developed a mental list of trusted advisors, people I know who know things. I ask, they help. They ask, I help.

Oftentimes help comes in the form of simply answering a question. In your work, when you search for an answer to a technical matter, you're exercising faith not only that someone has found the answer, but that they've taken the time to put the correct answer out there. Many of my posts are based on real-life experiences. In this blog I attempt to share questions that were answered and things that were discovered. But you don't need a blog to help others find answers. You can always share what you know in the comments section here or in any other forums you frequent. Your thoughts, ideas and experiences may one day be the answer someone else is searching for.

When people really need assistance, don't you want to help them?