In VIO server environments, automatic failover is typically set up with shared Ethernet adapters (SEAs) on the VIO servers. Though this is an effective solution, problems can result if the control channel isn't properly configured. Another drawback to this method is that, with ever-increasing adapter speeds, it feels wasteful to have one or more 10 Gb network adapters just sitting idle until a VIOS fails.
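For context, a minimal sketch of the classic SEA failover setup looks something like the following. The adapter names (ent0 for the physical adapter, ent2 for the virtual trunk adapter, ent3 for the control-channel virtual adapter) are illustrative; substitute the names from your own environment.

```shell
# On each VIOS (as padmin): create the SEA with failover enabled.
# ha_mode=auto turns on SEA failover; ctl_chan names the dedicated
# virtual adapter used as the control channel between the two VIOSes.
mkvdev -sea ent0 -vadapter ent2 -default ent2 -defaultid 1 \
       -attr ha_mode=auto ctl_chan=ent3
```

Which SEA is primary is determined by the trunk priority assigned to the virtual trunk adapter on the HMC; the backup SEA sits idle until the control channel signals a failure.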
Steve's recommendation for better utilizing network adapters is spelled out in the document "Using Virtual Switches in PowerVM to Drive Maximum Value of 10Gb Ethernet."
The authors, Glenn E. Miller and Kris Speetjens, recommend an alternative to automatic failover. They suggest keeping both VIO servers active at the same time and using network interface backup (NIB) at the VIO client level. This way the administrator can manually choose which VIO server each LPAR uses, and balance the load accordingly. In the process, we end up using all the network adapters we paid for, which is a good thing.
From the document:
"Something that we haven't pointed out thus far in the discussion is the fact that redundancy does have its drawbacks. The backup adapter is fundamentally unused unless a failure occurs. In the example depicted in Figure 2, there are three physical adapters and their corresponding Ethernet switch ports that are never used except when a failure condition occurs. These ports have associated costs. Within the more common 1-GB environment, it's not too drastic. However, in the 10-GB environment it's vastly different. One customer estimated that it cost them $16,000 for each 10 Gb/s connection provided in their data center, taking into account the cost of the Ethernet adapter, cabling and the proportionate cost of the chassis, blade and port of the Ethernet switch. Obviously, 10 GB connectivity is going to be a necessity in the near future as customers continue to consolidate more and more workloads onto smaller, much more powerful systems. However, it may be difficult to justify 40 GB worth of bandwidth when only 10 GB will be utilized.
"A significant benefit to this design is that both VIO servers can be active at the same time. Of course, each individual client LPAR is only using one, but half of the clients could be configured to use VIO Server 1 and the other half to use VIO Server 2 as their primary paths. Each client would failover to its respective secondary path in the case that its primary path was lost. So the customer's investment in hardware is more effectively utilized.
"Protection against this scenario is accomplished by configuring two VIO servers on each Power Systems frame and assigning resources to the VIO clients from both VIO servers. The 'classic' design that allows use of VLAN tagging (Figure 2) uses a control channel to allow the VIO servers to detect a failure and handle Ethernet traffic accordingly. The vSwitch design handles this at the client level by pinging external resources and failing over the Client NIB Etherchannel when a threshold of failed pings is reached.
"The classic design's advantages are that it requires no configuration at the VIO client level and all clients can be migrated from one VIO server to another with the execution of one command on the VIO server during system maintenance. The disadvantage of the classic design is that only one VIO server is carrying Ethernet traffic at any time, which means a system is only utilizing 50 percent of its available bandwidth at any time. It also means that there is no way to test if the failover link is correct without failing over every VIO client on a frame. The vSwitch design's advantages are that it allows both VIO servers to carry Ethernet traffic at the same time. This means that administrators are given more granular control over moving Ethernet traffic from one VIO server to another as well as utilizing a higher percentage of bandwidth during normal operations. The disadvantage of the vSwitch design is that it requires every VIO client (which uses Network Interface Backup to verify path integrity) to ping an address outside of the frame to test for failures."
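On the client side, the NIB EtherChannel described above can be sketched roughly as follows. The adapter names and ping target are illustrative: ent0 and ent1 are assumed to be virtual Ethernet adapters backed by VIO Server 1 and VIO Server 2 respectively, and netaddr should be an address outside the frame (the default gateway is a common choice) that the client pings to detect a dead path.

```shell
# On the AIX client LPAR (as root): create a NIB EtherChannel with
# ent0 as the primary adapter, ent1 as the backup, and ping-based
# failure detection against 192.168.1.1 (substitute your gateway).
mkdev -c adapter -s pseudo -t ibm_ech \
      -a adapter_names=ent0 -a backup_adapter=ent1 \
      -a netaddr=192.168.1.1

# To manually swing a client to its backup path -- e.g., to drain a
# VIOS before maintenance, or to verify the failover actually works --
# force a failover on the resulting EtherChannel device (ent2 here):
/usr/lib/methods/ethchan_config -f ent2
```

The manual failover step is also how you'd do the per-client load balancing the authors describe: half the clients run with VIOS 1 as primary, half with VIOS 2, and individual clients can be moved as needed.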
The document details the pros and cons of this option, as well as explaining how to set it up. It's well worth reading in its entirety.
So do you see any reasons not to implement this in your environment?