The Three Rs: Resiliency, Redundancy and Reliability
The reliability of IBM i on POWER processors has always been well documented and is a key reason why so many companies choose to run their business on this operating system. You know the stories: whenever there is a natural disaster, a picture or a story is passed around showing a poor little AS/400 or Power System, battered and bruised, dropped or drowned, but still running. As our hardware and operating system have changed over the years, reliability has remained a constant. It remains the system that can "take a licking, but keep on ticking."
We build reliability into our platform from the bottom up. Redundancy and resiliency are cornerstones of the hardware. The components of the hardware, whether manufactured by IBM or purchased elsewhere, have high reliability designed right in. The goals when designing the system have always been the same: minimize the number of hardware errors, increase the redundancy available with all components of the system, and provide a mechanism to allow quick response to any hardware issues that do occur. As we’ve moved from the AS/400 to the POWER7 processor and beyond, the fault tolerance has improved dramatically. We’ve increased the redundancy in everything from processor and memory to the I/O subsystem, and we’ve added function such as concurrent maintenance capability. The sum of all of these capabilities is a system that rarely fails. If a component does fail, in many cases the hardware can mask the error, call home for service, and allow concurrent repair, resulting in little to no outage time from a user perspective.
The software is as reliable as the hardware. As we celebrate this 25-year milestone, there are stories being shared about systems hidden away, running the company with no maintenance or upgrades, and sometimes no one even knowing the system exists. I would not recommend this, but it certainly occurs! I heard a funny story recently about someone trying to track down their system, following cords which led to a solid wall. It turned out that the system was in a closet which had been walled over years before, still running the company. This speaks to the dependability of the software as well as the hardware.
As a member of the IBM i development team, I can speak personally to the focus on reliability and error recovery in our software development cycle. During the design phase of any new feature or enhancement to an existing one, the design is reviewed for serviceability, error recovery, security and integrity. Our code is reviewed by peers to ensure we have not introduced errors. We have several test phases to ensure we’ve shaken out any errors. On top of that work, when an error is discovered by our customers, we do causal analysis to determine how our internal processes can be improved to prevent future errors. We as developers may groan as we fill out the checklists required to ship function, but the process results in a very stable software base in the field.
I’m the architect for our IBM i high availability product, called PowerHA SystemMirror for i. Selling high availability on IBM i isn’t always the easiest task when the system is known to be so stable and dependable. However, due to the growing need for businesses to be online 24/7 and the threat of natural disasters and other disruptive events, high availability is a necessity. Our PowerHA product provides low-maintenance, cost-effective replication. Commonly, as you add moving parts, you increase the chance of failure. By taking advantage of tried and tested operating system and hardware technologies, we have a high availability solution that is as solid as the system it’s built on. The PowerHA product was designed to be integrated into the operating system and the hardware, taking advantage of that stable architecture and building on it. We also have an excellent Lab Services team, as well as business partners, who can help implement and customize the solution to meet your specific high availability requirements. The result is a solution that is ready to switch when it is needed.
These days, life is ever-changing and moves more quickly than ever before. You want to spend your time and money on generating revenue, not on system maintenance. Running your workloads on IBM i and Power hardware has always given our customers that advantage, and we’ll continue our focus on reliability into the future.