Organizations that depend on a secure, continuously available database for their survival cannot afford the risk of a hardware or software failure. IS managers now demand systems that provide near-instantaneous replication, redundant hardware backup, recovery protection and failure prevention down to the component level. When a single failure can result in hundreds of thousands of dollars in lost transactions, not to mention severe damage to an IS management career, it's no wonder that fault tolerance is of growing interest to the database management community. To meet this need, companies such as Tandem, Stratus, Sequoia and others now offer machines designed to keep running when any single component fails. Virtually everything about them, from CPUs to disk drives to basic memory, is redundant and hot-swappable. If a component fails, you simply walk around to the cabinet and slide in a ready-to-run replacement. That's a great comfort if you are running true mission-critical applications, but that peace of mind comes at a considerable price.

Even a low-end Tandem box running UNIX can cost upward of $100,000, and high-end fault-tolerant systems can run into the millions of dollars. Those are daunting investments for many organizations.

The question is: Can you achieve anywhere near this level of fault-tolerant reliability without spending your entire IS capital budget? The answer, as we are now demonstrating with a proprietary system we call "high availability architecture (HAA)," is a resounding "yes." Here's how a mix of affordable hardware and innovative software can be configured to deliver near fault tolerance on a non-fault-tolerant budget.

Our company developed a high availability architecture database system for several health care providers. This is a transaction processing system in which a request enters the system and is processed, and a response is returned to the originator. The integrity of the records it holds and the near-constant availability of the database are critical.

To achieve near fault tolerance, we assembled a configuration consisting of two high-end Suns (our primary and backup database machines), a redundant array of inexpensive disks (RAID) attached to the primary database machine and a router to handle incoming data calls. Informix runs on both database machines, and we use an Informix enterprise replication technique to continuously replicate data from the primary to the backup machine.

This dual architecture, with primary and backup machines for database and com servers and BEA's Tuxedo monitoring the entire system, allows us to achieve a balanced and highly reliable mix of hardware and software fault protection. Because the backup database is a read-only system, it is used to increase the efficiency of system reporting and to provide application services that only require read permissions. This affordable solution delivers dependable protection against virtually any kind of hardware or facility failure.

If, for example, the primary database machine should fail, Tuxedo recognizes the problem and the system fails over to the backup machine, establishes the necessary connections and routes all transactions to it. The RAID array connected to the primary database machine protects against a disk crash, by far the most likely source of failure in any system. Our router accepts incoming data via a leased line and fails over automatically to a backup ISDN dial-in pathway if we lose our primary connection.
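The failover behavior described above can be illustrated with a short Python sketch. It is not Tuxedo's API and not our production code; the `Server` and `FailoverRouter` names, and the use of `ConnectionError` to signal a dead machine, are assumptions made purely for illustration. The sketch shows only the core idea: send each transaction to the active machine and, if that machine is down, switch to the backup and retry.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Server:
    """Hypothetical stand-in for a database machine."""
    name: str
    healthy: bool = True
    handled: List[str] = field(default_factory=list)

    def process(self, request: str) -> str:
        # A real machine would fail with a timeout or a dropped connection;
        # here a flag simulates the outage.
        if not self.healthy:
            raise ConnectionError(f"{self.name} is down")
        self.handled.append(request)
        return f"{self.name}:{request}"


class FailoverRouter:
    """Routes every request to the active server, failing over once to the backup."""

    def __init__(self, primary: Server, backup: Server) -> None:
        self.primary = primary
        self.backup = backup
        self.active = primary

    def submit(self, request: str) -> str:
        try:
            return self.active.process(request)
        except ConnectionError:
            if self.active is self.backup:
                # Both machines are unavailable; nothing left to fail over to.
                raise
            # Switch to the backup and retry the same request there.
            self.active = self.backup
            return self.active.process(request)
```

In this sketch the router stays on the backup even after the primary recovers. Failing back is a separate, deliberate step, since the two databases would first have to be brought back in sync.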

Clean power runs to all machines, and redundant air conditioning systems ensure proper temperature for all hardware elements. We will eventually recommend a geographic separation of the primary and backup machines, creating a "backhoe" defense against the possibility that a work crew outside the building cuts all connections to the outside world.

This flexible, balanced system delivers "near fault tolerance" at a fraction of the cost of a traditional fault-tolerant configuration. It is highly scalable and allows us to easily add machines and other components as the data volume increases. We plan to add an NT machine to this HAA, which will enable us to use the Informix Continuous Data Replication feature, allowing reads and writes against multiple databases. This capability, combined with Tuxedo's ability to do data-dependent routing, will make a much more sophisticated failover algorithm possible. Informix's long-term plan is to extend its XPS model into an OLTP environment. Once this is in place, replication will no longer be required (other than for geographic redundancy), because the ability to do OLTP across multiple machines will fit perfectly with the high availability architecture described here.
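To make the idea of combining data-dependent routing with failover concrete, here is a minimal Python sketch. It does not use Tuxedo's configuration syntax or API; the `pick_database` function, the numeric key ranges and the database names are all hypothetical. It assumes every database accepts reads and writes, so a transaction whose preferred database is down can be sent to any surviving one.

```python
from typing import AbstractSet, List, Tuple

# (low, high, target): keys in the inclusive range low..high prefer this target.
Route = Tuple[int, int, str]


def pick_database(key: int, routes: List[Route],
                  down: AbstractSet[str] = frozenset()) -> str:
    """Choose a database by the value of a key, skipping databases known to be down."""
    preferred = None
    for low, high, target in routes:
        if low <= key <= high:
            preferred = target
            break
    if preferred is None:
        raise KeyError(f"no route covers key {key}")
    if preferred not in down:
        return preferred
    # The preferred database is down: fall back to the first one still up.
    for _, _, target in routes:
        if target not in down:
            return target
    raise RuntimeError("no database available")
```

With a hypothetical table such as `[(0, 499, "db_a"), (500, 999, "db_b")]`, key 42 goes to `db_a` in normal operation and to `db_b` when `db_a` is reported down.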

Considering these powerful capabilities, you could make the case that this configuration is, in fact, better than fault tolerance, because even the most expensive fault tolerant systems remain vulnerable to the programming flaws of the common homegrown business application. In applications that can accept "near fault tolerant" availability, this approach offers an alternative that is proven, secure and very affordable.
