To meet the complex demands of decision support and OLTP applications, corporate databases in retail, finance and similar industries are now quickly growing beyond the terabyte frontier. The increasing size of what we call very large databases (VLDBs) has driven the search for a fundamental architecture that is at once open, affordable and scalable.
Database software developers recognized some time ago the need to build systems that can capture, store and retrieve massive volumes of transactional data. At least two primary schools of thought have now emerged, each offering a distinct and competing solution to the challenge of the VLDB.
Informix first unveiled its decision support technology with the introduction of the Dynamic Scalable Architecture (DSA) engine in version 7.x of its database system. DSA was designed to exploit the power of symmetric multiprocessing (SMP) architectures. To gain the maximum benefit from this architecture, Informix developed the industry's first true Parallel Database Query (PDQ) strategy. This innovative solution used a cost-based optimizer to analyze each SQL statement, divide it into parts that could run in parallel and run each of those parts simultaneously on a separate thread.
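The decomposition idea can be illustrated with a minimal Python sketch of the general technique; it is not Informix's PDQ implementation. A scan-and-aggregate query (roughly `SELECT SUM(amount) ... WHERE region = ?`) is split into partial scans over disjoint slices of a hypothetical in-memory table, each slice runs on its own thread, and the partial sums are combined. The table, `partial_sum` and `parallel_sum` are assumptions made for illustration. In CPython the global interpreter lock keeps pure-Python threads from using several CPUs at once, so this shows the shape of a parallel plan rather than real speedup.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "table" of (region, amount) rows.
ROWS = [("east", 10), ("west", 5), ("east", 7), ("north", 3),
        ("west", 8), ("east", 2), ("north", 6), ("west", 1)]

def partial_sum(rows, region):
    """One parallel 'part' of the query: scan one slice and aggregate it."""
    return sum(amount for r, amount in rows if r == region)

def parallel_sum(rows, region, degree=4):
    """Split the scan into at most `degree` slices, run each slice on its
    own thread, then combine the partial results (SUM is decomposable)."""
    size = max(1, (len(rows) + degree - 1) // degree)  # ceiling division
    slices = [rows[i:i + size] for i in range(0, len(rows), size)]
    with ThreadPoolExecutor(max_workers=degree) as pool:
        partials = pool.map(partial_sum, slices, [region] * len(slices))
        return sum(partials)

print(parallel_sum(ROWS, "east"))  # 19
```

Because a sum of partial sums equals the full sum, the slices can be processed in any order and on any thread; aggregates such as AVG would need to carry (sum, count) pairs instead.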
With the continued growth in database volumes, however, Informix and others in the industry quickly recognized the inherent limitations of the SMP model. Even the most efficient SMP machine can accept only so many CPUs and so much memory and disk before its system bus and operating system become the constraint. These limitations place a ceiling on the scalability of the SMP-based machine.
To provide effective decision support in very large database environments, and to lay a solid foundation for the now-emerging OLTP technology standards, Informix took its highly reliable DSA concept to a faster, more powerful and dramatically more scalable level. With the introduction of the OnLine Extended Parallel Server (XPS) architecture, Informix has pioneered a "parallel everything" approach to database processing. XPS offers a distinct alternative to the Oracle DSS model, one that delivers clear and immediate advantages in speed and data-flow efficiency and promises virtually unlimited OLTP scalability.
Oracle and Informix have advanced similar solutions at the single-machine level, each exploiting symmetric multiprocessing within one SMP box. But when data volumes and user requirements call for a multiple-machine environment, the two companies offer radically different approaches. For IS managers seeking more effective DSS performance, and for those planning an upward migration toward OLTP, it is important to understand the differences, limitations and relative advantages of these competing visions.
A VLDB Bottleneck
While Informix has adopted a parallel-everything, shared-nothing approach, Oracle has pursued a "clustering" strategy in which the database runs on multiple machines that share, in effect, a cluster of disk drives. Because the engine itself does not parallelize a query across these machines, each machine runs the complete query and then loads the results into its own shared memory. This model allows large numbers of users to be added, but because every machine in the cluster shares the same disk drives, disk access has emerged as a key and often limiting factor in the Oracle model.

The disadvantages become even more apparent when this model is used in an OLTP environment. In the cluster model, each machine has its own database engine that can perform any operation on any part of the database, so actions must be coordinated between machines. For example, if user "1" on machine "A" updates the inventory table, then user "2" on machine "B", who is also working with inventory, needs to know that the change has occurred. Oracle employs a type of "distributed lock manager" that, in effect, locks the disk page affected by any modification. When a user submits an update or query, the distributed lock manager coordinates a search of shared memory for relevant changes and, when necessary, re-reads the corresponding disk page.
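The coordination cost described above can be illustrated with a deliberately simplified, single-threaded Python model. It is an assumption-laden sketch, not Oracle's actual lock-manager protocol: every node caches disk pages in its own memory, a hypothetical `LockManager` records an exclusive lock on a page before it is updated and discards the other nodes' cached copies of that page, and those nodes must then go back to the shared disk on their next read. The `disk_reads` counter makes the extra I/O visible. Real lock managers also make conflicting requests wait, which is omitted here.

```python
class SharedDisk:
    """Every node in the cluster reads and writes the same pages."""
    def __init__(self):
        self.pages = {}
        self.disk_reads = 0

    def read(self, page_id):
        self.disk_reads += 1
        return self.pages.get(page_id)

    def write(self, page_id, value):
        self.pages[page_id] = value


class LockManager:
    """Hypothetical cluster-wide lock manager. It only records which node
    holds an exclusive page lock and invalidates other nodes' cached
    copies; blocking of conflicting requests is not modelled."""
    def __init__(self):
        self.nodes = []
        self.exclusive_owner = {}  # page_id -> node name

    def register(self, node):
        self.nodes.append(node)

    def acquire_exclusive(self, node, page_id):
        self.exclusive_owner[page_id] = node.name
        for other in self.nodes:
            if other is not node:
                other.cache.pop(page_id, None)  # stale copy: force a re-read

    def release(self, node, page_id):
        if self.exclusive_owner.get(page_id) == node.name:
            del self.exclusive_owner[page_id]


class Node:
    def __init__(self, name, disk, dlm):
        self.name, self.disk, self.dlm = name, disk, dlm
        self.cache = {}  # this node's private buffer cache
        dlm.register(self)

    def read(self, page_id):
        if page_id not in self.cache:  # never cached, or invalidated
            self.cache[page_id] = self.disk.read(page_id)
        return self.cache[page_id]

    def update(self, page_id, value):
        self.dlm.acquire_exclusive(self, page_id)
        self.cache[page_id] = value
        self.disk.write(page_id, value)  # write-through, for simplicity
        self.dlm.release(self, page_id)


disk, dlm = SharedDisk(), LockManager()
disk.write("inventory:1", 100)
a, b = Node("A", disk, dlm), Node("B", disk, dlm)

b.read("inventory:1")          # user 2 on B caches the page (1 disk read)
a.update("inventory:1", 95)    # user 1 on A changes the inventory row
print(b.read("inventory:1"))   # 95, but only after B re-read the page
print(disk.disk_reads)         # 2
```

Every update made on one node turns into cache misses, and therefore shared-disk reads, on the other nodes that touch the same page, which is the contention point the paragraph above describes.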
Unlimited Parallelism
In contrast, the Informix DSS model adopts a more modular and therefore far more scalable parallel architecture. This parallel-everything arrangement lets designers link multiple SMP boxes into a seamlessly expandable network, with each machine having direct and efficient access to its own disks. Since each machine is responsible for managing a subset of the entire database, as determined by whatever fragmentation strategy is in use, work is rarely duplicated on multiple machines, as frequently happens in the clustering model. This loosely coupled network of SMP machines gives us the advantages of parallel processing within each machine as well as parallelism across machines.
Thus, with the Informix XPS "shared nothing" model, we enjoy the leveraged efficiency of a massively parallel processing (MPP) environment. In this MPP world, each server runs its own "instance" of the Informix database, has its own set of disks and houses its own OnLine XPS services for logging, recovery, locking and buffer management. Unlike in the clustered approach, each machine in the network is responsible for its own disks and data, and a fragmentation scheme spreads the data across the machines. To users and system administrators, this network of linked SMP units looks and operates like a single large OnLine XPS server. This single-system view greatly simplifies database management and allows administrators to control the entire network from a single console.
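A shared-nothing layout can be sketched in Python as follows. This is an illustration under stated assumptions, not XPS internals: rows are assigned to hypothetical `Node` objects by hashing a fragmentation key (hash fragmentation is only one possible scheme), each node scans only the rows it owns, and a coordinator sends the same request to every node and adds up the answers. `zlib.crc32` is used because Python's built-in `hash()` of strings changes from process to process, while a fragmentation function must be stable.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

def node_for(key, n_nodes):
    """Hash fragmentation: a stable hash of the key picks the owning node."""
    return zlib.crc32(str(key).encode()) % n_nodes

class Node:
    """Hypothetical shared-nothing node that owns its own rows ("disks")."""
    def __init__(self, node_id):
        self.node_id = node_id
        self.rows = []

    def count_where(self, predicate):
        # Runs entirely against local data; nothing is shared or locked.
        return sum(1 for row in self.rows if predicate(row))

def load(rows, n_nodes, key):
    """Place each row on exactly one node, according to its key."""
    nodes = [Node(i) for i in range(n_nodes)]
    for row in rows:
        nodes[node_for(row[key], n_nodes)].rows.append(row)
    return nodes

def coordinator_count(nodes, predicate):
    """Send the request to every node in parallel and add the answers."""
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        return sum(pool.map(lambda n: n.count_where(predicate), nodes))

orders = [{"order_id": i, "customer": i % 7, "amount": i * 10}
          for i in range(1, 101)]
nodes = load(orders, n_nodes=4, key="customer")
print(coordinator_count(nodes, lambda r: r["amount"] > 500))  # 50
```

Since each row lives on exactly one node, no node ever needs to learn about another node's changes, which is the property that lets the design avoid the cross-machine lock coordination of the clustered model.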
The clean modularity of this model also provides a degree of scalability that is as yet unmatched by other VLDB technologies. As user numbers and data volumes grow in any given organization, IS managers can simply add more SMP machines to meet the increased demand. Because nothing is shared, the architecture eliminates the natural bottlenecks of a shared-resource approach and can deliver superior performance and reliability in even the largest and most transaction-intensive applications.
This architecture is, of course, suited specifically to situations in which millions or even billions of rows must be searched for DSS or OLTP applications. To derive the maximum possible benefit from it, one must devise fairly sophisticated strategies for fragmenting, storing and retrieving the data spread across these multiple parallel SMP machines.
Data fragmentation is a rich topic in its own right and one we have addressed in past DM Review Informix Edge columns. A sound fragmentation strategy apportions data among the units of this parallel, distributed architecture in a way that makes the information easy to catalog, store and retrieve. When a user query arrives at the primary SMP machine, that machine consults the up-to-date catalog and forwards messages only to the machines that hold the relevant data, which then execute their share of the work.
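The catalog lookup can be sketched as follows, assuming range-style fragmentation on a single column. The catalog contents, node names and the `nodes_for_range` helper are hypothetical, and dates are stored as YYYYMMDD integers only to keep the example short. The point is that the coordinating machine can compare a query's predicate with each fragment's range and skip (eliminate) fragments that cannot contain qualifying rows, sending messages only to the machines that hold the rest.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Fragment:
    """One hypothetical catalog entry: rows with low <= key < high live on `node`."""
    low: int
    high: int
    node: str

# Hypothetical catalog for an orders table fragmented by order_date (YYYYMMDD).
CATALOG = [
    Fragment(19970101, 19970401, "node1"),
    Fragment(19970401, 19970701, "node2"),
    Fragment(19970701, 19971001, "node3"),
    Fragment(19971001, 19980101, "node4"),
]

def nodes_for_range(catalog, lo, hi):
    """Fragment elimination for the predicate lo <= key < hi: keep only the
    fragments whose [low, high) interval overlaps the query's interval."""
    return [f.node for f in catalog if f.low < hi and lo < f.high]

# WHERE order_date >= 19970515 AND order_date < 19970815
print(nodes_for_range(CATALOG, 19970515, 19970815))  # ['node2', 'node3']
```

Here half of the machines never see the query at all. A hash-based scheme, by contrast, can eliminate fragments only for equality predicates on the hashed column, which is one reason the choice of fragmentation strategy matters.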
The powerful efficiency of this parallel-everything processing model delivers unmatched performance for decision support and emerging OLTP applications. Furthermore, by designing these linked SMP boxes with "dotted line" responsibility for the disk drives on adjacent boxes, the network can be configured to provide redundant backup protection in the event that any single box should go down. Additional network integrity can be assured by creating a "mirror" backup of the primary catalog at a secondary location.
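One way to picture the "dotted line" arrangement is a ring in which each box stands behind its neighbour's disks. The sketch below is a hypothetical assignment policy written for illustration, not the XPS failover mechanism: each box normally serves its own disks, and when a box is marked down, responsibility for its disks passes to the next box in the ring that is still up.

```python
def owner_map(boxes, down=frozenset()):
    """Map each box's disks to the box that currently serves them.

    A box serves its own disks while it is up; if it is down, the disks
    pass along the ring to the next box that is still up (the adjacent
    box, unless that one is down as well)."""
    live = [b for b in boxes if b not in down]
    if boxes and not live:
        raise RuntimeError("no surviving box can take over the disks")
    n = len(boxes)
    owners = {}
    for i, box in enumerate(boxes):
        j = i
        while boxes[j] in down:
            j = (j + 1) % n  # move on to the adjacent box
        owners[box] = boxes[j]
    return owners

boxes = ["smp1", "smp2", "smp3", "smp4"]
print(owner_map(boxes, down={"smp2"}))
# {'smp1': 'smp1', 'smp2': 'smp3', 'smp3': 'smp3', 'smp4': 'smp4'}
```

In such a scheme the catalog would route requests for smp2's fragments to smp3 until smp2 returns; the mirrored copy of the primary catalog mentioned above protects the map itself.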
Powerful, secure and easily scalable. Those are the clear advantages offered by the shared-nothing Informix OnLine XPS approach. As databases continue to grow in size and complexity and as organizations demand faster and more relevant OLTP and decision support performance, more and more companies now recognize the inherent benefits of the emerging MPP architecture.
J.D. Hicks co-founded Virtual Solutions, a Metamor Worldwide, Inc. company, in 1993. As chief technology officer for Virtual Solutions, Hicks is responsible for directing the research and development of the company's concepts, specifications and applications. For more than a decade, he has been involved in the design, development and support of UNIX database systems. He is knowledgeable about leading data structures, database design methodologies and the practical use of popular file handlers.









