Editor's Note: This article provides an overview of the appliance approach to data warehousing infrastructure. Although it does not compare and contrast the appliance approach with more traditional approaches, it does explain this emerging technology.
Fully leveraging business intelligence (BI) can make or break a company. The timely processing and retrieval of vast amounts of data is vital to the decision-making process. Just as important as timeliness are the depth and breadth of data analysis. With the growing size of the average data warehouse, achieving these goals has become increasingly difficult; terabyte-sized data warehouses are already fairly common.
With the advent of the Internet and data tracking systems, data volumes continue to double every nine months on average. This growth rate is twice as fast as Moore's Law, which is commonly cited as a doubling of compute power every 18 months. The widening gap between compute power and exponentially growing data is made worse by the increasing need for in-depth analysis as companies move from standard reporting toward interactive ad hoc BI and discovery analysis. The escalating need for near real-time results exacerbates this gap even further.
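To make the size of this gap concrete, the arithmetic can be sketched in a few lines of Python. This is only an illustration: the nine-month doubling period for data comes from the text, while the 18-month doubling period for compute power is an assumed reading of Moore's Law, and the function names are hypothetical.

```python
# Illustrative arithmetic only; the function names are hypothetical.

DATA_DOUBLING_MONTHS = 9.0      # figure quoted in the article
COMPUTE_DOUBLING_MONTHS = 18.0  # assumption: common reading of Moore's Law


def growth_factor(months: float, doubling_period_months: float) -> float:
    """Return how many times a quantity has multiplied after `months`."""
    return 2.0 ** (months / doubling_period_months)


def data_to_compute_gap(months: float) -> float:
    """Return the ratio of data growth to compute growth after `months`."""
    data = growth_factor(months, DATA_DOUBLING_MONTHS)
    compute = growth_factor(months, COMPUTE_DOUBLING_MONTHS)
    return data / compute


if __name__ == "__main__":
    for years in (1, 3, 5):
        months = 12 * years
        print(
            f"after {years} year(s): "
            f"data x{growth_factor(months, DATA_DOUBLING_MONTHS):.1f}, "
            f"compute x{growth_factor(months, COMPUTE_DOUBLING_MONTHS):.1f}, "
            f"gap x{data_to_compute_gap(months):.1f}"
        )
```

Under these assumptions, three years of growth multiplies data volume 16-fold but compute power only 4-fold, so the same hardware budget covers a quarter of the workload it once did.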
Vendors have thus far handled this rapid growth in database size with expensive, continual upgrades of hardware and software; however, over the past few years, it has become clear that existing infrastructures cannot effectively handle the demands of in-depth analysis on large amounts of data. In addition, the economic impact of this approach has taken its toll on end users and left some organizations out in the cold, unable to purchase, deploy and maintain a data warehouse that fits their business needs.
The goal is to find a new approach to data warehousing infrastructure that is both specific and flexible. That is, it must be suited to the task of handling vast amounts of data, yet be compatible with the customer's existing BI applications and infrastructure. Such a solution needs to be straightforward to deploy, in contrast to the highly complex systems currently available. Data warehouse appliances are expandable, affordable and uniquely suited to the ever-growing needs of users, providing orders-of-magnitude improvements in the speed and sophistication of data analysis.
The Current State of BI Infrastructure
The current BI infrastructure is a patchwork of hardware, software and storage that is growing ever more complex. Consider a typical patchwork BI solution:
- The database management system (DBMS) was initially architected for transaction processing, holding several hundred megabytes worth of records with a few internal users.
- The DBMS has been improved in increasingly complex layers over the years to support terabyte-sized databases, Internet-scale users and the evolving SQL definition.
- The hardware/operating system is a clustered set of generic boxes optimized for everything from mathematical queries to genome investigation.
- The system is attached to generic file systems that manage and serve data for a variety of applications.
This basic patchwork paradigm has not changed during the past 10 years. Within each of the aforementioned "silos," Moore's Law improvements have not kept pace with database volumes, complexity or performance demands. On the DBMS front, we have seen attempts at grid management placed on top of an already complex DBMS. For servers and storage, we have seen increasingly complex symmetric multiprocessing (SMP) boxes, storage area networks (SANs) and network attached storage (NAS), which have been targeted at improving transactional workloads but have demonstrated only incremental benefits in the BI space. Even the advent of grid and blade architectures has not addressed the demands of today's BI, because these have generally been developed for transactional systems. They continue to enforce the patchwork approach, with each silo optimized for general-purpose operations rather than for BI requirements.
In the absence of significant architectural improvements, the traditional answer to the growing BI problem is to continue to add more hardware. For example, a company may use an Oracle DBMS, an HP server and a storage solution from EMC. As its system grows, it may add Hitachi storage and a second server. These types of systems require data and user applications to be continuously tuned and optimized.
Terabyte-scale databases that continue to grow steadily put tremendous strain on these systems. Even in cases where the user base and data set are relatively stable, current BI systems often fail to meet their basic goal of delivering the vital business information needed to make timely decisions. From an administration standpoint, this current patchwork of solutions is extremely difficult and time-consuming to manage and maintain. From the business-user point of view, it is frustrating and does not provide the agility and performance users are looking for. These strains occur because vendors have upgraded these systems incrementally over the years rather than changing the underlying architecture to address the unique requirements of today's terabyte-scale databases.
The issues with current BI architectures are evident across a broad range of companies and industries. While the system strain will become worse in the next few years, these problems exist today and are plaguing both business users and database administrators. Applied to BI, a data warehouse appliance is a machine capable of retrieving valuable decision-aiding intelligence from terabytes of data in seconds or minutes versus hours or days. Appliances represent the difference between making a decision using stale data and making one using the freshest information possible.
The Maturation of the Data Warehouse Appliance
The data warehouse appliance is designed specifically for the streaming workload of BI and is built using commodity components. It architecturally integrates hardware, DBMS and storage into one opaque device and combines the best elements of SMP and massively parallel processing (MPP) approaches so that a query can be processed in the most efficient way possible. A data warehouse appliance is architected to remove all the bottlenecks to data flow so that the only remaining limit is disk speed. The result is a data-flow architecture in which data moves at streaming speeds. Through standard interfaces, a data warehouse appliance is fully compatible with existing BI applications, tools and data. It has an extremely low total cost of ownership and is very simple to use.
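If disk speed really is the only remaining limit, the time to scan a data set can be estimated by dividing its size by the combined streaming rate of all the disks working in parallel. The Python sketch below illustrates that estimate. The disk count and the 50 MB/s per-disk rate are assumed figures chosen for illustration, not measurements of any product, and the function name is hypothetical.

```python
# Back-of-the-envelope estimate; all figures below are assumptions.

ONE_TERABYTE = 10**12  # bytes, using decimal units


def full_scan_seconds(data_bytes: int, disks: int, mb_per_sec_per_disk: int) -> float:
    """Estimate the time to stream `data_bytes` when disk speed is the only limit.

    Assumes the data is spread evenly across `disks` and that every disk
    streams at its full sequential rate at the same time.
    """
    if disks <= 0 or mb_per_sec_per_disk <= 0:
        raise ValueError("disks and mb_per_sec_per_disk must be positive")
    aggregate_bytes_per_sec = disks * mb_per_sec_per_disk * 1_000_000
    return data_bytes / aggregate_bytes_per_sec


if __name__ == "__main__":
    for disks in (1, 10, 100):
        seconds = full_scan_seconds(ONE_TERABYTE, disks, 50)
        print(f"{disks:>3} disk(s): {seconds:,.0f} s ({seconds / 3600:.2f} h)")
```

With these assumed figures, one disk needs 20,000 seconds (more than five hours) to stream a terabyte, while 100 disks streaming in parallel need 200 seconds. That is the kind of shift from hours to minutes described earlier, provided nothing upstream of the disks slows the flow.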