Editor's Note: This article provides an overview of the appliance approach to data warehousing infrastructure. Although it does not compare and contrast the appliance approach with more traditional approaches, it does explain this emerging technology.
Fully leveraging business intelligence (BI) can make or break a company. The timely processing and retrieval of vast amounts of data is vital to the decision-making process. Just as important as timeliness are the depth and breadth of data analysis. With the growing size of the average data warehouse, achieving these goals has become increasingly difficult; terabyte-sized data warehouses are already fairly common.
With the advent of the Internet and data tracking systems, data volumes continue to double on average every nine months. This growth rate is twice as fast as that described by Moore's Law. The widening gap between compute power and exponentially growing data is made worse by the increasing need for in-depth analysis as companies move from standard reporting toward interactive ad hoc BI and discovery analysis. The gap is further exacerbated by the escalating need for near-real-time results.
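To make the size of this gap concrete, the arithmetic can be sketched as follows. This is a minimal illustration, not a forecast: the nine-month doubling period for data comes from the text, while the 18-month doubling period for compute power is an assumption (the commonly cited form of Moore's Law, consistent with the "twice as fast" claim). The function names are illustrative only.

```python
def growth_factor(months: float, doubling_period_months: float) -> float:
    """Multiplicative growth after `months`, given a fixed doubling period."""
    return 2.0 ** (months / doubling_period_months)


def data_to_compute_gap(
    months: float,
    data_doubling_months: float = 9.0,      # from the article
    compute_doubling_months: float = 18.0,  # ASSUMPTION: common Moore's Law figure
) -> float:
    """How much faster data has grown than compute power after `months`."""
    data = growth_factor(months, data_doubling_months)
    compute = growth_factor(months, compute_doubling_months)
    return data / compute


if __name__ == "__main__":
    # After three years: data has grown 16x, compute 4x, so the gap is 4x.
    print(growth_factor(36, 9.0))      # 16.0
    print(growth_factor(36, 18.0))     # 4.0
    print(data_to_compute_gap(36))     # 4.0
```

Under these assumptions the gap itself doubles every 18 months, which is why incremental hardware upgrades fall further behind over time.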
Vendors have thus far handled this rapid growth in database size with costly, continual upgrades of hardware and software; however, over the past few years, it has become clear that existing infrastructures are unable to effectively handle the demands of in-depth analysis on large amounts of data. In addition, the economic impact of this approach has taken its toll on end users and left some organizations out in the cold, unable to purchase, deploy and maintain a data warehouse that fits their business needs.
The goal is to find a new approach to data warehousing infrastructure that is both specific and flexible. That is, it must be suited to the task of handling vast amounts of data, yet be compatible with the customer's existing BI applications and infrastructure. Such a solution needs to be straightforward to deploy, in contrast to the highly complex systems currently available. Data warehouse appliances are expandable, affordable and uniquely suited to the ever-growing needs of users, improving both the speed and the sophistication of data analysis by orders of magnitude.
The Current State of BI Infrastructure
The current BI infrastructure is a patchwork of hardware, software and storage that is growing ever more complex. Consider a typical patchwork BI solution:
- The database management system (DBMS) was initially architected for transaction processing, holding several hundred megabytes worth of records with a few internal users.
- The DBMS has been improved in increasingly complex layers over the years to support terabyte-sized databases, Internet-scale users and the evolving SQL definition.
- The hardware/operating system is a clustered set of generic boxes optimized for everything from mathematical queries to genome investigation.
- The system is attached to generic file systems that manage and serve data for a variety of applications.
This basic patchwork paradigm has not changed during the past 10 years. Within each of the aforementioned "silos," Moore's Law improvements have not kept pace with database volumes, complexity or performance demands. On the DBMS front, we have seen attempts at grid management placed on top of an already complex DBMS. For servers and storage, we have seen increasingly complex SMP boxes, storage area networks (SANs) and network-attached storage (NAS), all of which have been targeted at improving transactional workloads but have demonstrated only incremental benefits in the BI space. Even the advent of grid and blade architectures has not addressed the demands of today's BI because these architectures have generally been developed for transactional systems. They continue to enforce the patchwork approach, with each silo optimized for general-purpose operations rather than for BI requirements.
In the absence of significant architectural improvements, the traditional answer to the growing BI problem is to continue to add more hardware. For example, a company may use an Oracle DBMS, an HP server and a storage solution from EMC. As their system grows, they may add Hitachi storage and a second server. These types of systems require data and user applications to be continuously tuned and optimized.
Terabyte-scale databases that continue to grow steadily put tremendous strain on these systems. Even in cases where the user base and data set are relatively stable, current BI systems often fail to meet their basic goal of delivering vital business information in time to support decisions. From an administration standpoint, this patchwork of solutions is extremely difficult and time-consuming to manage and maintain. From the business user's point of view, it is frustrating and does not provide the agility and performance users are looking for. These strains occur because vendors have upgraded these systems incrementally over the years rather than changing the underlying architecture to address the unique requirements of today's terabyte-scale databases.
The issues with current BI architectures are evident across a broad range of companies and industries. While the system strain will become worse in the next few years, these problems exist today and are plaguing both business users and database administrators. Applied to BI, a data warehouse appliance is a machine capable of retrieving valuable decision-aiding intelligence from terabytes of data in seconds or minutes versus hours or days. Appliances represent the difference between making a decision using stale data and making one using the freshest information possible.
The Maturation of the Data Warehouse Appliance
The data warehouse appliance is designed specifically for the streaming workload of BI and is built using commodity components. It architecturally integrates hardware, DBMS and storage into one opaque device and combines the best elements of the SMP and massively parallel processing (MPP) approaches into a single design that lets each query be processed as efficiently as possible. A data warehouse appliance is architected to remove all the bottlenecks to data flow so that the only remaining limit is disk speed. The result is a data-flow architecture in which data moves at streaming speeds. Through standard interfaces, a data warehouse appliance is fully compatible with existing BI applications, tools and data. It has an extremely low total cost of ownership and is very simple to use.
One of the most important trends in BI is the development of standardized interfaces, protocols and functionality. Database standards have been set, allowing the system to be built completely around the desires and needs of the end user. Today, unlike 10 years ago, we have a wealth of tools and applications using these standardized interfaces such as MicroStrategy, Business Objects, Cognos, SAS and SPSS. These are coupled with ETL tools having standardized interfaces such as Ab Initio, Ascential and Informatica. The appliance works seamlessly with these tools and applications as well as other in-house applications.
Perhaps the most identifiable benefit is the tremendous price/performance improvement that an appliance yields, both in speed and storage (see Figure 1). High-powered specialized hardware drove the database machines of the past, but now there is a need for better performance at a lower cost. The power of current technology is great enough that commercial, off-the-shelf components, which continue to drop in price, can be used to construct a data warehouse appliance that can provide valuable BI at a fraction of the cost of current industry database systems.
Figure 1: Telecom Customer Behavior Analysis
A data warehouse appliance is truly scalable. In transactional workloads, scalability is limited primarily by CPUs; in BI, however, the bottlenecks are the speeds of the internal buses, internal networks and disk transfers. Because the appliance is able to scale these elements without appreciable impact on system performance, effective multiterabyte databases are a reality.
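Why disk transfer dominates BI workloads can be seen with a rough, back-of-envelope estimate of a full table scan. The sketch below is illustrative only: the table size, disk count and per-disk throughput are assumed figures rather than measurements of any product, and the model assumes data is spread evenly across disks that are all read in parallel with no bus or network bottleneck.

```python
def scan_seconds(table_bytes: float, n_disks: int, bytes_per_sec_per_disk: float) -> float:
    """Idealized time to scan a table striped evenly across `n_disks`.

    ASSUMPTION: every disk streams at full speed in parallel and nothing
    else (bus, network, CPU) limits throughput.
    """
    if n_disks <= 0 or bytes_per_sec_per_disk <= 0:
        raise ValueError("disk count and throughput must be positive")
    aggregate_throughput = n_disks * bytes_per_sec_per_disk
    return table_bytes / aggregate_throughput


if __name__ == "__main__":
    one_terabyte = 1e12        # hypothetical table size in bytes
    per_disk = 50e6            # hypothetical 50 MB/s sustained read per disk

    # One disk: 20,000 seconds, i.e. more than five hours.
    print(scan_seconds(one_terabyte, 1, per_disk))    # 20000.0
    # One hundred disks in parallel: 200 seconds.
    print(scan_seconds(one_terabyte, 100, per_disk))  # 200.0
```

In this idealized model, scan time falls in proportion to the number of disks only as long as the buses and internal network can carry the combined stream, which is the scaling property the paragraph above describes.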
Reliability is also critical and is provided by the homogeneous nature of an appliance; all parts of the system come from one vendor. The customer does not need to integrate disk arrays, operating systems and database software with the hope that these pieces will all work together flawlessly. This is replaced with a single, architecturally integrated product.
A data warehouse appliance also provides simplicity for the administrator. The integrated nature of an appliance means that the time an administrator typically spent troubleshooting a complex database system can be spent in more productive endeavors. DBAs may now be deployed to assist end users doing real-time BI.
A data warehouse appliance offers the lowest total cost of ownership because it comes from a single source and a single vendor, which reduces the costs associated with support. With existing technologies, data growth typically incurs costs; hardware must be added and ongoing maintenance must be performed. The data warehouse appliance reduces these costs with inexpensive yet powerful hardware from one source. With the simple, efficient solution provided by a data warehouse appliance, businesses will run more efficiently. Results will be returned within seconds or minutes, orders of magnitude faster than with current architectures.
Benefits of a Data Warehouse Appliance
The increased performance manifests itself to the business user in many ways. Reports that took days can now take just minutes. Subtransactional data such as call detail records, individual Web clicks, itemized POS transactions and hyper-detailed customer activity can all now be readily analyzed in near-real time. The appliance makes obsolete the practice of discarding customer data that is only two months old because the database slows to a crawl when the data is kept.
Further, the appliance provides freedom to the business user. With patchwork systems, users are limited in the queries they can run because of the time required to run them. With the time required to run a complex query reduced to seconds, users can not only run their old analyses with more iterations but also devise and run entirely new sets of analyses on very granular data.
Data warehouse appliances are already helping support impressive BI deployments. For example, in the telecommunications industry, the rapid growth of call detail records (CDRs) creates an imposing amount of data, making it difficult for companies to quickly and efficiently analyze customer and call plan information. In fact, traditional approaches have been inefficient in processing queries on even one month's data, seriously hampering an organization's ability to perform trend analysis to reduce customer churn, better tackle revenue assurance issues and generate timely reports. With an appliance, the telecom user can analyze customer activity down to the CDR level or network-event level over a full year's worth of detailed data. Aggregations, with their inherently lossy nature, are no longer necessary; this approach yields highly enriched information for network management and customer satisfaction.
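The lossy nature of aggregation can be illustrated with a small sketch. The records, field layout and function names below are invented purely for illustration and do not represent any real CDR schema. Two customers end up with identical totals in a per-customer summary, yet only the detailed records reveal that one of them is suffering dropped calls on a particular tower.

```python
from collections import defaultdict

# Hypothetical CDRs: (customer_id, cell_tower, duration_seconds, dropped)
CDRS = [
    ("cust_a", "tower_1", 120, False),
    ("cust_a", "tower_2", 15, True),
    ("cust_a", "tower_2", 10, True),
    ("cust_b", "tower_1", 145, False),
]


def aggregate_by_customer(records):
    """Summary-table view: call count and total seconds per customer."""
    totals = defaultdict(lambda: {"calls": 0, "seconds": 0})
    for customer, _tower, seconds, _dropped in records:
        totals[customer]["calls"] += 1
        totals[customer]["seconds"] += seconds
    return dict(totals)


def dropped_calls_by_tower(records):
    """Detail-level question that the summary above can no longer answer."""
    counts = defaultdict(int)
    for _customer, tower, _seconds, dropped in records:
        if dropped:
            counts[tower] += 1
    return dict(counts)


if __name__ == "__main__":
    summary = aggregate_by_customer(CDRS)
    # Both customers show 145 seconds of usage in the summary.
    print(summary["cust_a"]["seconds"], summary["cust_b"]["seconds"])  # 145 145
    # Only the raw CDRs show where the dropped calls occurred.
    print(dropped_calls_by_tower(CDRS))  # {'tower_2': 2}
```

Once only the summary is stored, the tower-level question cannot be answered at all, which is the information loss that keeping CDR-level detail avoids.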
Retail is another industry where data warehouse appliances have already begun to prove their worth and are poised to play a bigger role in the future. Brick-and-mortar and online retailers are capturing enormous amounts of customer transaction, clickstream, operational and supply chain information, creating a data explosion that threatens to overwhelm an average retail organization and its current IT infrastructure. Data warehouse appliances enable retailers to manage and analyze these terabytes of information in near-real time and to use the information to effectively forecast buying patterns, quickly generate targeted promotions and optimize their inventory and supply chain.
The success of decision making in a company relies on business intelligence. BI, in turn, relies on the underlying database architecture. Current database architectures are patchwork systems, built in pieces and not optimized for delivering timely results. The maturity and stability of the relational database, paired with the power of commodity components, allow the patchwork database system to be broken down and rebuilt as a single, architecturally integrated product. A new generation of data warehouse appliances holds promise for companies that depend on business intelligence.