MAY 1, 2003 1:00am ET

Tera-Scale Data Appliances for Business Intelligence

In the current age of data warehousing, business intelligence (BI) can make or break a company. The timely processing and retrieval of vast amounts of data is vital to the decision-making process. However, just as important as timeliness is the depth of data analysis possible. With the growing size of the average data warehouse, achieving these goals has become increasingly difficult; already, terabyte-sized data warehouses are fairly common.

According to Greg’s Law, data is estimated to double on average every nine months. Vendors have thus far handled this rapid growth in database size with expensive and constant upgrades to hardware and software, but over the past few years it has become clear that existing infrastructures cannot effectively handle the demands of in-depth analysis on large amounts of data. Furthermore, the Internet has given far more users access to databases. As demand for access and analysis continues to grow, users relying on general-purpose hardware and software will have to look for solutions designed specifically to address this problem.
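To make the doubling rule concrete, the short sketch below projects warehouse size under the nine-month doubling period stated above. The starting size of 1 TB and the function names are illustrative assumptions, not figures from the article.

```python
# Sketch of the doubling rule quoted in the text: data doubles roughly
# every nine months. Starting sizes are hypothetical inputs.

DOUBLING_PERIOD_MONTHS = 9.0  # figure attributed to Greg's Law in the text


def growth_factor(months: float, doubling_period: float = DOUBLING_PERIOD_MONTHS) -> float:
    """How many times larger the data becomes after `months` of doubling growth."""
    return 2.0 ** (months / doubling_period)


def projected_size_tb(start_tb: float, months: float) -> float:
    """Projected size in TB of a warehouse that starts at `start_tb` TB."""
    return start_tb * growth_factor(months)


if __name__ == "__main__":
    for years in (1, 2, 3):
        months = 12 * years
        print(f"after {years} year(s): x{growth_factor(months):.1f} "
              f"(1 TB grows to {projected_size_tb(1.0, months):.1f} TB)")
```

Three years is four doubling periods, so under this assumption a 1 TB warehouse grows to about 16 TB in that time.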

The challenge is to provide a purpose-built solution that is both specific and flexible. That is, it must be suited to the task of handling vast amounts of data, yet remain compatible with the customer’s existing BI applications and infrastructure. Such a solution should also be relatively simple to put into place compared with the systems currently available, which are highly complex from a database administration point of view. The answer is a purpose-built appliance: expandable, affordable and uniquely suited to users’ ever-growing needs for speed and sophistication in data analysis.

The Current State of Business Intelligence

The current BI infrastructure is a patchwork of hardware, software and storage that is growing ever more complex. Consider a typical BI solution:

  • The database management system (DBMS) was initially architected for transaction processing, holding several hundred megabytes’ worth of records for a few internal users;
  • The DBMS has been improved incrementally over the years to support terabyte-sized databases, Internet-scale users and the evolving SQL definition;
  • The hardware/operating system is a clustered set of generic boxes that are optimized for everything from mathematical queries to genome investigation; and
  • The system is attached to generic file systems that manage and serve data for a variety of applications.

Some systems are optimized for performance, but these optimizations have been made in stages over time, and the underlying architecture has remained general in nature. Several database administration (DBA) and DBMS packages have been put in place; symmetric multiprocessing (SMP) servers and disk arrays from a variety of vendors serve the data; and an even larger selection of client applications is placed on top of this warehouse behemoth. For example, a company may use an Oracle DBMS, an HP server and a storage solution from EMC and, as its system grows, may add Hitachi storage and a second server. With these types of systems, data and user applications must be continuously tuned and optimized.

Tera-scale databases that continue to grow steadily put tremendous strain on these systems. At the same time, the queries run against the database grow more complex. Sophisticated analytical methods require complex queries and models; Web log and customer segmentation analyses, for example, are taxing current database systems. The problem is twofold. First, complex queries strain the system and slow down the other queries being run. Second, if business users cannot get results in real time, they are unlikely to try another query of equal or greater complexity. As a result, the ability to obtain useful information quickly is impaired.

Even in cases where the user base and data set are relatively stable, current BI systems often fail to meet their basic goal of delivering vital business information in time for decisions to be made. From an administration standpoint, this patchwork of solutions is a nightmare. For business users, it is frustrating and does not provide the agility and performance they are looking for. These strains occur because vendors have upgraded these systems incrementally over the years rather than changing the underlying architecture to address the unique requirements of today’s tera-scale databases.

The issues with current BI architectures are evident across a broad range of companies and industries. While the strain on these systems will grow worse over the next few years, the problems exist today and are plaguing both business users and database administrators. Patchwork solutions can only hold together for so long; as database growth continues its exponential rise, the weak points in current systems will only become more severe. A new solution must be engineered now, and that solution is a tera-scale data appliance purpose-built for BI.

The Case for a Tera-Scale Data Appliance for Business Intelligence

Applied to BI, a tera-scale data appliance is a purpose-built machine capable of retrieving valuable decision-aiding intelligence from terabytes of data on the order of seconds or minutes as opposed to hours or days. Appliances represent the difference between making a decision using stale data and making a decision with the freshest information possible. Tera-scale data appliances are engineered for the purpose of delivering results while the results are still relevant.

A tera-scale data appliance that is purpose-built for BI is:

  • Optimized for maximum performance
  • Scalable
  • Reliable
  • Easy to use

Optimization. Optimization affects both the storage and retrieval of data. A data appliance is engineered to deliver intelligence quickly and efficiently, no matter the database size. The appliance also allows real-time updates to data, so end users are not handed stale data. The most important factors in BI are the timeliness and freshness of the results; they should be returned in a useful time frame, allowing a company to maximize its options. The appliance provides the real-time updates and retrievals critical to BI, and it performs these optimizations automatically, without heavy DBA involvement.

Scalability. A tera-scale data appliance should be truly scalable. That is, the addition of extra storage to accommodate a larger data warehouse should not adversely affect performance. Specifically, the business users running queries against the data should not feel the effects of the growth. In order to accomplish this, the major bottleneck points must be distributed in the system rather than placed centrally. For large data transfers, bottlenecks are internal network speed and disk transfer speed; for complex queries, the bottleneck is often the CPU. An ideal data appliance should be able to scale to support a multi-terabyte-size database without major performance degradation.
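The argument for distributing bottlenecks can be illustrated with a toy throughput model for a full-table scan. This is a minimal sketch under stated assumptions, not the article’s design: the per-node disk rate, the shared-channel rate and the function names are all hypothetical, and real systems have many more limits than these two.

```python
# Toy throughput model for a full-table scan. All bandwidth figures are
# hypothetical; sizes are in MB and rates in MB/s.


def scan_seconds_distributed(data_mb: float, nodes: int, node_mb_s: float) -> float:
    """Shared-nothing layout: each node scans only its own 1/nodes share, in parallel."""
    return (data_mb / nodes) / node_mb_s


def scan_seconds_central(data_mb: float, nodes: int, node_mb_s: float,
                         link_mb_s: float) -> float:
    """Centralized layout: every byte crosses one shared channel, so aggregate
    throughput is capped by the slower of the disks' total rate and the link."""
    throughput = min(nodes * node_mb_s, link_mb_s)
    return data_mb / throughput


if __name__ == "__main__":
    NODE_MB_S = 50.0    # hypothetical per-node disk rate
    LINK_MB_S = 1000.0  # hypothetical shared-channel rate
    # Grow the data and the hardware together: 1 TB on 10 nodes, 4 TB on 40 nodes.
    for data_mb, nodes in ((1_000_000.0, 10), (4_000_000.0, 40)):
        print(nodes, "nodes:",
              scan_seconds_distributed(data_mb, nodes, NODE_MB_S), "s distributed,",
              scan_seconds_central(data_mb, nodes, NODE_MB_S, LINK_MB_S), "s central")
```

With these assumed numbers both layouts take 2,000 seconds at 10 nodes, but at 40 nodes and four times the data the distributed layout still takes 2,000 seconds while the centralized one doubles to 4,000, because the shared channel has become the bottleneck.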

Reliability. Reliability is critical. One level of reliability comes from the inherent abstraction of an appliance. By keeping the inner workings from being modified by the users or administrators, the potential for failure decreases. Another level of reliability is provided by the homogeneous nature of an appliance; all parts of the system come from one vendor. The customer does not have to integrate disk arrays, operating systems and database software, hoping that they will all work together flawlessly. Reliability increases as the number of vendors decreases, and multiple general-purpose offerings are replaced with a single solution.
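One way to read the claim that reliability rises as separately integrated pieces are removed is a simple series model: if the whole system is up only when every layer is up, its availability is the product of the layers’ availabilities. This model, the independence assumption and the 99% figures are illustrative assumptions, not data from the article.

```python
import math


def series_availability(availabilities):
    """Availability of a system that is up only when every layer is up.

    Assumes layers fail independently (a simplifying assumption).
    """
    return math.prod(availabilities)


if __name__ == "__main__":
    # Hypothetical: each separately integrated layer is up 99% of the time.
    four_layers = series_availability([0.99, 0.99, 0.99, 0.99])
    two_layers = series_availability([0.99, 0.99])
    print(f"4 layers: {four_layers:.4f}, 2 layers: {two_layers:.4f}")
```

Under these assumptions, four independently failing layers give roughly 96.1% availability, while two give 98.0%.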

Ease of Use. Obviously, we cannot do away with database administration entirely, as a certain level of management is necessary to maintain database integrity and performance. However, we can make the database administrator’s job much easier, particularly in the area of end-user software compatibility. By making the appliance compatible with all common database standards (ODBC, etc.) and putting it through rigorous testing, the appliance manufacturer can ensure that applications interoperate with the appliance. Ongoing support issues can thus be minimized.
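The value of a standard interface is that client code does not need to know what sits behind it. The article names ODBC; as an analogous illustration, this sketch uses Python’s DB-API 2.0 (PEP 249), with an in-memory SQLite database standing in for the appliance. The table, columns, data and function names are hypothetical.

```python
import sqlite3


def top_customers(conn, limit):
    """Return the `limit` customers with the highest total sales.

    Uses only PEP 249 calls (cursor, execute, fetchmany), so the same code
    should run against any DB-API 2.0 driver that accepts this SQL.
    """
    cur = conn.cursor()
    cur.execute(
        "SELECT customer, SUM(amount) AS total FROM sales "
        "GROUP BY customer ORDER BY total DESC"
    )
    rows = cur.fetchmany(limit)
    cur.close()
    return rows


def build_demo_connection():
    """In-memory SQLite database standing in for the appliance (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (customer TEXT, amount INTEGER)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("acme", 100), ("acme", 50), ("globex", 120), ("initech", 30)],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    print(top_customers(build_demo_connection(), 2))
```

Only the demo setup depends on SQLite (including its `?` placeholder style, which varies between drivers); `top_customers` itself relies on the standard interface, which is the kind of compatibility rigorous standards testing is meant to guarantee.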


Figure 1: The Tera-Scale Data Appliance for Business Intelligence

Why Now?

Given the long history of database development and the existence of previous attempts at database appliances/machines, why is now the time for a tera-scale data appliance in BI?

There are several reasons that the appliance is now possible, but the most important of these is the maturity of database technology. The database standards have been set, and this allows the system to be built completely around the desires and needs of the end user. Furthermore, the concept of a relational database is well defined and the users are experienced and eager to run increasingly complex queries. A wide variety of sophisticated applications and tools with standard interfaces allow widespread access to the database. And, as noted earlier, terabyte-sized databases, an influx of users and a demand for complex queries have placed unprecedented strain on the existing patchwork infrastructure.

Users of BI and data warehousing, therefore, need a system that yields high performance, both in speed and storage. High-powered specialized hardware drove the database machines of the past, but now there is a need for better performance at a lower cost. The power of current technology is great enough that commercial, off-the-shelf components, which are dropping in price, can be used to construct a tera-scale data appliance. This appliance can provide valuable BI at a fraction of the cost of current industry database systems.

What is Today’s Tera-Scale Data Appliance for BI?

People often associate appliances with simplicity, and databases by nature are not simple. The high-performance, tera-scale data appliance is not mechanically simple; rather, it makes BI simpler to use and more useful for the end user. The appliance starts over from the beginning, addressing the problems and concerns of the end user and the issues raised by the growing size of databases. The tera-scale data appliance is clean, efficient, expandable and powerful.

A tera-scale data appliance integrates the hardware, DBMS and storage into one opaque device. It combines the best elements of SMP and massively parallel processing (MPP) architectures into a new architecture that allows each query to be processed as efficiently as possible. It is architected to remove the bottlenecks to data flow so that the only remaining limit is disk speed: a data flow architecture in which data moves at “streaming” speeds. Through standard interfaces, it is fully compatible with existing BI applications, tools and data. And it is extremely simple to use.
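The article does not describe the appliance’s internals, but the general idea of MPP-style processing with streaming data flow can be sketched: rows are spread across nodes, each node filters and aggregates its own rows as they stream past, and a coordinator merges the small partial results. The round-robin partitioning, the (region, amount) schema and all function names below are hypothetical, and the nodes are simulated sequentially rather than in parallel.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Row = Tuple[str, float]  # (region, amount): hypothetical schema


def partition(rows: Iterable[Row], n_nodes: int) -> List[List[Row]]:
    """Spread rows round-robin across nodes (shared-nothing layout)."""
    parts: List[List[Row]] = [[] for _ in range(n_nodes)]
    for i, row in enumerate(rows):
        parts[i % n_nodes].append(row)
    return parts


def scan(rows: Iterable[Row], min_amount: float) -> Iterator[Row]:
    """Stream qualifying rows one at a time instead of materializing them."""
    for region, amount in rows:
        if amount >= min_amount:
            yield region, amount


def partial_totals(stream: Iterable[Row]) -> Dict[str, float]:
    """Per-node aggregation, consumed directly from the stream."""
    totals: Dict[str, float] = defaultdict(float)
    for region, amount in stream:
        totals[region] += amount
    return dict(totals)


def merge(partials: Iterable[Dict[str, float]]) -> Dict[str, float]:
    """Coordinator step: combine the small per-node results."""
    result: Dict[str, float] = defaultdict(float)
    for part in partials:
        for region, total in part.items():
            result[region] += total
    return dict(result)


def parallel_total_by_region(rows: List[Row], n_nodes: int,
                             min_amount: float) -> Dict[str, float]:
    # Nodes are simulated one after another here; a real MPP system runs them concurrently.
    partials = [partial_totals(scan(part, min_amount)) for part in partition(rows, n_nodes)]
    return merge(partials)


if __name__ == "__main__":
    sales = [("east", 10.0), ("west", 5.0), ("east", 2.0), ("west", 20.0), ("north", 1.0)]
    print(parallel_total_by_region(sales, n_nodes=2, min_amount=2.0))
```

Because only the per-node totals travel to the coordinator, the answer is the same for any node count, while the amount of data each node must read shrinks as nodes are added.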

How Businesses Benefit from a Tera-Scale Data Appliance for Business Intelligence

A tera-scale data appliance for BI provides speed for the business user. The era of waiting hours or days for queries to finish is over. Patience may be a virtue, but when it comes to BI, decision-makers need results now. The size of the average data warehouse is increasing and shows no signs of slowing down; with this increased store of knowledge comes an increased demand for BI. Businesses should not need to discard customer data from two months ago because their database slows to a crawl when the data is kept.

A tera-scale data appliance for BI provides freedom to the business user. Right now, users are limited in the queries they can run because of the time required to run them. Thus, users end up running the same set of queries against the database. With the time required to run a complex query reduced to seconds, users can not only run their old queries more often, but they have the time to devise and run whole new sets of queries.

A tera-scale data appliance provides simplicity for the administrator. The integrated nature of an appliance means that the time typically spent troubleshooting a complex database system can be spent on more productive work. The goal is not to simplify a complex system, but rather to remove the appearance of complexity by abstracting away the mechanical details. The end result is the removal of legacy systems and piecemeal components.

A tera-scale data appliance provides ease of database growth. The inherent scalability of a modified MPP architecture stems from the modularity of its nodes. Ideally, we want a database with linear scaleup [1]; that is, with n times the hardware, we should be able to handle a task n times as large in the same amount of time. The tera-scale data appliance provides exactly that flexibility.
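Scaleup, as DeWitt and Gray define it (reference 1), is the elapsed time of the small system on the small problem divided by the elapsed time of the big system on the big problem; linear scaleup corresponds to a ratio of 1. The sketch below computes it. The timings and the 5% tolerance are hypothetical, chosen only to show how measured runs might be checked.

```python
def scaleup(small_system_small_problem_s: float, big_system_big_problem_s: float) -> float:
    """Scaleup as defined by DeWitt and Gray: elapsed time of the small system on
    the small problem divided by that of the big system on the big problem."""
    return small_system_small_problem_s / big_system_big_problem_s


def is_near_linear(value: float, tolerance: float = 0.05) -> bool:
    """Linear scaleup means a ratio of 1.0; `tolerance` is an arbitrary allowance."""
    return abs(value - 1.0) <= tolerance


if __name__ == "__main__":
    # Hypothetical timings: 1 TB on 4 nodes vs. 4 TB on 16 nodes.
    s = scaleup(120.0, 126.0)
    print(f"scaleup = {s:.3f}, near-linear: {is_near_linear(s)}")
```

A ratio well below 1, such as 0.5 when the four-times-larger job on four times the hardware takes twice as long, signals that some shared resource has become a bottleneck.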

A tera-scale data appliance provides the lowest total cost of ownership. Although it is purpose-built, it is constructed from commodity hardware, eliminating the overhead of special-purpose hardware. The appliance comes from one source, one vendor, so support costs are reduced. With existing technologies, data growth typically incurs costs: hardware must be added and ongoing maintenance must be performed. The tera-scale data appliance reduces these costs with inexpensive yet powerful hardware from one source.

With the simple, efficient solution provided by a tera-scale data appliance for BI, businesses will run more efficiently. Results will be returned within seconds or minutes, orders of magnitude faster than with current architectures. Businesses today demand rapid response times so that they can act on results quickly.

Conclusion

The success of decision making in a company relies on business intelligence. BI, in turn, relies on the underlying database architecture. Current database architectures are patchwork systems, built in pieces and not optimized for delivering timely results. The maturity and stability of the relational database, paired with the power of consumer computer components, make it possible to rethink the database system from the ground up. Starting with a clean slate, the next-generation database system should be engineered with the end user in mind. The system should be clean and scalable, and it should enable optimized BI. A new generation of tera-scale data appliances holds promise for companies that depend on business intelligence.

References:

1. DeWitt, D. J. and J. Gray. “Parallel Database Systems: The Future of High Performance Database Systems.” Communications of the ACM, Vol. 35, No. 6, June 1992, pp. 85-98.

Foster Hinshaw, often referred to as the Father of Data Warehouse Appliances, brings a wealth of creativity, technical and operational expertise in both hardware and software to Dataupia. He is accomplished at designing and developing large complex systems for business-critical enterprise and departmental applications, as well as Web-based e-commerce systems. Prior to Dataupia, Hinshaw founded Netezza, the provider of enterprise-class business intelligence appliances.
