One of the advantages of my position as publisher of DM Review is that I have many opportunities to visit seminars, conferences and vendor sites. Consequently, I am keenly aware of enterprise requirements and the solutions vendors provide to meet those requirements. Occasionally, I encounter solutions that I feel are unique and valuable. My Publisher's Insight was developed to introduce DM Review readers to these products.
Performance has become one of the major issues for most enterprises that have to process vast, ever-increasing data volumes. One of the new companies that has tackled this issue head on is HyperRoll. HyperRoll Data Warehouse Suite creates and manages aggregate information across the entire corporate information factory. It does this by optimizing the access and analysis of summary information. Bill Inmon has provided us with an in-depth look at how HyperRoll has enhanced the performance of the corporate information factory.
Performance has always been an issue in the world of information technology, in operational systems and analytical systems alike. Performance is a multifaceted issue that can be addressed by adding hardware resources, designing the system properly, tuning the system once in operation and so forth. In short, depending on where the system is in its development life cycle, there are different approaches to achieving performance.
One of the most effective ways of achieving performance is through aggregating and summarizing data. To illustrate the effectiveness of aggregation and summarization, suppose an analyst wishes to find out the total purchase amount made by a customer over the course of a year. One way to calculate the total is to find each purchase individually and add the amounts up. There is nothing particularly troublesome about making the calculation. However, finding the relevant data is another matter. Much nonqualifying data must be sorted through to find the appropriate data to be used in the calculation. With indexing, such a search is tedious; without indexing, the search is onerous. However performed, the search for specific records is possible, but it consumes significant computer resources, primarily I/Os. A more efficient approach is to aggregate and summarize the data once, and then store the result for future access. Once such an aggregation/summarization is complete, it is very easy for the analyst to search the data.
The real benefit comes when the summarization is used repeatedly. Without a stored summary, the situation becomes disastrous when multiple analysts need the same summarized information, because the same tedious calculation is performed again by each analyst.
How much is the payoff in terms of performance for properly summarizing data that will be used repeatedly? The payoff is significant, probably from one order of magnitude to several orders of magnitude, depending on the particulars of the environment.
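The customer-purchase example above can be sketched in a few lines of Python. This is only an illustration under assumed data: the `purchases` records, the function names and the record layout are hypothetical and are not taken from any product. It contrasts scanning every detail record for each request with aggregating once and looking up the stored total.

```python
from collections import defaultdict

# Hypothetical detail records: (customer_id, year, purchase_amount).
purchases = [
    ("C1", 2001, 120.0),
    ("C2", 2001, 40.0),
    ("C1", 2001, 30.0),
    ("C1", 2002, 75.0),
    ("C2", 2001, 10.0),
]


def total_by_scan(records, customer_id, year):
    """Sum one customer's purchases by examining every detail record."""
    examined = 0
    total = 0.0
    for cust, yr, amount in records:
        examined += 1  # each record touched stands in for I/O cost
        if cust == customer_id and yr == year:
            total += amount
    return total, examined


def build_summary(records):
    """Aggregate once: total purchase amount per (customer, year)."""
    summary = defaultdict(float)
    for cust, yr, amount in records:
        summary[(cust, yr)] += amount
    return dict(summary)


# Built a single time, then shared by every analyst who needs the totals.
summary = build_summary(purchases)

scan_total, records_examined = total_by_scan(purchases, "C1", 2001)
lookup_total = summary[("C1", 2001)]  # one keyed lookup, no scan
```

Every call to `total_by_scan` touches all of the detail records, whereas each later request against `summary` is a single lookup, which is where the savings from repeated use come from.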
With this level of performance advantage at stake, HyperRoll plays an important role. HyperRoll is a new company whose product creates and manages aggregate information across the enterprise. With HyperRoll, there is a minimal requirement for JOIN or GROUP BY processing. When JOIN and GROUP BY processing are minimized, good performance follows. HyperRoll's Data Warehouse Suite is an add-on product that allows existing systems and technology to remain unaffected while improving performance of existing processing. Because the suite is an add-on product, it does not require the rewrite of applications or reconfiguring of existing technology. Instead, the suite operates in an existing environment on an as-is basis. HyperRoll's suite is an important tool for protecting your existing investment in the systems and technology already running your environment. There is tremendous performance potential in adding HyperRoll's Data Warehouse Suite onto existing systems.
Figure 1: Relational Aggregation
HyperRoll works on the basis of optimizing the access and analysis of summary information. Consider the simple database design depicted in Figure 1. This figure shows a fact table, dimensions and summarizations. These are the basic components of a multidimensional database design. HyperRoll replaces the summarization tables that are built as an adjunct to the fact table as shown in Figure 2. By replacing classical summary tables, HyperRoll eliminates significant amounts of processing.
Figure 2: HyperRoll Replaces Summary Tables
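For readers who want to see the classical design of Figure 1 in concrete terms, the following minimal sketch uses Python's built-in `sqlite3` module. The table names (`product_dim`, `sales_fact`, `sales_summary`) and the sample rows are assumptions made for illustration. The sketch shows only a conventional summary table kept alongside a fact table; it does not represent how HyperRoll works internally, which the article does not describe.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical star schema: one dimension table and one fact table.
conn.executescript("""
CREATE TABLE product_dim (product_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE sales_fact (product_id INTEGER, sale_year INTEGER, amount REAL);
INSERT INTO product_dim VALUES (1, 'books'), (2, 'books'), (3, 'music');
INSERT INTO sales_fact VALUES
    (1, 2001, 10.0), (2, 2001, 15.0), (3, 2001, 7.0),
    (1, 2002, 20.0), (3, 2002, 5.0);
""")

# Answering from detail requires a JOIN and a GROUP BY on every request.
DETAIL_QUERY = """
SELECT d.category, f.sale_year, SUM(f.amount) AS total_amount
FROM sales_fact AS f
JOIN product_dim AS d ON d.product_id = f.product_id
GROUP BY d.category, f.sale_year
ORDER BY d.category, f.sale_year
"""

# Classical summary table: run the JOIN/GROUP BY once and store the result.
conn.execute("CREATE TABLE sales_summary AS " + DETAIL_QUERY)

from_detail = conn.execute(DETAIL_QUERY).fetchall()

# Later requests read the stored rows with no JOIN or GROUP BY at all.
from_summary = conn.execute(
    "SELECT category, sale_year, total_amount "
    "FROM sales_summary ORDER BY category, sale_year"
).fetchall()
```

Both queries return the same rows, but only the first pays for the JOIN and GROUP BY each time it runs. The cost of the classical approach is that each summary table must be designed, built and kept current alongside the fact table.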
Can't an organization simply design a system where the needed summarizations and aggregations can be anticipated? Designers can guess what the future will hold; however, designers cannot know what future requirements will be. This means systems are built with the understanding that they will need further refinement. One of the strengths of HyperRoll is its ability to operate on existing systems so that new requirements can be incorporated after the system has been built.
To illustrate where HyperRoll fits, consider the Corporate Information Factory as seen in Figure 3.
Figure 3: The Corporate Information Factory
Corporate Information Factory
The Corporate Information Factory is a framework that has evolved into a series of architectural components. Some of the more notable components of the Corporate Information Factory are the data warehouse, operational applications, data marts, the operational data store (ODS), decision support system (DSS) applications, exploration/data mining warehouse, and near-line or alternative storage.
Each of the components of the Corporate Information Factory has its own structure, attributes and advantages. Throughout the Corporate Information Factory, performance is important. Furthermore, performance is measured differently in different places. Performance is measured in seconds in applications, in minutes in data marts and in hours or even days in the exploration and data mining environment. However, wherever there is processing, performance matters. Figure 3 shows where the technology offered by HyperRoll makes the biggest difference.
There is another very important and strategic role played by HyperRoll: that of an agent of change. In order to understand this role, consider exactly how the Corporate Information Factory is built. Nobody sets out to build an immutable Corporate Information Factory. Instead, the Corporate Information Factory evolves over time as the result of many people and many forces.
There are a number of reasons that the single data warehouse evolves into a larger Corporate Information Factory. As the evolution occurs, there are points of pain. The pain is manifested in poor performance, inability to handle large volumes of data, the need to restructure data by different entities, and so forth. The pain becomes acute immediately prior to an architectural entity physically splitting from the data warehouse.
It is at this point that the real value of HyperRoll becomes apparent. HyperRoll can be placed in the data warehouse environment so the level of pain is mitigated while the transition to a new environment occurs.
HyperRoll plays an important role in achieving performance in the Corporate Information Factory. HyperRoll enhances performance in the operations environment, the data warehouse environment, the data mart environment and in DSS applications. In addition, HyperRoll allows an organization to evolve to different architectural entities by extending performance possibilities prior to the separation of the data warehouse into an ODS, data mart or DSS application. HyperRoll alleviates some of the pain preceding the separation, thereby allowing the organization to achieve a graceful evolution.