According to Forrester Research, "...the number and diversity of BI [business intelligence] users will continue increasing steadily throughout the decade, achieving a penetration of between 25 percent and 40 percent of all enterprise users."1 We simply must get our BI environments beyond the "personal touch" that is possible with small user communities. Yet, there are challenges in achieving this vision. Not surprisingly, the biggest challenge is query performance. What is surprising is that many organizations manage their query performance issues by actually limiting user queries to a predefined list or predetermined times!
How is the vendor marketplace responding? It has been an acquisition race to extend functionality and provide end-to-end "one stop" shopping among the elite BI vendors. This includes players such as IBM, Oracle, Microsoft and SAP, among others. Fortunately, one need addressed in these "stacks" is data quality. Performance, however, is still relegated to minor, incremental improvements in the base technologies, such as database management systems. Smaller, less established vendors with important technology are not included as part of these functional stacks if they lack market share. It is hard for these smaller vendors to get noticed when the major vendors provide the one-stop shop. Tracking the array of offerings from just one prime vendor can be bewildering and time-consuming.
The quality of database administration and related disciplines to fully exploit database technologies has never been better. There is a finite set of nonhardware-based activities (physical modeling) that improves query performance, whether directly or indirectly.
- Indexes: Indexes are presorted, concise, alternate forms of the base data. The downsides are the additional physical storage required and the fact that the database management system (DBMS) optimizers need to appropriately use them.
- Parallelism: Assuming that you have the redundant hardware components and the required software to exploit them, many systems afford you the opportunity to partition your data yourself according to expected access.
- Summary Tables: Most access does not need detailed data. Summary tables, however, can take up enormous amounts of space. They frequently must be built in numerous ways from the detail data for numerous views, thus creating a high amount of redundant data in the environment. Summary tables quickly get out of sync with the underlying base data. The maintenance actually presents a bigger issue than the space.
- Multidimensional Structures: Multidimensional structures, or cubes, are similar to summary tables, but much more compact, with built-in implied access paths that can be easily used by data access tools. The maintenance challenges are even more pronounced with cubes.
- Data Marts: Data marts are usually subsets of data spun from the data warehouse itself. These marts tend to be targeted to very specific purposes. This creates additional extract, transform, load (ETL) logic in the architecture, as well as lags in data availability.
These methods are mostly properly used in BI programs, but because of query performance issues, many still do not deliver on the BI promise of "the right data to the right people at the right time." Hardware upgrades and additions often seem to be the only means available to improve performance without radically changing the underlying architecture. Many end up scaling hardware in excess proportion to its delivered value, which translates into sublinear performance gains.
Indexes are the most elegant manner of improving query performance. They are easiest to implement (with a simple CREATE INDEX statement). Traditional data warehouse optimizers are fairly sophisticated in the selection of indexes for query processing. Most importantly, indexes are automatically maintained and require no changes to existing tables, applications or interactive environments. They are seamless to the end user, and nearly seamless to the database administrator.
Indexes could be the primary way to improve the performance of BI queries if not for the drawbacks of storage and the challenge of multitable joins. Classic index disk storage is a factor of their size being something similar to: key size times number of records. BI time-to-value could be improved tremendously if the physical modeling were simplified to the point of collecting and cleaning detailed data, and the performance considerations were minimized. More importantly, the value of the existing investment in BI applications could actually be extended to the size of the user community that could benefit from it.
It cannot be emphasized enough that as query performance improves, so does system throughput.
RightOrder QueryEdge provides new technologies and software for indexing that accentuate the positive qualities of indexing, while minimizing the negative aspects. It does this by:
- Highly compressing the keys to a fixed size (7 bytes), regardless of the number of columns in the key. Thus, a page/block I/O will gather many more keys.
- Not limiting the columns you can put in an index. QueryEdge helps ensure that a join of arbitrary complexity can be indexed without impacting the index size.
- Making the indexes precomputed joins so that at execution time multitable queries are not required to use multiple indexes or tables, just the QueryEdge index. This drastically reduces the amount of required disk and memory I/O and lowers the CPU consumption by eliminating a join calculation. This reduces the overall system load and even the queries that are not directly using QueryEdge indexes will run faster. Entire system throughput is increased.
Given the alternatives available today to performance tune database management systems, including long cycle time and costly and risky hardware alternatives, an investment in an indexing approach would seem to yield the biggest return in query performance and, by virtue of a nonintrusive implementation, is an option worth adding.
- Russom, Phil and Keith Gile. Implementing Successful Enterprise Business Intelligence. Forrester Research, Inc. Jan. 2003.
Register or login for access to this item and much more
All Information Management content is archived after seven days.
Community members receive:
- All recent and archived articles
- Conference offers and updates
- A full menu of enewsletter options
- Web seminars, white papers, ebooks
Already have an account? Log In
Don't have an account? Register for Free Unlimited Access