MAR 1, 2007 1:00am ET

The Problems with OLAP

The OLAP Survey is an independent business intelligence (BI) survey that has been conducted annually since 2000. It is a deep analysis of why organizations select BI products, how they deploy them and how successful they are.

Problem Areas Analyzed

This year, 1,679 users participated, some of them DM Review readers. The large sample size supports not only numerous overall analyses but also many comparisons between subsamples, such as individual products and application size bands, and it has turned up some unexpected and intriguing results.

The OLAP Survey asks a number of questions about problems and support. For example, respondents are asked to name the worst problems they have encountered, up to a maximum of three. We then classify these into three categories: data (poor quality, inability to get data from some systems), people (administrative problems, company politics, could not agree on requirements, lack of interest from business users, requirements changed before project completed) and technical or product-related (missing key features, product could not handle large numbers of users, product could not handle the data volumes, query performance too slow, security limitations in the product).
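
The classification step amounts to a lookup from each nominated problem to its category, followed by a tally. The Python sketch below assumes each response arrives as a list of at most three problem labels adapted from the list above; the `CATEGORY` table and the `tally_categories` function are hypothetical names for illustration, not the survey's actual processing code.

```python
from collections import Counter

# Problem labels grouped into the three categories described above
# (labels adapted from the survey's answer list; names are illustrative).
CATEGORY = {
    "poor quality data": "data",
    "inability to get data from some systems": "data",
    "administrative problems": "people",
    "company politics": "people",
    "could not agree on requirements": "people",
    "lack of interest from business users": "people",
    "requirements changed before project completed": "people",
    "missing key features": "technical",
    "product could not handle large numbers of users": "technical",
    "product could not handle the data volumes": "technical",
    "query performance too slow": "technical",
    "security limitations in the product": "technical",
}

MAX_NOMINATIONS = 3  # respondents may nominate up to three problems


def tally_categories(responses):
    """Count nominations per category across all responses.

    A response with more than three nominations is rejected
    rather than silently truncated.
    """
    counts = Counter()
    for problems in responses:
        if len(problems) > MAX_NOMINATIONS:
            raise ValueError("at most three problems may be nominated")
        for problem in problems:
            counts[CATEGORY[problem]] += 1
    return counts


# Three invented responses, for illustration only.
responses = [
    ["company politics", "query performance too slow"],
    ["poor quality data"],
    ["lack of interest from business users", "missing key features", "company politics"],
]
print(tally_categories(responses))  # → Counter({'people': 3, 'technical': 2, 'data': 1})
```

Counting nominations rather than respondents matches the "up to three" rule: one respondent can contribute to several categories in the collective totals.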

Looking at the collective totals, the picture has been quite stable for the last three years (see Figure 1).


Figure 1: Problem Trends Since 2002

There have consistently been more people-related problems reported overall. Perhaps surprisingly, data problems are the least common.

Top Problems

Let's now concentrate on the top three problems and see how they've changed over the years. In a nutshell, company politics and poor-quality data, by far the worst problems in 2001, are now only about half as prevalent. Both dropped sharply in 2003 and 2004 and have since been relatively stable. Meanwhile, more and more respondents report query performance as a serious problem. In fact, it is the only problem that has gotten worse every year.


Figure 2: Most Serious Deployment Problems

This is a curious result. We all know that hardware gets ever faster and cheaper, and most software vendors boast that each new release has been tuned to perform even better than before. So how can performance problems have become steadily more common while hardware and software keep getting faster?


Figure 3: Most Serious Problem Trends in The OLAP Surveys

The most obvious theory is that rocketing data volumes are the cause. After all, if data volumes are increasing even faster than hardware performance improves, it might explain the rising complaints about performance. Looking at the actual median query times and comparing them with the median input data volumes over the last five years, we get some more surprises (see Figure 4).


Figure 4: Query Times versus Volumes Trend

Amazingly, there is no evidence of exploding data volumes. In fact, input data volumes have varied little over the last five years, remaining close to the surprisingly modest figure of approximately five gigabytes. There is a growing number of very large applications, but they probably remain very much in the minority. At the same time, there may be many more new small applications, so the typical (median) application size has hardly changed at all.
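
The argument about medians is easy to check with a toy example. The sizes below are invented application volumes in gigabytes, chosen only for illustration and not taken from the survey: adding two very large applications alongside two small ones leaves the median unchanged, while the mean jumps.

```python
from statistics import mean, median

# Hypothetical input data volumes (GB) for a set of applications; NOT survey data.
year_1 = [1, 2, 3, 4, 5, 6, 8, 10, 20]

# A later year: two very large applications appear, but so do two small ones.
year_2 = year_1 + [500, 2000, 1, 2]

print(f"median: {median(year_1)} -> {median(year_2)} GB")  # → median: 5 -> 5 GB
print(f"mean:   {mean(year_1):.1f} -> {mean(year_2):.1f} GB")  # → mean:   6.6 -> 197.1 GB
```

Because the median only looks at the middle of the sorted list, a handful of huge applications at the top end cannot move it as long as enough small ones arrive at the bottom.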

Just as remarkable is the correlation between typical input data volumes and typical reported query times: they rise and fall in almost perfect lock step. In fact, the ratio of median query time in seconds to median input data volume in gigabytes has stayed constant at 1.7 sec/GB for the last three years, up from 1.5 sec/GB in 2003. If hardware and software performance were really improving, this ratio should be falling steadily, not holding flat or rising.
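
The normalized measure in the previous paragraph is simply median query time divided by median input volume. The sketch below uses a hypothetical helper, `seconds_per_gb`; the 8.5-second and 5-gigabyte inputs are illustrative values consistent with the figures quoted in this article (a median volume of about five gigabytes and a 1.7 sec/GB ratio), not actual survey medians. It also shows why the ratio should fall: if the same workload ran on a stack that was genuinely twice as fast, the ratio would halve.

```python
def seconds_per_gb(median_query_s: float, median_volume_gb: float) -> float:
    """Median query time normalized by median input data volume (sec/GB)."""
    if median_volume_gb <= 0:
        raise ValueError("data volume must be positive")
    return median_query_s / median_volume_gb


# Illustrative inputs only: ~5 GB median volume, 8.5 s median query time.
print(f"{seconds_per_gb(8.5, 5.0):.1f} sec/GB")  # → 1.7 sec/GB

# Assumed 2x speedup of the combined hardware/software stack:
# the same queries should take half as long, so the ratio should halve.
speedup = 2.0
print(f"{seconds_per_gb(8.5 / speedup, 5.0):.2f} sec/GB")  # → 0.85 sec/GB
```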

This suggests that the undoubted improvements in hardware performance have been absorbed by less efficient software. For example, the move to thin-client and multilayered architectures may have many advantages, but better performance is not necessarily one of them. Good old-fashioned client/server architectures, if implemented well (as they are in the best OLAP tools), are very hard to outperform. So, whenever you hear of a new architectural acronym, ask first what it will do to performance!

Application Trends

Another reason for the surprisingly low data volumes is the preponderance of enterprise performance management (EPM) applications such as planning, forecasting, budgeting, dashboards and financial consolidation. These applications typically store much smaller data volumes than sales and marketing analysis applications, for example. Not only are EPM applications much more common than others, but they are also the only ones on a (modest) growth trend.

For example, the top EPM application (planning and forecasting) has grown from being in use at 31 percent of sites in 2002 to 46 percent in 2006, whereas the top reporting application (general data warehouse reporting) has remained flat at 45 percent, and the top analysis application (sales and marketing analysis) has barely grown, from 39 percent in 2002 to 42 percent in 2006. Customer relationship management has fallen from 18 percent in 2002 to 11.5 percent.
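
To compare these adoption shifts on a common footing, the short sketch below takes only the percentages quoted in the previous paragraph and computes both the absolute change in percentage points and the change relative to each application's 2002 share. The `ADOPTION` table and `changes` function are illustrative names, and entering data warehouse reporting as 45 percent in both years is an assumption based on "remained flat".

```python
# Share of sites using each application (percent), 2002 vs. 2006, as quoted above.
# "Remained flat at 45 percent" is entered as 45 in both years (assumption).
ADOPTION = {
    "planning and forecasting": (31.0, 46.0),
    "general data warehouse reporting": (45.0, 45.0),
    "sales and marketing analysis": (39.0, 42.0),
    "customer relationship management": (18.0, 11.5),
}


def changes(adoption):
    """Return {application: (change in percentage points, relative change in %)}."""
    result = {}
    for app, (y2002, y2006) in adoption.items():
        delta_pp = y2006 - y2002  # absolute change in percentage points
        result[app] = (delta_pp, delta_pp / y2002 * 100)  # relative to the 2002 share
    return result


for app, (delta_pp, relative) in changes(ADOPTION).items():
    print(f"{app:34s} {delta_pp:+5.1f} pp ({relative:+.0f}%)")
```

Planning and forecasting gains 15 points, roughly +48 percent relative to its 2002 share, while CRM loses 6.5 points, roughly -36 percent; sales and marketing analysis moves only about +8 percent.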


Figure 5: Query Times versus Complaints

If we plot the query time and query complaint trends on a single chart, we can see that the fluctuations in query response times, which so closely mirror input data volumes, are not correlated in any way with perceptions of query performance. Median query times, which have fluctuated within a relatively narrow band of 6.6 to 8.8 seconds over the last five years, have become steadily less acceptable to users. But why? Perhaps a clue can be found in our daily lives, far away from BI queries. Just try a Google search of arbitrary complexity: it doesn't seem to matter whether your query finds one page or a billion, the reported query time is likely to be a small fraction of a second.
