The Problems with OLAP

  • March 01 2007, 1:00am EST

The OLAP Survey is an independent business intelligence (BI) survey that has been conducted annually since 2000. It is a deep analysis of why organizations select BI products, how they deploy them and how successful they are.

Problem Areas Analyzed

This year, 1,679 users participated, some of whom are DM Review readers. The large sample size supports not only numerous overall analyses but also many comparisons between subsamples, such as individual products and application size bands, and it yields some unexpected and intriguing results.

The OLAP Survey asks a number of questions about problems and support. For example, respondents are asked about the worst problems they have encountered and can nominate up to three. We subsequently classify these into three categories: data (poor quality, inability to get data from some systems), people (administrative problems, company politics, could not agree on requirements, lack of interest from business users, requirements changed before project completed) or technical/product related (missing key features, product could not handle large numbers of users, product could not handle the data volumes, query performance too slow, security limitations in the product).
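The three-way classification described above can be sketched as a simple lookup table. The category names and problem strings below mirror the article's lists; the function itself is purely illustrative and is not part of the survey's methodology.

```python
# Illustrative sketch only: the survey's three problem categories,
# expressed as a lookup table of the nominated problems listed above.
PROBLEM_CATEGORIES = {
    "data": [
        "poor quality data",
        "inability to get data from some systems",
    ],
    "people": [
        "administrative problems",
        "company politics",
        "could not agree on requirements",
        "lack of interest from business users",
        "requirements changed before project completed",
    ],
    "technical": [
        "missing key features",
        "product could not handle large numbers of users",
        "product could not handle the data volumes",
        "query performance too slow",
        "security limitations in the product",
    ],
}

def classify(problem: str) -> str:
    """Return the category a nominated problem falls into."""
    for category, problems in PROBLEM_CATEGORIES.items():
        if problem in problems:
            return category
    return "unclassified"
```

So, for example, "company politics" classifies as a people problem, while "query performance too slow" is a technical/product one.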

Looking at the collective totals, the picture has been quite stable for the last three years (see Figure 1).

Figure 1: Problem Trends Since 2002

There have consistently been more people-related problems reported overall. Perhaps surprisingly, data problems are the least common.

Top Problems

Let's now concentrate on the top three problems and see how they've changed over the years. In a nutshell, company politics and poor quality data - by far the worst problems in 2001 - are now only about half as prevalent. They dropped sharply by 2003 and 2004, and have since been relatively stable. At the same time, more and more people are reporting query performance as a serious problem. In fact, it is the only problem that has gotten worse every year.

Figure 2: Most Serious Deployment Problems

This is a curious result. We all know that hardware gets ever faster and cheaper, and most software vendors boast that each new release has been tuned to perform even better than before. So how can performance have gotten steadily worse, while the hardware and software are getting faster?

Figure 3: Most Serious Problem Trends in The OLAP Surveys

The most obvious theory is that rocketing data volumes are the cause. After all, if data volumes are increasing even faster than hardware performance improves, it might explain the rising complaints about performance. Looking at the actual median query times and comparing them with the median input data volumes over the last five years, we get some more surprises (see Figure 4).

Figure 4: Query Times versus Volumes Trend

Amazingly, there is no evidence of exploding data volumes. In fact, input data volumes have varied little over the last five years, remaining close to the surprisingly modest figure of approximately five gigabytes. There are a growing number of very large applications, but they probably remain very much in the minority. At the same time, there may be many more new small applications, so the typical or median application size has hardly changed at all.

Just as remarkable is the correlation between typical input data volumes and typical reported query times. They rise and fall in almost perfect lock step. In fact, the ratio of query time in seconds to input data volume in gigabytes has held steady at 1.7 sec/GB for the last three years, up from 1.5 sec/GB in 2003. If hardware and software performance were really improving, this ratio should be falling steadily, not staying flat.
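The ratio in question is simply the median query time divided by the median input data volume. A minimal sketch of the arithmetic, using approximate medians consistent with the figures quoted in the article (the exact survey values may differ):

```python
# Approximate medians from the article: query times in a 6.6-8.8 s band,
# input data volumes of roughly five gigabytes. These are illustrative
# figures, not the survey's raw data.
median_query_time_s = 8.5
median_volume_gb = 5.0

ratio_sec_per_gb = median_query_time_s / median_volume_gb
print(f"{ratio_sec_per_gb:.1f} sec/GB")  # -> 1.7 sec/GB
```

If hardware and software were genuinely getting faster, this ratio would shrink year over year even with volumes held constant.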

This suggests that the undoubted improvements in hardware performance have been absorbed by less efficient software. For example, the move to thin-client and multilayered architectures may have many advantages, but better performance is not necessarily one of them. Good old-fashioned client/server architectures, if implemented well (as they are in the best OLAP tools), are very hard to outperform. So, whenever you hear of a new architectural acronym, ask first what it will do to performance!

Application Trends

Another reason for the surprisingly low data volumes is the preponderance of enterprise performance management (EPM) applications such as planning, forecasting, budgeting, dashboards and financial consolidation. These applications typically store much smaller data volumes than sales and marketing analysis applications, for example. Not only are EPM applications much more common than others, but they are also the only ones on a (modest) growth trend.

For example, the top EPM application (planning and forecasting) has grown from being in use in 31 percent of sites in 2002 to 46 percent in 2006, whereas the top reporting application (general data warehouse reporting) has remained flat at 45 percent, and the top analysis application (sales and marketing analysis) has barely grown from 39 percent in 2002 to 42 percent in 2006. Customer relationship management has fallen to 11.5 percent from 18 percent in 2002.

Figure 5: Query Times versus Complaints

If we plot the query time and query complaints trends on a single chart, we can see that the fluctuations in query response, which so closely mirror input data volumes, are not correlated in any way with query performance perceptions. Median query times, which have fluctuated in a relatively narrow band from 6.6 to 8.8 seconds over the last five years, have become steadily less acceptable to users. But why? Perhaps a clue can be found in our daily lives, far away from BI queries. Just try a Google search of arbitrary complexity. It doesn't seem to matter whether your query finds one page or a billion; the reported query time is likely to be a small fraction of a second.

Over the last few years, we've all become used to free, near-instant searches of unimaginably large indexes of tens of billions of Web pages, images, videos, news channels, Usenet postings and books. If a PC bought today is around ten times as fast as a more expensive model from as little as five years ago, why haven't our BI applications speeded up by a similar factor?

Not only is performance the most-complained-about problem, but projects that used performance as a buying criterion were more likely to derive business value than those that did not. In fact, no other buying criterion had a higher correlation with project success. Performance was also the most-cited technical deterrent to wider deployments, second overall only to license costs.
