Problem Areas Analyzed
This year, 1,679 users participated, some of whom are DM Review readers. The large sample size supports not only numerous overall analyses, but also many comparisons between subsamples, such as by product and by application size band, and it has yielded some unexpected and intriguing results.
The OLAP Survey asks a number of questions about problems and support. For example, respondents are asked about the worst problems they have encountered and can nominate up to three. We subsequently classify these into three categories: data (poor quality, inability to get data from some systems), people (administrative problems, company politics, could not agree on requirements, lack of interest from business users, requirements changed before project completed) and technical/product-related (missing key features, product could not handle large numbers of users, product could not handle the data volumes, query performance too slow, security limitations in the product).
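The three-way classification described above can be sketched as a simple lookup. This is an illustrative sketch only, not the survey's actual tooling; the category names and problem labels are taken from the article, while the `classify` function is a hypothetical helper.

```python
# Hypothetical sketch of the survey's three-way problem classification.
# Category names and problem labels follow the article; the code itself
# is illustrative, not the survey's actual methodology.
CATEGORIES = {
    "data": {
        "poor quality data",
        "inability to get data from some systems",
    },
    "people": {
        "administrative problems",
        "company politics",
        "could not agree on requirements",
        "lack of interest from business users",
        "requirements changed before project completed",
    },
    "technical/product": {
        "missing key features",
        "product could not handle large numbers of users",
        "product could not handle the data volumes",
        "query performance too slow",
        "security limitations in the product",
    },
}

def classify(problem: str) -> str:
    """Return the category for a nominated problem, or 'unclassified'."""
    for category, problems in CATEGORIES.items():
        if problem in problems:
            return category
    return "unclassified"
```

Each respondent's up-to-three nominations would then be tallied per category to produce the totals behind Figure 1.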
Looking at the collective totals, the picture has been quite stable for the last three years (see Figure 1).
Figure 1: Problem Trends Since 2002
There have consistently been more people-related problems reported than any other kind. Perhaps surprisingly, data problems are the least common.
Let's now concentrate on the top three problems and see how they have changed over the years. In a nutshell, company politics and poor quality data, by far the worst problems in 2001, are now only about half as prevalent. They dropped sharply in 2003 and 2004 and have been relatively stable since. At the same time, more and more people are reporting query performance as a serious problem. In fact, it is the only problem that has gotten worse every year.
Figure 2: Most Serious Deployment Problems
This is a curious result. We all know that hardware gets ever faster and cheaper, and most software vendors boast that each new release has been tuned to perform even better than before. So how can reported performance problems have grown steadily worse while the underlying hardware and software keep getting faster?
Figure 3: Most Serious Problem Trends in The OLAP Surveys
The most obvious theory is that rocketing data volumes are the cause. After all, if data volumes are increasing even faster than hardware performance improves, it might explain the rising complaints about performance. Looking at the actual median query times and comparing them with the median input data volumes over the last five years, we get some more surprises (see Figure 4).
Figure 4: Query Times versus Volumes Trend
Amazingly, there is no evidence of exploding data volumes. In fact, input data volumes have varied little over the last five years, remaining close to the surprisingly modest figure of approximately five gigabytes. There are a growing number of very large applications, but they probably remain very much in the minority. At the same time, there may be many more new small applications, so the typical or median application size has hardly changed at all.
Just as remarkable is the correlation between typical input data volumes and typical reported query times: they rise and fall in almost perfect lockstep. In fact, the ratio of query time in seconds to input data volume in gigabytes has held constant at 1.7 sec/GB for the last three years, up from 1.5 sec/GB in 2003. If hardware and software performance were really improving, this ratio should be falling steadily, not staying flat.
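The sec/GB ratio above is straightforward arithmetic on the two medians. The sketch below illustrates the calculation; the specific input values are assumptions consistent with the figures quoted in the text (median query times of 6.6 to 8.8 seconds, median volumes near five gigabytes), not the survey's actual data points.

```python
# Illustrative calculation of the sec/GB ratio discussed above.
# The example numbers are assumptions consistent with the article's
# quoted ranges, not actual survey data points.
def sec_per_gb(median_query_time_s: float, median_volume_gb: float) -> float:
    """Ratio of median query time (seconds) to median input volume (GB)."""
    return median_query_time_s / median_volume_gb

# e.g. an 8.5-second median query over a 5 GB median input volume:
print(round(sec_per_gb(8.5, 5.0), 1))  # 1.7
```

If hardware and software were genuinely getting faster relative to the workload, this ratio would shrink year over year; a flat 1.7 sec/GB means the gains are being absorbed elsewhere.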
This suggests that the undoubted improvements in hardware performance have been absorbed by less efficient software. For example, the move to thin-client and multilayered architectures may have many advantages, but better performance is not necessarily one of them. Good old-fashioned client/server architectures, if implemented well (as they are in the best OLAP tools), are very hard to outperform. So, whenever you hear of a new architectural acronym, ask first what it will do to performance!
Another reason for the surprisingly low data volumes is the preponderance of enterprise performance management (EPM) applications such as planning, forecasting, budgeting, dashboards and financial consolidation. These applications typically store much smaller data volumes than sales and marketing analysis applications, for example. Not only are EPM applications much more common than others, but they are also the only ones on a (modest) growth trend.
For example, the top EPM application (planning and forecasting) has grown from being in use in 31 percent of sites in 2002 to 46 percent in 2006, whereas the top reporting application (general data warehouse reporting) has remained flat at 45 percent, and the top analysis application (sales and marketing analysis) has barely grown from 39 percent in 2002 to 42 percent in 2006. Customer relationship management has fallen to 11.5 percent from 18 percent in 2002.
Figure 5: Query Times versus Complaints
If we plot the query time and query complaints trends on a single chart, we can see that the fluctuations in query response, which so closely mirror input data volumes, are not correlated in any way with query performance perceptions. Median query times, which have fluctuated in a relatively narrow band from 6.6 to 8.8 seconds over the last five years, have become steadily less acceptable to users. But why? Perhaps a clue can be found in our daily lives, far away from BI queries. Just try a Google search of arbitrary complexity. It doesn't seem to matter whether your query finds one page or a billion; the reported query time is likely to be a small fraction of a second.