The reality of data explosion in multidimensional databases is a surprising and widely misunderstood phenomenon. For those about to buy or use an OLAP product, it is critically important to understand what data explosion is, what causes it and how it can be avoided because the consequences of ignoring data explosion can be very costly and, in most cases, result in project failure.
There are very few OLAP vendors who can truly claim to have technically conquered the consequences of data explosion. The claims offered by many vendors about how they manage data explosion make it very difficult to understand what is actually important on this topic and what’s not.
For example, one of the problems of data explosion is that it results in a massive database. The size of the database in one product can literally be hundreds and even thousands of times bigger than the same database in another product.
Rather than admit to the problems of data explosion, the vendor with the massive database will argue that his database is handling large data sets, while he will imply that the vendor of the smaller database – a database without data explosion – cannot address large enterprise data sets.
The correct analysis should be to compare sizes with equal volumes of base data, but because the size of the databases are so profoundly different, prospective customers find it hard to believe that such dramatic differences are possible with similar data sets.
The end result is that organizations often commit to what they erroneously believe is the best so- called enterprise solution. This mistake comes at a huge price (see consequences of ignoring data explosion). Ironically, in this way, a vendor’s biggest weakness (data explosion) becomes their biggest selling point.
The Consequence of Ignoring Data Explosion
Some of the consequences of ignoring data explosion include:
- Massive databases that are literally hundreds and even thousands of times larger than is necessary.
- Expensive hardware is required to process and accommodate exploded data.
- Load and or calculation times that take hours rather than seconds or minutes.
- Large costs to build and maintain these monolithic models.
- The hidden cost of failing to provide timely and relevant enterprise business intelligence – there is a great cost associated with the inability to make fast business decisions and the negative culture that prevails because of poor underlying analytical systems.
- Real or de facto project failure.
Sparsity (Antonym – Density)
“Sparsity” and “sparsity handling” are important concepts worth understanding as a precursor to understanding data explosion.
Input data or base data (i.e., before calculated hierarchies or levels) in OLAP applications is typically sparse (not densely populated). Also, as the number of dimensions increase, data will typically become sparser (less dense).
For example, in a one-dimensional matrix, you can suppress all zero values and, therefore, have a 100 percnet dense matrix. In a two-dimensional matrix, you cannot suppress zeros if there is a non-zero value in any element in the two dimensions (see Figures 1 and 2).

Figure 1: 100 Percent Dense

Figure 2: 25 Percent Dense (4 Out of 16 Data Points Populated)
While it is not true in all cases, typically as the number of dimensions in a model increases, so does the data sparsity. For example, if you are storing sales data by product and by month, it is conceivable that you will sell each product each month (100 percent dense). However, if you were storing sales data, by product, by customer, by region, by month, clearly you would not sell each product to every customer in every region every month.
By adding the dimension “gender” to this model, you would double the possible size of the cube by storing the data by either of the two variables male or female, but the size of the stored data would remain the same. In this case, by introducing another simple dimension, the sparsity will have doubled!
To provide a practical baseline expectation for sparsity, we researched data sparsity on a variety of models with a sample of seven companies. Each company had a variety of models (e.gg, P&L, balance sheet, cash flow, sales analysis, HR/labor analysis, budgeting and forecasting, industry-specific models, etc.) with differing dimensions.
The industry-specific models included insurance claim analysis, telco call analysis and revenue per-user analysis and a medical device company’s sales analysis. A detailed summary of our research can be found in the sidebar.
Summary Findings
- Data density in all cases was significantly less than 1 percent – i.e., extremely sparse.
- As the number of dimensions increases, so did the sparsity of the data (models reviewed had between 5 and 16 dimensions).
- Extreme sparsity existed in all the industry-specific models (all models had density of less than 1 billionth of a percent).
Sparsity Handling
Superficially, any multidimensional model needs to provide space for every possible combination of data points. Since in sparse models most data points are zeros, the main issue is how to store all values other than zero values. For example, if the data density of a model is 1 percent and there is no sparsity handling, the resulting model will be 100 times larger than a model that has perfect sparsity handling. Sparsity handling therefore is the efficient storage of very sparse data.
Don’t Confuse Poor Sparsity Handling with Data Explosion
It is important not to confuse poor sparsity handling (the inefficient storage of zero values) with data explosion. Although sparsity handling is an issue for multidimensional databases, it usually only accounts for differences of less than ten times between products.








