I get many inquiries on the differences and pros and cons of MOLAP vs. ROLAP architectures for analytics and BI.
In the old days the differences between MOLAP, DOLAP, HOLAP and ROLAP were pretty clear. Today, given the modern scalability requirements, DOLAP has all but disappeared, and the lines between MOLAP, ROLAP and HOLAP are getting murkier and murkier. Here are some of the reasons:
- Some RDBMS (Oracle, DB2, Microsoft) offer built in OLAP engines, often eliminating a need to have a separate OLAP engine in BI tools
- Some of the DW optimized DBMS like Teradata, SybaseIQ and Netezza partially eliminate the need for an OLAP engine with aggregate indexes, columnar architecture or brute force table scans
- MOLAP engines like Microsoft SSAS and Oracle Essbase can do drill throughs to detailed transactions
- Semantic layers like SAP BusinessObjects Universe have some OLAP-like functionality
For multiple reasons, outlined in OLAP and BI Self Service documents, including the main one - drill anywhere - I feel that ROLAP is a superior approach to MOLAP. Many vendors support that notion and are rushing to invent ROLAP architecture, or, alas, in many cases “markitechture”. So, here’s my first attempt to identify a few ROLAP features that can clearly differentiate a true ROLAP engine from a fake. True ROLAP has to:
- Support multi pass SQL. This is required to answer analytical questions that cannot be answered in a single pass of SQL
- Minimize amount of data movement between RDBMS and client app by:
- Storing (temp tables, derived tables) intermediate query results from multi pass SQL and joining them in RDBMS
- Pushing all calculations to RDBMS
- Pushing all filtering and grouping to RDBMS
- Pushing all filtering and grouping on calculations to RDBMS
- Be aggregate aware. That means that adding or dropping aggregate tables should have no effect on reports and SQL
- Optimize SQL based on RDBMS type
- Support heterogeneous joins to join data from multiple RDBMS in a single report. Again, to minimize data movement this should be done in RDBMS. Putting in a separate EII/data federation layer/tool does not count
- Handle normalized and denormalized data models equally efficiently
- Automatically resolve typical start/snowflake query errors and conflicts, such as trying to join multiple fact tables
Before I publish a document on this subject I’d like to collect everyone’s feedback on:
- Did I get it right?
- Am I missing any other key ROLAP differentiators?
- What current OLAP engines qualify as ROLAP given these criteria? MicroStrategy and OBIEE immediately come to mind. What about Mondrian? Others?