Editor's Note: This article is the last in a three-part series focusing on the data warehouse evolution.
DSS Part 2 - Data Marts (1994)
The first evidence of a split in the data warehousing ranks came with the introduction of the concept of data marts. The difficulty of selling, designing and implementing an enterprise solution caused a high casualty rate among the early adopters. Many abandoned any attempt to plan top down. Those that succeeded concentrated on delivering business value, started small and grew incrementally. The best of breed operated with a plan that would grow to support cross-functional needs. In effect, they built to enterprise scope one piece at a time.
Unfortunately, early failures caused many enterprises to retreat from the concept of data warehousing. Vendors and analysts, concerned about losing their meal ticket, took an architectural concept out of context and sold it as a justifiable stand-alone solution. The term "data mart" was introduced into architectural frameworks as a spin-off of the data warehouse optimized for a particular department or function. Its very name implied a retail location that required a wholesale supplier of data behind it. The concept of a data mart without a data warehouse (i.e., without coordinated data acquisition and preparation) is an oxymoron.
Convincing clients to build smaller, more contained departmental solutions saved the industry in the short run. There is little doubt this was a necessary evil. However, it violated the most fundamental principle of data warehousing: creating a single point of distribution. Integration and consistency goals cannot be achieved when each department sources, cleans, transforms and loads its own data. Stand-alone data marts threatened to increase the very chaos that data warehousing was conceived to eliminate.
Even with this narrowed scope, many data mart projects also began to fail. Some reasons are classic IT failure modes: lack of a clear business driver and poor execution. However, a shocking number were based on bad design. Many used the Inmon model to justify their use of traditional OLTP design methods. The fact that this was a misreading of Inmon's intent did not deter anyone dead set on using what they already knew to build these newfangled databases. OLTP-influenced third-normal-form designs failed utterly to support DSS needs for accessibility and performance.
Luckily, Ralph Kimball's first book, The Data Warehouse Toolkit [1], hit the market just as the data mart craze really took off. This best-selling book provided detailed design guidance on how to optimize data for analysis. Dimensional modeling spanned the gap from traditional relational design to the multidimensional databases that gave rise to the OLAP moniker. It provided a bridge between business requirement analysis and optimal physical data design that did not exist for decision support.
The year 1994 featured the launch of the OLAP offensive, followed by the ROLAP counter-offensive. The debate widened from relational vs. dimensional to which form of dimensional you want: relational OLAP based on normalized tables, a star-schema form of denormalized design, or a true multidimensional database.
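The star-schema form mentioned above can be sketched in miniature: a central fact table holding additive measures and surrogate keys, surrounded by denormalized dimension tables that supply the attributes analysts slice by. The tables, columns and figures below are hypothetical illustrations, not taken from the article or Kimball's book.

```python
# Minimal star-schema sketch (hypothetical data). Dimension tables are
# denormalized lookups keyed by surrogate key; the fact table holds
# foreign keys into each dimension plus additive measures.

dim_product = {
    1: {"name": "Widget", "category": "Hardware"},
    2: {"name": "Manual", "category": "Documentation"},
}
dim_date = {
    10: {"day": "1994-03-01", "quarter": "1994-Q1"},
    11: {"day": "1994-07-15", "quarter": "1994-Q3"},
}

fact_sales = [
    {"product_key": 1, "date_key": 10, "units": 5, "revenue": 50.0},
    {"product_key": 1, "date_key": 11, "units": 3, "revenue": 30.0},
    {"product_key": 2, "date_key": 11, "units": 7, "revenue": 14.0},
]

def rollup(facts, dim, fk, attr, measure):
    """Aggregate a fact measure by one dimension attribute (a simple slice)."""
    totals = {}
    for row in facts:
        key = dim[row[fk]][attr]          # join fact to dimension on surrogate key
        totals[key] = totals.get(key, 0.0) + row[measure]
    return totals

print(rollup(fact_sales, dim_product, "product_key", "category", "revenue"))
```

The point of the design is visible even at this scale: the analytic question ("revenue by category") maps directly onto one join and one aggregation, which is what made dimensional designs so much more accessible than third-normal-form schemas for DSS queries.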
Data Warehouse: Divergence (1996-97)
Enterprise warehouse versus department marts. Relational versus dimensional. OLAP versus ROLAP. As if this were not enough, analysts started looking around at what people were actually building and concluded: none of the above. Many warehouse/marts violated one or more of the fundamental principles such as nonvolatility and point-in-time snapshots. What they were really doing was building a new technology form of externalized operational reporting. Some even allowed direct update that put them squarely back on the operational side of the equation.
To preserve the sanctity of the original definitions, many new labels were floated to describe these not-a-data-warehouse constructs. The most universally accepted term is the operational data store (ODS). An ODS may be built with the same underlying technologies, but differs from a data warehouse/data mart in that it may be real time (not snapshot based), updateable (not nonvolatile) or transactional in nature.
The introduction of the ODS concept was necessary to continue the dialog about what data warehousing is all about. Without it, more design failures would have been blamed on the data warehouse concept. The short-term effect, however, was to introduce more chaos, increase the FUD factor and allow more people to justify the wrong approach. In fact, it almost caused the banishment of the data warehouse term altogether.
In 1996, I had clients asking, "Do I need an ODS and a data warehouse and a data mart?" Though the answer should clearly be a resounding NO, I was surprised by the growing dissonance in the consulting community. I was shocked when a client in 1997 actually asked me to review a design that called for building all three simultaneously. There was literally no rationale for this design whatsoever other than a magazine article that showed an ODS, a data warehouse and a data mart in the same diagram.
More and more, I heard people say, "I need an ODS and a data mart but not a data warehouse." To them, an ODS meant detail and a data mart meant access and analysis. To them, an ODS uses a normalized design and a data mart uses an access-optimized (generally dimensional) design. Time and again, I resorted to pulling out 7- to 12-year-old monographs to demonstrate that what most of these clients were describing was the essence of original data warehouse models.
Data Warehouse: Synthesis (1998)
Fortunately, 1998 turned out to be a year of synthesis. The one-size-fits-all model of data warehousing is long dead. Happily, frameworks that look amazingly alike - once you remove the dissimilar labels - are replacing the chaos of competing approaches. The Inmon corporate information factory can be aligned with Kimball's new extended architecture in his advanced training. Proprietary approaches from divergent companies are tending toward a common norm.
What these frameworks have in common is the inclusion of multiple layers optimized for different tasks. Each has a layer for staging, a layer for historical detail and a multifaceted layer for optimized access and delivery of data. Kimball (begrudgingly) accepts the use of normalized design for staging or historical layers. Inmon acknowledges the need for dimensional constructs in the access layer. Both generally allow for pass-through of transaction-level detail and downstream delivery of aggregated results.
Even the most rabid data mart fanatics have come around to the need for an enterprise solution. The evolution of independent data marts to dependent data marts with common source acquisition and reusable data objects is underway.
What is most heartening is the reconvergence of data warehousing with overall application architecture. Data warehousing will no longer be tacked on as an afterthought but part and parcel of the whole solution. The emergence of products such as SAP's Business Warehouse Solution can be seen on one level as just a marketing move to capture more of the IT dollar. What it indicates to me is the beginning of mainstream integration between the operational and analytic worlds.
Data Warehouse 2000: Real-Time Data Warehousing
Our next step in the data warehouse saga is to eliminate the snapshot concept and the batch ETL mentality that has dominated since the very beginning. The majority of our development dollars and a massive amount of processing time go into retrieving data from operational databases. What if we eliminated this whole write-then-detect-then-extract process? What if the data warehouse read the same data stream that courses into and between the operational system modules? What if data that was meaningful to the data warehouse environment was written by the operational system to a queue as it was created?
This is the beginning of a real-time data warehouse model. But there is more.
What if we had a map for every operational instance that defined its initial transformation and home location in the data warehouse detail/history layer? What if we also had publish-and-subscribe rules that defined downstream demands for this instance in either raw form or as a part of some derivation or aggregation? What if this instance was propagated from its operational origin through its initial transformation then into the detail/history layer and to each of the recipient sites in parallel and in real time?
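The flow described above can be sketched in a few dozen lines: operational modules append meaningful events to a queue; a map defines each event type's initial transformation and its home in the detail/history layer; publish-and-subscribe rules fan the same record out to downstream derivations. This is a toy illustration of the idea, not a real product; every name (event types, layers, rules) is hypothetical.

```python
from collections import deque

# Operational modules write meaningful events here as data is created,
# instead of the warehouse extracting them in batch later.
operational_queue = deque()
detail_history = []                           # warehouse detail/history layer
subscriber_totals = {"daily_revenue": 0.0}    # one downstream aggregation

# Map: for each operational event type, its initial transformation
# into its home form in the detail/history layer.
transform_map = {
    "sale": lambda e: {"ts": e["ts"], "amount": e["amount"]},
}

# Publish-and-subscribe rules: downstream recipients that want this
# event, either raw or as part of a derivation or aggregation.
def _add_to_daily_revenue(rec):
    subscriber_totals["daily_revenue"] += rec["amount"]

subscriptions = {
    "sale": [_add_to_daily_revenue],
}

def propagate(event):
    """Route one event: transform, land in detail/history, notify subscribers."""
    record = transform_map[event["type"]](event)
    detail_history.append(record)                 # home location first
    for deliver in subscriptions[event["type"]]:  # then each recipient,
        deliver(record)                           # conceptually in parallel

# The operational system emits events as they occur ...
for e in [{"type": "sale", "ts": 1, "amount": 20.0},
          {"type": "sale", "ts": 2, "amount": 5.0}]:
    operational_queue.append(e)

# ... and the warehouse drains the stream in real time, no batch extract.
while operational_queue:
    propagate(operational_queue.popleft())

print(len(detail_history), subscriber_totals["daily_revenue"])
```

In a production setting the deque would be a durable message queue and the subscribers would run concurrently, but the shape is the same: one write by the operational system drives the detail layer and every downstream derivation without a separate extract step.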
What would you say if I said you could do this today?
1. Kimball, Ralph. The Data Warehouse Toolkit: Practical Techniques for Building Dimensional Data Warehouses. John Wiley & Sons, February 1996.