
The Data Warehouse Content Gap, Part 1

  • June 01 2007, 1:00am EDT

There is a gaping hole in the current body of knowledge and practice within enterprise data warehousing (EDW). That hole manifests itself not merely in the EDW, but also, by extension, within the host of satellite systems that surround it (business intelligence, customer relationship management, supply chain management, corporate performance management, etc.). The modern-day EDW suffers from a content deficiency that renders the EDW something like a cracked slab, which no amount of downstream spackling and tape can remedy. What precisely is the nature of this content deficiency?

To answer that, let's start with a brief synopsis of the work of the current thought leaders in our field: EDW fathers Bill Inmon and Ralph Kimball have done an excellent job of taxonomizing the EDW's critical components and functions (thank you, Mr. Inmon) and articulating schematic design principles that can be successfully implemented (thank you, Mr. Kimball) [1, 2]. And so, we have plenty of guidance as to what comprises the EDW and how to effectively design and build one. The field is due for an infusion of new insight around the all-important question of why, and we in the industry need to rethink our answer to this question, because the answer to why informs the how of what we do.

Ask an EDW practitioner why a company should build an EDW, and you will most likely get some variation of the following reply, "To enable the business to make better-informed decisions." While this is not a bad answer, it is the answer of an engineer. Pose the same question to a businessperson and you will get a different answer, most likely something like, "To make more money." Why is this distinction important, and what does it reveal?

On the part of the information engineer, it reveals an insufficiently specific appreciation of the pragmatic goals and functional mechanics of the business that the engineer serves. Of course, the goal of the EDW is to help the business make better-informed decisions. That should be a given. But are all decisions equal? Furthermore, as engineers in the service of the business, let's camp out a bit on the businessperson's statement of purpose, i.e., "To make more money." What should we as practitioners do with that? Given the stress and complexity of implementing an EDW, it is tempting to brush off the point - after all, everything about a business is (or should be) ultimately geared toward earning a profit. But if we tune out this point as excess noise owing simply to its comprehensive and ubiquitous nature, we miss the tremendous bottom-line value we can deliver by pondering the point and taking appropriate action. Perhaps this assertion is not excess noise, but in fact a governing design principle, and arguably the most important design principle for the EDW itself; that is to say, the ability to perform profitability analysis needs to be comprehensively, ubiquitously and schematically embedded in the EDW. In other words, the right response for information engineers is not, "So what?" but, "How can I make that happen?"

To make it happen, let's start by addressing the following: what do we have already, and what are we missing to get the job done? Those of us with some experience around various enterprise resource planning (ERP) systems know that these systems do a good job of linking together financial transactions (accounting records) with their associated operational transactions (purchase orders, deliveries, inventory movements, invoices, etc.) such that there is a join path back to the data points we need to determine the actual cost of acquiring a product from a supplier. The ERP may even do a good job capturing manufacturing and inventory holding costs. Furthermore, the ERP may drop these valuable bits of financial data right onto the operational transactions themselves, greatly simplifying our ability to do ubiquitous gross margin reporting within the EDW. We can build line-item fact tables right off our order, shipping and invoice records with the two perfectly additive measures we need (revenue and cost) handed to us directly out of the ERP with little to no transformation required.
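As a minimal sketch of the gross margin reporting described above, assume a handful of hypothetical invoice line items that each carry the two additive measures the ERP hands over. The field names (`customer`, `product`, `revenue`, `cost`), the helper `gross_margin_by` and all figures are illustrative assumptions, not drawn from any particular ERP.

```python
from collections import defaultdict

# Hypothetical invoice line items; names and amounts are illustrative only.
invoice_lines = [
    {"customer": "A", "product": "widget", "revenue": 1000.0, "cost": 600.0},
    {"customer": "A", "product": "gadget", "revenue": 500.0, "cost": 350.0},
    {"customer": "B", "product": "widget", "revenue": 800.0, "cost": 480.0},
]


def gross_margin_by(lines, dimension):
    """Sum the additive measures per dimension value, then derive margin."""
    totals = defaultdict(lambda: {"revenue": 0.0, "cost": 0.0})
    for line in lines:
        bucket = totals[line[dimension]]
        bucket["revenue"] += line["revenue"]
        bucket["cost"] += line["cost"]
    # Margin is derived after summing, so it is consistent at any rollup level.
    return {key: t["revenue"] - t["cost"] for key, t in totals.items()}


margin_by_customer = gross_margin_by(invoice_lines, "customer")
margin_by_product = gross_margin_by(invoice_lines, "product")
```

Because revenue and cost are additive, the same line items roll up by customer or by product without any change to the underlying records.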

But what the ERP does not give us out of the box is a good view into profitability at a net operating level vis-à-vis the customers, suppliers and products or services for which the company is in business. We don't get a clean link to all of the sales or operating expenses (sales and service attention, marketing expenses, managerial and administrative costs, etc.) absorbed by the business's partners and products. These linkages are difficult to track, nonconducive to point-in-time transaction-based updates, inconsequential to the ERP's core objective of business process enablement and, therefore, not captured within the ERP. Nevertheless, they are important to understand because they typically represent a significant portion of a company's total cost of doing business, and therefore it is important to determine what responsibility various partners and products bear for driving those expenditures so as to determine their ultimate profitability and value to the enterprise.

How do we get there? How do we achieve the goal of ubiquitous net profit and performance analysis that is both the primary imperative and biggest deficiency in the modern EDW? A strategy for doing so will be systematically developed over several subsequent articles (Hint: We EDW practitioners need to learn our ABCs), but for the time being, while deferring on the schematics, let's discuss the key success criteria for such a solution.

Key Success Criteria

  1. The solution should provide performance data at the net operating level as already discussed.
  2. The solution should be both financially and operationally valid. The numbers should tie back to corporate financials but also reflect the real drivers of corporate expense that an operational or line manager would say make one partner or product more or less efficient than another.
  3. The solution should not merely capture those numbers, but should also expose the drivers behind them. It's not really sufficient or defensible to show that Customer A is a sugar daddy while Customer B is dead weight without providing the data that supports those conclusions.
  4. The solution should capture this data at the line-item grain. Dealing with data at the line item allows for aggregation along any dimension of interest (e.g., by customer, product, supplier or channel) or any combination of dimensions.
  5. The solution should provide a common framework and vocabulary for measuring performance across the enterprise. This is really part and parcel of criteria two and three, but is a significant enough point to bear separate mention. If the solution is valid across organizational silos and is also defensible, it should harmonize the organization's performance vocabulary, goals and metrics as a consequence. Furthermore, it is critical that the solution do so if it is to be adopted and, more importantly, to ensure bottom-line value capture for the enterprise.
  6. The solution should be fast, intuitive and easy to use. These are (or should be) general design goals of any data warehouse and, therefore, important to keep in mind.
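To make criteria one through four concrete, here is a small illustrative sketch. It is not the design this series goes on to develop, which is deferred to later installments. It assumes a single hypothetical operating expense pool spread across line items in proportion to one hypothetical driver (`service_calls`); the data, the field names and the proportional rule are all assumptions chosen for illustration.

```python
# Hypothetical line items: gross measures plus one operational driver.
lines = [
    {"customer": "A", "product": "widget", "revenue": 1000.0, "cost": 600.0, "service_calls": 1},
    {"customer": "A", "product": "gadget", "revenue": 500.0, "cost": 350.0, "service_calls": 1},
    {"customer": "B", "product": "widget", "revenue": 800.0, "cost": 480.0, "service_calls": 6},
]

# Hypothetical service expense total taken from the corporate financials.
service_expense_pool = 400.0


def attach_operating_expense(lines, pool, driver):
    """Spread an expense pool over line items in proportion to a driver."""
    total_driver = sum(line[driver] for line in lines)
    if total_driver == 0:
        raise ValueError("driver total is zero; nothing to allocate against")
    enriched = []
    for line in lines:
        share = pool * line[driver] / total_driver
        net = line["revenue"] - line["cost"] - share
        # Keep the driver on the row so the result can be explained later.
        enriched.append({**line, "operating_expense": share, "net_profit": net})
    return enriched


def rollup(lines, dimension, measure):
    """Aggregate one additive measure along one dimension."""
    totals = {}
    for line in lines:
        totals[line[dimension]] = totals.get(line[dimension], 0.0) + line[measure]
    return totals


enriched = attach_operating_expense(lines, service_expense_pool, "service_calls")
net_by_customer = rollup(enriched, "customer", "net_profit")
calls_by_customer = rollup(enriched, "customer", "service_calls")
# Tie-out: the allocated expense should sum back to the financial total.
allocated_total = sum(line["operating_expense"] for line in enriched)
```

In this toy data, customer B shows a gross margin of 320 but a net of only 20 once service attention is attributed, and the service-call count sits on the same rows as the supporting driver. The allocated expense also sums back to the 400 in the pool, which is the kind of tie-out to corporate financials that criterion two asks for.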

Pragmatic observation and experience building, supporting and even decommissioning various decision support systems for several large enterprises show that it is highly common for the EDW not to meet these goals. The EDW often tends to be an overly raw data dump with little of the tailoring and content enrichment required to make it truly valuable and easy to use. Where more tailoring and enrichment occur, they tend to happen in downstream subject-specific data marts or satellite systems that don't provide a 360-degree view of performance, or where numbers are engineered to meet the specific operational or political goals of that particular silo of the business. In many instances, the EDW becomes more of a hub for application data integration than a comprehensive enabler of managerial analysis and decision support - which is the EDW's highest purpose. Often, much more ends up going into the EDW (in terms of data, effort and cost) than coming out of it (in terms of information, usage and financial return).

But that is not the way things need to be! The implementation shortcomings or stunted evolutionary development of a particular architectural centerpiece need not mean that it is invalid or unimportant, or that it should be abandoned or downgraded in favor of other approaches.

If the problems or the promise of a solution described herein resonate, stay tuned! This series will continue in DM Direct on June 8, June 15 and June 22. Go to if you are not already a subscriber. 


  1. Bill Inmon, Claudia Imhoff and Ryan Sousa. Corporate Information Factory, Second Edition. Hoboken, NJ: John Wiley & Sons, Inc., 2001.
  2. Ralph Kimball and Margy Ross. The Data Warehouse Toolkit. Hoboken, NJ: John Wiley & Sons, Inc., 2002.
