Continue in 2 seconds

It's always scary to trot out one's core beliefs in front of the pros, but I need to hear your constructive criticism.

  • Larissa Moss, Clay Rehm, Les Barbusinski, Adrienne Tannenbaum
  • June 10 2004, 1:00am EDT


It's always scary to trot out one's core beliefs in front of the pros, but I need to hear your constructive criticism. The following are my views, and then a response from a colleague who believes in a very different approach. Who is right?

My views:

If you decide that you want to build an enterprise-wide analytical system, then the foundation of it must be a common data model with consistent logical model definitions and from which views can be built that leverage that data model consistency. Your target audience is that set of enterprise managers who wish to view the enterprise as a whole.

To achieve this, you do your best job to generate a robust enough data model that covers all the data sources and pieces of data of interest in those data sources that matter to the enterprise. When a new source system comes along, and you find interesting nuggets of data in it that are unique to that source system, you absolutely consider extending the warehouse data model to include those. However, you keep in mind that you are extending the federated data warehouse model as a whole, and your data model analysis considers carefully what meaningful default values should be for these fields that the other N source systems feeding your warehouse do NOT supply.

Finally, as you populate the warehouse from these various sources, assuming they do not share a single operational data model, you may need to transform incoming data so that it is consistently represented in the warehouse. The "T" in ETL stands for Transform, and exists as an integral component of an enterprise-wide warehouse architecture precisely for this reason.

If you seek to satisfy the verbatim reporting needs of each individual department in isolation as your top business priority, you reach a data integrity impasse when you combine data that has subtle or not-so-subtle inconsistencies in it into a single view. You may call that a data warehouse on your marketing materials, but it fails a basic litmus test of a well-designed warehouse architecture. You are in fact creating stovepipe analytical applications and just happen to be storing all their data in one physical repository.

To accommodate both the department managers who want to view their own world in isolation AS WELL AS the enterprise-wide management, you do a couple of things. If there are qualitative attributes (coded lists of statuses, for example) coming from one source system that roll up neatly or align one for one with equivalent warehouse statuses, then it's perfectly OK to consider including that source system qualitative attribute as a field in the warehouse model. In addition, you invest a significant effort in designing metadata that fully paint the picture of how that department manager's portion of the world fits into the whole enterprise ... how their data was carried over verbatim or transformed or rolled up or renamed or whatever.

This is in keeping with the way in which a well-run business's senior management wants their business unit managers to think ... how does what I do fit into the whole.

... and my colleague's response:

No you don't. I've worked on enterprise-wide MIS teams. You receive all the data in its original format into your data warehouse. If you alter it, you alter it in your mart. Your DW and your DM are not identical. I've worked in this manner and researched it as well. You never entirely relinquish the original meaning of the data point.


Larissa Moss' Answer: You may receive all the data in its original format into your staging area (my assumption is that what you are calling a DW is not your staging area), which will have an area set aside for data consolidation as its first ETL step. Consolidation and integration are not the same thing. Consolidation means you copy some or all of the data "as is" into one place; I call that "suck and plunk" (cut and paste). Integration means that you rationalize the data (remove duplicates), filter the data, merge the data, separate the data, cleanse the data, enrich the data (e.g., change codes into mnemonics), and most importantly, relate the data. There are two ways of relating data; one way is to show the integrated cross-organizational business relationships as they exist in the real (operational) world, and the second way is to show the reporting pattern relationships between the facts and their (conformed!) dimensions for each of those reporting patterns. While it is true that an EDW and its spun-off DMs are not identical in design or content, integration is still a key concept for all DW databases (ODS, EDW and DM). The only exception might be some regulatory or policy reasons that do not allow you to change some data values in an ODS, if the ODS is used for regulatory or policy type of reporting. One other point to remember about data warehousing: The DW concept was born almost 20 years ago with the goal and objective to bring order into existing data chaos by analyzing and eliminating (or at least reducing) data redundancy, data inconsistency, data domain violations and data integrity violations. The resulting data reorganization and data transformations were meant to enable fast, easy and reliable reporting of a company's horizontal (cross-organizational) view of its information (data plus meta data, which is also part of a DW environment). Any action that contradicts this original objective should be considered harmful to any DW initiative.

Adrienne Tannebaum's Answer: Quite a speech! Well, there is actually some truth and validity to what you as well as your colleague have to say. First, it is always necessary to understand how the data in an organization connects in order for business functions to be successful. This type of analysis is usually called an "enterprise data model" and it will provide the foundation for how all of the "also known as" and other versions can be logically connected.

You will never achieve data utopia in your organization, so many may not want to hear about the "ideals." It is always easier to take what data you have, use it and then worry later about why it is different from everything else.

The best compromise is done with a flexible meta model. In here we track "what should be" (your enterprise model utopia) and relate it to "what is." Granted, I am simplifying the meta model, but if you both think about what is vs. what could be vs. what should be, you will be working as a team.

Les Barbusinski's Answer: You're both right, and you're both wrong. Bear with me while I build my case. A data warehouse consists of several data layers (a.k.a. data stores). Each data store has a different function, works with a different set of data and stores its information in a different type of data structure. Most data warehouses consist of six data stores:

  • Staging Area (flat files or relational tables)
  • Historical Data Store (relational tables in third normal form)
  • Operational Data Store (denormalized tables structured for operational reports)
  • Analytical Data Store (a.k.a. dimensional data store or "DDS")
  • Extract Area (flat files or tables for feeds to data mining and/or external systems)
  • Archive Area (CD jukebox or tape farm containing expired/archived data from DW)

The three most important data stores are the HDS, ODS and DDS.
The purpose of the Historical Data Store is to hold cleansed and integrated historical data (both transactional and "master" data) for the enterprise in a fully normalized (i.e., Third NF) data structure. It acts as a comprehensive information repository for both the ODS and DDS ... since those two data stores only store the data they need to generate their reports. (NOTE: By "master" data I mean "master file" type data such as parts, customers, stores, machine centers, etc. ... data which eventually may become a "dimension" in the DDS).

The purpose of the Operational Data Store is to hold current (not historical) operational information in a de-normalized structure (i.e. master and transactional data "mish-mashed" together) that facilitates the efficient generation of operational reports (e.g., lists, exception reports, summary and detail reports, statements, etc.). This means that the ODS is populated by tables that look a lot like the reports they facilitate, and the same data may appear in multiple tables. Also, data that is not needed for operational reporting is not stored in the ODS.

The purpose of the Analytical Data Store is to hold current and historical information about the enterprise in a dimensional structure that facilitates the generation of analytical reports (i.e. OLAP/BI reports with atomic and aggregated metrics measured against groups of dimensions which permit drill downs and other manipulations that help answer questions about the enterprise, and facilitate decision making). As with the ODS, the DDS does not store data it does not need for its analytical reports.

Data going into the HDS must be cleansed, synchronized, integrated, and transformed into a cohesive structure that accurately depicts the enterprise as a whole. Storing raw data from the operational systems in the HDS makes no sense because that would mean that the data would have to be cleansed, synchronized, and integrated multiple times each night (i.e. once for each data mart and/or ODS). In the HDS, consistency is the key.

On the other hand, the data going into the ODS and/or the DDS must a) reflect the perspective of the end user organization it serves and b) be structured to facilitate the generation of the reports it produces. For example, let's take look at sales transactions as viewed by two different organizations: sales and payroll. The sales department wants to see the revenue generated by the sales rolled up to the current (i.e., "as is") sales hierarchy (i.e., the relationship between offices, districts and regions as they exist today). The payroll department, however, wants to see the commissions that are owed for those very same sales but rolled up to the historical (i.e., "as was") sales hierarchy (i.e., the relationship between offices, districts and regions as they existed at the time of the sale). Same sale ... two different perspectives. So, your arguments are correct for the HDS, while your buddy's arguments are correct for the ODS and DDS. Don't you just love happy endings?

Clay Rehm's Answer: I agree with your assessment and think it is very well written and thought out. The problem with your assessment is that it is great in theory but will have trouble in real life. I have encountered many enterprise managers who will not agree with each other on common ideas since they have agendas of their own. Each manager is being paid and judged by profit margins, efficiency and the bottom line - not on how well they play with others.

I also agree with your colleague's response.

You should strive for a common set of data for the organization but do not get discouraged if you are not able to accomplish 100 percent of your goal! If you can get some people in different departments within the organization to use common semantics, data and processes, you have accomplished your goal.

Register or login for access to this item and much more

All Information Management content is archived after seven days.

Community members receive:
  • All recent and archived articles
  • Conference offers and updates
  • A full menu of enewsletter options
  • Web seminars, white papers, ebooks

Don't have an account? Register for Free Unlimited Access