Continue in 2 seconds

Can a single data staging area serve two masters, operational (ODS) and analytic (data warehouse)?

By
  • Joe Oates, Chuck Kelley, Clay Rehm, Scott Howard, Steve Hoberman
Published
  • December 09 2003, 1:00am EST

Q:

Can a single data staging area serve two masters, operational (ODS) and analytic (data warehouse)? Is a single data staging area serving both requirements (operational and analytic) common in the data warehouse industry?

A:

Steve Hoberman’s Answer: This is a great topic for a more detailed discussion but I think to squeeze it down to a single paragraph. If you answer "yes" to these two questions, you can use the same staging area for both the ODS and DW:

  1. Is the currency requirements the same for both ODS and DW? That is, do they need the same freshness level of data (minute, hourly, daily, etc.)?
  2. Is the level of granularity the same for both ODS and DW? The ODS generally needs more detailed data but if you can store the lowest level possible in your stage you’d be in good shape.

I am making also a few assumptions about your definition of stage, ODS and DW. I am assuming your stage is somewhat integrated and you are assigning surrogate keys here. The ODS is very detailed and normalized and structured to answer tactical questions, and your DW is more strategic and therefore summarized and a flattened reporting view of the world. Good luck!

Chuck Kelley’s Answer: This, of course, will depend on your definition of a staging area. I use staging areas to feed the ODS and the data warehouse. This is because I use the staging area to feed the part of the data warehouse that isn’t fed by the ODS. Therefore, the source systems feed the staging area which feeds the ODS. The data warehouse is fed by the ODS and the staging area. I think it is very common in the data warehouse industry.

Joe Oates’ Answer: In my experience, the answer is no. First, let’s review what the data staging area is.

The data staging area is a system that stands between the legacy systems and the analytics system, usually a data warehouse and sometimes an ODS. The data staging area is considered the "back room" portion of the data warehouse environment. The data staging area is where the extract, transform and load (ETL) takes place and is out of bounds for end users. Some of the functions of the data staging area include:

  • Extracting data from multiple legacy systems
  • Cleansing the data, usually with a specialized tool
  • Integrating data from multiple legacy systems into a single data warehouse
  • Transforming legacy system keys into data warehouse keys, usually surrogate keys
  • Transforming disparate codes for gender, marital status, etc., into the data warehouse standard
  • Transforming the heterogeneous legacy data structures to the data warehouse data structures
  • Loading the various data warehouse tables via automated jobs in a particular sequence through the bulk loader provided with the data warehouse database or a third-party bulk loader

Next let’s review what an ODS is. It should be noted that there are different kinds of ODSs. Bill Inmon describes three different classes of ODS. One of the identifying features of an ODS, according to Inmon, is that it only holds one- to three-months' worth of historical data, not years of historical data like a data warehouse.

Class I is the "real-time" ODS. In this environment, the updates to the legacy system are reflected in the ODS in just a few seconds. Due to the speed of updates to the ODS, there is not time to do much, if any, transformation of the legacy system data into any other format. Due to this time restriction, a Class I ODS is usually for a single legacy system as opposed to an integration of multiple systems. It is, in effect, a copy from one server to another so that reporting won’t adversely affect the legacy system performance and response times. A Class I ODS is rare today.

A Class II ODS is defined by being updated intraday every one to three hours. This technique is sometimes referred to as "trickle feeding." In this environment there is time to do some transformation from the legacy system(s) design to a design that provides more efficient reporting. There is also time to do some integration of multiple legacy systems. Class II ODS’ often look like star schema data marts and can easily support presentation tools such as Business Objects, Cognos, etc. I have designed and implemented a couple of Class II ODSs that worked quite well.

A Class III ODS is usually updated once a day, usually at night, after the legacy systems have closed down. This is by far the most common that I have seen. Like the Class II, there is time to do transformation into more efficient reporting design. Another difference between an ODS and a data warehouse is that if a balance or other data item changes many times a day, each will be reflected in the ODS, whereas most data warehouse implementations will only store the single balance as of the extract time.

Even though a data staging area cannot serve "double duty" as a data staging area and ODS, an ODS can serve as a kind of staging area for a data warehouse. An ODS may not convert legacy system keys to surrogate keys like a data warehouse. In any case, there is an ETL process between the ODS and data warehouse.

There are other uses for an ODS, but I hope that this answers your question.

Clay Rehm’s Answer: I don't see why not. I am not sure if it is common in the industry, but why create more work for yourselves if you do not need to? I assume your staging area is clear on what each source, mapping and target is and it is well documented (meta data).

Scott Howard’s Answer: By the nature of your question, I have to assume that you have implemented a federated warehouse, that is one where your enterprise warehouse is not centralized, but distributed among a network of semi-independent data marts. I usually caution against such an approach mostly due to the dilemma that forces your question – that is the absence of a detailed and persistent data store. This needed persistent and detailed store is usually replaced by a temporary staging area. If my assumption is wrong, please provide more details and I’ll revise my response. In any event, I only recommend a federated warehouse architecture when strongly opinionated and diverse user communities make a true reconciled and consolidated enterprise data warehouse too challenging. This typically occurs in large insurance companies which also leads me to my assumption.

If this assumption is correct your staging area is similar to one that Ralph Kimball describes in his book The Data Warehouse Toolkit, Second Edition. This non-persistent staging area caches operational data to be later populated to the warehouse buss. Given that, let’s explore the difference between the ODS and analytical DW or independent data mart.

The ODS is a low latency data structure appropriate for real-time or near real-time operational process support and for tactical decision support. It is usually non-persistent storing only near- term cleansed and consolidated data for short periods of time. This is opposed to the definition of the enterprise data warehouse which is a persistent data store housing only reconciled, consolidated and cleansed data appropriate to source a data mart or select enterprise scope or individually unique strategic decision support requests. The data marts, in turn, are also optimized for the individual decision support uses of a department or business community and optimized for their query tools. In a federated warehouse that I suspect you have, your data warehouse takes the form of both the data warehouse and data mart as I have just described each. No enterprise data warehouse would exist as its function is provided by the data warehouse bus. I’ve found the federated data warehouse architecture initially attractive, but difficult to maintain because the meta data-based DW bus breaks down over time.

Given these industry-accepted definitions, yes the staging area can serve both the ODS and the data warehouse. However, allow me to propose an alternative. Allow the ODS to act as the staging area for the data warehouse, at least for the data that it maintains. The data in the ODS is more real time than the data required by the data warehouse. Reconcile and cleanse that data, populating the ODS near real time providing value to the operational and tactical decision makers who need it and also use it to populate the data warehouse, honoring the less demanding load cycles typical of the DW. You will still need the staging area for DW required data not hosted in the ODS, but in following this recommendation you economize on your ETL flows and staging area volumes without sacrificing value or function.

Register or login for access to this item and much more

All Information Management content is archived after seven days.

Community members receive:
  • All recent and archived articles
  • Conference offers and updates
  • A full menu of enewsletter options
  • Web seminars, white papers, ebooks

Don't have an account? Register for Free Unlimited Access