In the chaos of day-to-day business, sometimes it's hard to see the forest for the trees. The initial design of an enterprise data warehouse that is supposed to provide strategic information to key analytical thinkers can get bogged down by the overwhelming demand for near real-time information. You spend years designing a rocket ship, and the first thing people want to do is slap roller skates on it and ride it around the parking lot. Your vision of a data warehouse where virtually any information demand can be fulfilled effortlessly is down the drain. Or is it? First we should define the playing field. An enterprise data warehouse (EDW) stores integrated, aggregated information that covers many subject areas and retains months or years of data history. The time span is needed to determine and analyze trends over time. A data mart (DM) is similar in design to a data warehouse, but for a limited subject area(s). An operational data store (ODS) is specifically designed to analyze near real-time data now--to solve day-to-day problems. The content may span many subject areas, but little history is retained because of the temporal nature of the information.

Some examples of ODS data needs within the insurance industry are analyzing the claims backlog by department (to reassign claims processors to overloaded departments), the total accounts payable for the next check run (to move money to the right bank accounts and play the interest float) and what payments have been received in the last 30 days (with check number) for customer service inquiries. Why can't people get near real-time data from the on-line transaction processing (OLTP) system? Does this kind of information belong in an on-line analytical processing (OLAP) data warehouse? The answers are: 1. It's too hard, and 2. Yes (as long as we plan ahead of time). Let's explore why.

Figure 1: Data Warehouse Architecture

OLTP systems are designed for speed of input for predefined transactions. They are not designed for the ease of access to data that doesn't fit the predefined transaction model. Conversely, OLAP systems are designed specifically to address data access when you don't know the access path in advance. So trying to make an OLTP system provide flexible data analysis is like trying to get a fish to ride a bicycle. Maybe you could eventually, but what's the point?

The intelligent design of an data warehouse should take into account that current OLTP systems do not meet reporting expectations, and the advent of a good data warehouse will attract many data thirsty friends. We will then need a seamless data warehouse environment that can integrate the top down aggregate information that spans the enterprise (an executive information system--EIS), the drill-down capability of a decision support system (DSS) and the ad hoc query flexibility of data mining. See Figure 1.

To balance the disparate types of source data for the data warehouse, we'll first need a data transformation layer that can cleanse and restructure the data for the warehouse. Behind the scenes, it may be necessary to "stage" the data acquisition first to the ODS, then to the DM and then to the EDW. Within this environment, the temporal nature of near real-time data for the ODS can be managed, with all the analytical power of the data warehouse. For handling data access, a meta data layer will provide navigation and access to the appropriate data store. The external interface for business users through the meta data should be from a browser application to allow a "zero administration client" for the data warehouse and a fast learning curve for the business applications. In this way, the design and flexibility of each unique business data need can be met, providing maximum business value to the organization.

Orchestrating this symphony requires advance planning and a disciplined implementation, but to ignore the data crisis du jour will eventually cause the compromise of the EDW anyway. If we can't navigate around the trees, we can't find our way out of the forest.

Register or login for access to this item and much more

All Information Management content is archived after seven days.

Community members receive:
  • All recent and archived articles
  • Conference offers and updates
  • A full menu of enewsletter options
  • Web seminars, white papers, ebooks

Don't have an account? Register for Free Unlimited Access