Claudia would like to thank John Ladley for his assistance in writing this month's column.

In Part 1 of this column, we covered the need for multiple database designs for implementing data warehouse environments (data marts, data warehouse, ODS, etc.). Focusing on a single design structure is an approach that frequently decreases the value derived from the information asset due to reduced usage and increased costs of ownership. Star, snowflake and third normal form schemas as well as denormalized flat files should be considered as structural options.

With that in mind, this column addresses when and where to use the various schema structures and provides guidelines for mapping them to the business requirements. It includes several activities that you may find useful for selecting the database design that best fits the business problem at hand.

  1. Start with the business process. During requirements definition, avoid the tendency to jump ahead and start to create the star, third normal form, denormalized or other structure that may seem evident. In truth, there will be multiple schemas throughout the data warehouse/data mart environment. Staging areas, ODSs, atomic DW, federated, etc., are all architectural components. The makeup of these components and their schemas must be based on performance criteria and business process requirements. Once the business processes are identified, the key business goals or drivers (pain) must be documented. The objectives for resolving these issues must also be documented. Objectives are the quantifiable, measurable results of achieving goals. Each of the business processes associated with the business drivers (opportunity or pain) can then supply the goals and objectives that must be accomplished to address the opportunities.

    Lastly, develop some scenarios, or conceptual prototypes, of how end users will react, assuming they have access to information to fulfill the objectives. Again, the business processes that have been identified participate in this process. This provides a baseline model for defining the presentation of the information and subsequent detailing of the requirements.

    Performance issues and end-user skills greatly impact the ultimate choice of database design. Figure 1 should help you in terms of determining which type of business intelligence community member you have for an end user.1

BI Community Form of Analysis and Database Schema Preferred
Farmers – Monitor the effects of tracking key performance metrics
  • Predominantly use multidimensional data marts
  • Star, snowflake, hypercube schemas
Explorers – Endeavor to understand how the business works by finding hidden meanings in data
  • May start with multidimensional data mart but soon require their own environment
  • Star, snowflake, hypercube
  • Exploration technology
Miners – Scan large amounts of detailed data to confirm a hypothesis of suspected patterns
  • May begin with multidimensional data mart but soon require their own environment
  • Star, snowflake, hypercube
  • Flat files, data sets, statistical samples
Operators – Use the intelligence derived by Explorers and Farmers to improve business conditions
  • Predominantly use operational systems or operational data store
  • Predominantly third normal form schema for ODS
Tourists – Are aware of data produced by the business
  • Predominantly use multidimensional data marts or informal warehouse
  • Star, snowflake, hypercube

Figure 1: Database Designs for End-User Communities

  1. Describe the measures. Next, the objectives from the first step are broken down into measures. Facilitated sessions, interviews, study of existing systems or reports are all required to obtain business-level measurement definitions. In turn, the measures provide the baseline for a list of information requirements. Note that to determine a measurement, there must be an adequate level of detail (grain) for the appropriate dimensions and the numerical facts to be analyzed or presented in the data marts.

    To determine the grain, and eventually the physical schema, attribute the measures and information requirements according to Figure 2.

Requirements A list of information requirements based on measures, as would be manifested in decisional reports, operational reports
Periodicity What is the time frame the requirement represents - ad hoc, real time, daily, monthly, etc.?
Time Frames How often does the report come out - hourly, daily, etc.?
Latency How soon must the information delivered?
Volatility How often does the data appearing in the requirement change?
Range What type of historical period is required to store this information?
Subject Areas What subject areas are contained in the information requirement?
Functions What business function does the requirement support?
BI Community Member Within the business functions, what types of BI users will there be?
Sources What are the data sources from which the requirement might be met?
Appearance What format is required?
Distribution How far and where does the information get distributed?

Figure 2: Measures and Information Requirements4

Attributing the measurements results in the information required to accurately identify the grain. Many projects have grossly underestimated grain by assuming, or guessing, what the grain should be ­ the authors strongly recommend usage of a formal process for determining grain. (Good references for this process are Ralph Kimball's books.)3

The completion of the measurement attributes provides the raw data used to determine information requirements needed for the ODS, data mart, central DW, etc., just as the volatility and latency are used to determine which of these structures is more appropriate.

An important caveat here is that the data warehouse (or other central structure), which feeds the marts will be required to support any and all forms of analysis ­ not just multidimensional forms. The data warehouse environment must align varying skill sets, functionality and technologies. Therefore, it must be designed with two ideas in mind. First, it must be at the proper level of grain to satisfy all the data marts. That is, it must contain the least common denominator of detailed data to supply aggregated, summarized marts as well as transaction level exploration and mining warehouses.

Second, its design must not compromise the ability to use the various technologies. The design must accommodate multidimensional marts as well as unbiased mining and exploration warehouses. Therefore, data warehouse designs are not necessarily star schemas ­ some are normalized and others are denormalized to varying degrees.

  1. Select the structure for storing the answer data. Finally, plot the requirement against some type of matrix, similar to the one in Figure 3 or one of your own making. Such a matrix allows DBAs to view where the overall requirements lie in terms of the physical database drivers, i.e., volatility, latency, multiple subject areas, etc., and the analytical vehicle that will supply the information (via the scenarios that were developed), e.g., repetitive delivery, ad hoc reports, production reports, algorithmic analysis, etc.
    Figure 3: Requirements in Terms of Database Drivers

    While the tremendous success of any single architectural component can lead to universal use, unrestrained implementation can eclipse or replace other valid components, thereby weakening the architecture. In contrast, identifying the merits of each new component and integrating these into the architecture strengthens the overall architecture. That is, it can now satisfy a more diverse range of requirements.

References

  1. See DM Review, July/August 1999 issue for five detailed articles on the Business Intelligence Communities, their skills, roles, functions, etc.
  2. See DM Review, February 2000 article entitled "Exploration Warehouse ­ the Final Frontier" by Claudia Imhoff for overview of these technologies.
  3. Kimball Ralph. The Data Warehouse Toolkit. John Wiley & Sons. 1998. The Data Webhouse Toolkit. John Wiley & Sons. 2000.
  4. Ladley, John. Knowledge Logistics ­ Pragmatic Approaches for Business Technology. To be published late 2000.

John Ladley is the president of Knowledge InterSpace, Inc. and is a META Group research fellow. He can be reached at jladley@knowledge interspace.com.

Register or login for access to this item and much more

All Information Management content is archived after seven days.

Community members receive:
  • All recent and archived articles
  • Conference offers and updates
  • A full menu of enewsletter options
  • Web seminars, white papers, ebooks

Don't have an account? Register for Free Unlimited Access