We are constantly told that things are changing ever more rapidly. Before the latest book is dog-eared, authors of newer wisdom are wanting to declare old books road kill. The PL/1 I studied in school is just about worthless, but I don't see anybody writing that the "do-loop" is dead.
Walter Moeller is not chasing the latest new thing but writing about the underlying reality upon which business acts. The ideas Walter teaches and practices will be as true twenty years from now as they have ever been.
Of course, technology is only about five percent of our job. Ninety-five percent is dealing with each other. One thing I know has not changed about us is that we believe that which we want to believe. You may work with those who want to believe that they will someday be able to just wave their hand and create application masterpieces. They read the ads and skim the articles. I know they won't stop chasing the mechanical rabbit long enough to appreciate the road map to successful data warehouse development that Walter presents here.
Bob Schmidt consults for Stone Carlie and authors course work on data modeling. His CBT is distributed by IBM, Sybase and agpw. His book, Data Modeling for Information Professionals, was published in August, 1988, by Prentice Hall (ISBN 0-13- 080450-9).
The question rages: Now that we are doing data warehousing, can we dump our entity relationship diagramming (ERD) projects, tools and diagrams and just develop star-schema diagrams?
We hear it more and more from teams developing data warehouses or data marts.
Let me give you the answer first: The entity relationship diagram and the star-schema diagram do not compete with each other within the scope of a data warehouse project. There is a real and valuable purpose to be achieved by the use of each of these techniques. The challenge is to know how to develop each and what value to expect from each.
We still need to use entity relationship modeling data modeling, which is analysis to document the business information requirements for data warehouse projects. Bill Inmon calls these business information requirements the atomic level data for the data warehouse. The definition of this atomic level data is critical to the success of any data warehouse.
The well-known star schema proponent Ralph Kimball admits that "entity relationship (ER) modeling is a powerful technique for designing transaction processing systems in relational environments."1 The value of the entity relationship model is achieved in modeling enterprise data. Gaining consensus within the business as to the name, definition and business rules about the data within the enterprise is step one for anything you choose to do with that data whether that data is used by a transaction processing system or a data warehouse system.
The data warehouse development process includes many terms for the databases to be developed during the project. These may include information warehouse, info mart, operational data store, data warehouse and data mart. For the scope of this discussion, I will use data warehouse as the generic term and not get mired down in the subtle differences between each of these variations.
Proponents of star schema-centric data warehouse development recommend specifying fact tables and then specifying dimension tables. These are important and proper steps to be completed before you develop the physical database for a data warehouse. However, you must first have agreement on names, definitions and business rules of the data and understand the business rules used for aggregations and calculations before you can populate the star schema. The star schema structure in itself does not provide the necessary information about the atomic level data required for the fact tables within the star schema. This necessary information is the very foundation of the data warehouse. This business information is gathered through cross-functional information gathering sessions and can be recorded with the use of the ER diagram, the visual representation of your data modeling.2
We work with our clients to identify a team of cross-functional representatives who know their business and their business information requirements. We develop entity relationship models with this group. We do not impose the rigorous rules of ER modeling on these business people. We start by drawing a series of boxes and asking what "categories" of information we need to know in order to be effective with the business. We name the boxes (entities) based upon their input. If necessary, we will work with the team to develop correct entity names (singular noun or adjective noun form). We then develop definitions for each category (entity) so that all participants can agree with the business definitions. During this process, we often identify additional entities. For example, the sales force may talk about selling "products;" however, the manufacturing team may talk about making "parts." We must determine if these are synonyms. Should we record data about product, part or both?
Naming and defining the pieces of data recorded for each data category (that is, attributes) begins after we have a good start on identification and definition of entities. We do this by asking a series of questions, such as, "What information do you record about your customer?" We follow a similar technique in making sure that all cross- functional team members can agree upon the documented business data names and definitions. A similar approach is used to document the business data relationships on our ER diagram. We do differentiate between an ER diagram and an ER model. An ER diagram is the graphic notation of entities, attributes and relationships. An ER model is the ER diagram plus the associated "standards- compliant" names, enterprise acceptable definitions and appropriate business rules for the business data.
There is more to developing an entity relationship model than just drawing the diagram. If you have not had the opportunity to obtain education in ER modeling, we suggest you include a methodologist as part of your data warehouse team. The role of this person will be to take the initial input from the cross-functional team and document it in a rigorously correct ER model to properly convey the requirements to the technical team that will build the data warehouse database. (Actually, we suggest you get as many members of your organization as possible to learn data modeling how to think about data clearly.)
Some star schema proponents criticize the use of an ER model when working with business people. We do not try to work with an entire enterprise ER model when working with the cross- functional team. We work with individual views of data, such as the information needed to identify and record a new customer or the information required to record a new order. Some people may refer to this partial view as a data subject area, but to the cross-functional business team, it's just a part of their business. By building and validating the ER model in the "bite-by-bite" approach, the business people can focus on the business issues and not be bored to tears by the complexity of the ER notation. Data modeling is not about drawing pictures: it is about analyzing how the business understands and uses data.
When you have documented the (atomic) business information requirements that fulfill the scope of a designated project when you have done your data modeling you are now in a good position to begin documenting the requirements for your data warehouse. The primary critical success factor for a data warehouse is that it must be built to satisfy business requirements, using data defined by and understood by the business people who will use the information obtained from the data warehouse. If you are building a data warehouse only for technology reasons, there is a high probability the business partners may never understand its content or value. Only by making data modeling the foundation of your data warehouse (or any information system, for that matter) will you be able to build a business tool rather than a technology nightmare.
Traditional high-level tasks to be performed on a data warehouse project:
- Define business data requirements;
- Define system of record source;
- Define the data warehouse tool requirements and the vendors;
- Map system of record data to data warehouse database;
- Program the queries to support reporting requirements;
- Extract, scrub, aggregate and load the data warehouse;
- Initiate the load of historic transaction processing systems data;
- Initiate the process to support ongoing load and use of the data warehouse data.
(This is a generic task list; you should customize it as necessary.)
We propose that you can develop a project plan in which you effectively accomplish the objective of each data warehouse development task through the use of the correct diagramming technique and tools. See Figure 1.
|Task #||Task Name||Recommended Diagramming Technique|
|1.||Define business data requirements||Entity relationship diagram|
|2.||Define system of record sources||A matrix of source systems to target systems|
|3.||Define the DW tool requirements and the vendors||Identify DBMS, MDDBMS, data conversion and aggregation as well as reporting (OLAP) tools|
|4.||Design DW database||Star schema diagramming tools|
|5.||Map system of record data to DW database||Any of the "industrial strength" extract, transform and transport (ETT) tools|
|6.||Extract, scrub, aggregate and load the data warehouse||Any of the 'industrial strength" extract, transform and transport (ETT) tools|
|7.||Program the queries to support reporting requirements||A good OLAP engine|
|Figure 1: Recommended Diagramming Techniques|
Ralph Kimball has developed a systematic set of nine steps to accomplish the fourth task designing the data warehouse database:3
- Choose the process;
- Choose the grain;
- Identify and conform the dimensions;
- Choose the facts;
- Store recalculation in the fact table;
- Round out the dimension tables;
- Choose the duration of the database;
- Track slowly changing dimensions; and
- Decide the query priorities and query modes.
An entity relationship model will assist you in the development of a quality star schema diagram and provide value in completion of at least task numbers four, five and six by insuring well-documented names, definitions and business rules for the data you put into the data warehouse and provide to your business.
The entity relationship model and the star schema diagram are both valuable and when used together will enhance your ability to develop data warehouses that meet information requirements of the business. These two techniques should not be viewed as competing methods; they are companions that each serve a role in the development of data warehouses to support your organization.
2 For a more detailed understanding of the use of facilitation as an information-gathering technique, see my article, "Information Gathering in Groups" in Handbook of Data Management, Barbara von Halle and David Kull, Editors. Pages 535-546. Auerbach Publications, 1993.
3 DBMS. December 1996 and January 1997. "Letting the Users Sleep, Nine Decisions in the Design of a Data Warehouse." If you are interested in learning how to do these individual tasks, contact Ralph Kimball through http://www.rkimball.com.
Register or login for access to this item and much more
All Information Management content is archived after seven days.
Community members receive:
- All recent and archived articles
- Conference offers and updates
- A full menu of enewsletter options
- Web seminars, white papers, ebooks
Already have an account? Log In
Don't have an account? Register for Free Unlimited Access