This month's column will show the applicability of the Zachman Framework to assist in source system analysis which is critical to data warehouse development.

Often, documentation of the source systems is sparse and out-of-date. Further, the only "model" that is available is the physical schema of the database. The first step in applying the Zachman Framework to the problems is to recognize what is--and what is not--available.

Prior to embarking on the source system analysis, the information requirements need to be determined and the data warehouse model needs to be developed. Information requirements can be identified in a variety of ways, all of which include involvement of the business community. Once these needs are delineated, the data warehouse model can be developed.

The data warehouse model describes the target environment. In developing the data warehouse model, the project team either began with the business (operational) model or developed those aspects of the business model that were needed. The data warehouse model was then developed by transforming the operational model into one needed to define an atomic data warehouse, with a focus on the elements needed to satisfy the information requirements. While there may be some tuning based on the source system analysis results, the majority of this activity should be performed prior to evaluating the source systems. The reason for this sequence is that the business model needs to describe the business, and the data warehouse model needs to be based on the business model and on the information requirements.

The data warehouse model is considered to be a system model. It contains the electronic representation of the items of interest for a particular application. This is similar to the system model for an operational system. Both are limited to the subset of the enterprise needed for a particular problem set, and both are semantically consistent with the business data model.

Armed with the information requirements and the data warehouse model, the search for the systems of record can begin. As a general rule, data for the warehouse should be captured as close to the source as possible.

If the source system has a database, the physical schema of the source system exists. Since this system supports a business process, it should be consistent with the business model--but is it?

Using the model relationships portrayed by the Zachman Framework, the source system analysis can focus on answering the previously posed question: Is the physical schema of the proposed system(s) of record consistent with the business model? If not, what are the differences?

Before these questions can be answered, the intervening models must be created. For the proposed system of record, this means developing the technology model and the system model, at least for the subset of the system that is being considered as a source for the data warehouse.

Development of the technology model can be partially automated using modern CASE tools by reverse engineering physical schema. The technology model includes the named databases, tables, keys, indexes, relationships, etc., for the implemented system. When generated through reverse engineering, it does not include the definitions of the fields nor any business rules that led to the relationships. For a complete technology model, this information needs to be added manually. In all likelihood, the technology model won't be complete even with the manual intervention, since there are undocumented assumptions concerning these definitions and business rules.

A major distinguishing factor between the technology model and the system model is that the former is an implementation of the latter, subject to constraints imposed by the platform (hardware, operating system, database management system, etc.) being used. To develop the system model, the constraints that were imposed need to be removed. As with development of the technology model, the system model will not be complete or accurate, since some, if not all, of the decisions made in implementing the technological constraints are undocumented.

Once the system model is produced, it can be compared to the business model and inconsistencies can be identified. In some cases the inconsistencies can be used to correct the system model. In other cases, the inconsistencies are caused by differences between the business as described by the business model and the business as supported by the operational system. This latter set of inconsistencies is the one that needs to be resolved by a data acquisition process that includes data collection, integration, transformation, cleansing and loading. The model differences sometimes identify inconsistencies that cannot be resolved through the data acquisition process. When this occurs, the data warehouse team and business users may need to look for other sources or adjust the information requirements to reflect the models of the operational environment, at least for the iteration being developed.

The business value of the data warehouse is dependent on its ability to satisfy the strategic decision support requirements of the enterprise. Other techniques that apply the Framework exist, and at the heart of all of them is the assurance of the semantic consistency of the various models.

(See Building the Data Warehouse by W. H. Inmon for additional information concerning the data warehouse and data model creation process.)

Register or login for access to this item and much more

All Information Management content is archived after seven days.

Community members receive:
  • All recent and archived articles
  • Conference offers and updates
  • A full menu of enewsletter options
  • Web seminars, white papers, ebooks

Don't have an account? Register for Free Unlimited Access