When asked, "Is the data model complete?" the typical data modeler will answer no. While this may be true, a more appropriate question is whether or not it is ready to be used. To answer that question, it is helpful to look at the model purpose, scope and components.
In my June column, I provided some of the reasons for developing the business data model. These included helping with scope definition, supporting data profiling efforts, supporting the transformation and integration process and being the basis for physical database design. None of these applications dictate that the model must be complete.
A complete data model is represented in (at least) third normal form, includes all the entities, attributes and relationships for the enterprise, with all entities and attributes defined and all relationships described. Further, for the data model to be complete, the information within the model must be verified to be reflective of the business.
The scope of the model is the first item that needs to be tackled. Virtually no companies today build a data model just to have one. Data models are built in conjunction with other efforts, such as system development projects or business intelligence programs. In both cases, the scope of the model should be limited to the data that needs to be addressed by the initiative. Developing or refining data model components beyond that which is needed to support the initiative is interesting, but doing so provides no business value (until a subsequent project needs that portion of the model).
A best practice is to develop the data model incrementally based on the scope of the underlying projects that will use it. Occasionally, it is a good idea to go slightly beyond the project scope, but only at the entity level. This extension provides an understanding of the immediately surrounding data relationships. If these relationships include foreign keys that impact the data within scope, the extension could minimize rework when the model is eventually expanded.
The data model should include all the entities within the scope of the effort, and each of these entities should have an agreed-upon definition. Determining if all the entities are captured is not very easy. Information for the model is gathered through interviews, documentation reviews and examinations of existing physical data structures, and these provide a good foundation. But if the people being interviewed didn't think of a data entity, and if that entity was not represented within an existing database, the modeler could miss it. The likelihood of this happening is slim, but the possibility always exists.
The more perplexing problem with entities deals with the definition. A good data modeler tries to formulate a business-oriented definition for each entity. That process requires business involvement and often requires resolution of differences among business representatives.
To facilitate project progress, the data modeler needs to evaluate each definition to determine if it's close enough. Differences of opinion need to be resolved eventually but do not impact usage of the model within the project. When definitional differences do not impact use of the model for development activities, the data modeler needs to release the model and continue working to resolve the definitional issues on a parallel track.
The data model should include all the attributes within the scope of the effort, and each of these attributes should have an agreed-upon definition. Like the entities, some attributes may not be identified initially. Fortunately, the impact of adding attributes is often minimal, and the modeler needs to be willing to release the model after a reasonable amount of due diligence.
The attribute definitions may have the same issues as the entity definitions, but resolution of the differences for these is more crucial. The reason is that the data resides within the defined attribute, and the definition of that attribute drives the data profiling as well as the data transformation and integration logic. If the differences involve wordsmithing, they can wait; content differences, however, must be resolved to enable the development team to move forward with its activities.
The relationships portray business rules between pairs of entities. These rules impact hierarchy definitions and manifest themselves in the form of foreign keys within entities.
Hierarchies are often eventually transformed into (conformed) dimensions that are used by businesspeople throughout the organization. In developing the hierarchical relationships, it is critical to determine whether or not exceptions to the traditional hierarchical relationships exist. The data modeler should be very careful not to consider the model complete until all the hierarchical relationships within the scope are defined and verified.
Relationships also result in the generation of foreign keys between parent and child entities. Unlike other attributes, which are relatively easy to add, addition of foreign keys to an established database is not simple. Hence, time should be taken to verify the parent-child relationships within scope.
True, a data model is never complete, but that should not deter using it to support development projects. The data model should be developed incrementally, and within each increment, the data modeler needs to understand which actions are critical prior to release of the model to the development team and which ones can be addressed while the development team is using the data model.
Register or login for access to this item and much more
All Information Management content is archived after seven days.
Community members receive:
- All recent and archived articles
- Conference offers and updates
- A full menu of enewsletter options
- Web seminars, white papers, ebooks
Already have an account? Log In
Don't have an account? Register for Free Unlimited Access