The data model is a valuable enterprise tool. In addition to helping enterprises understand information assets and analyze data requirements, data models support decision-making, enable information sharing, guide data quality initiatives and form the cornerstones of an enterprise's data architecture. In fact, data models are the Holy Grail for the business and technical community.
A quality data architecture is the soul of an organization and must clearly depict how information flows through the enterprise and how it is used. The key component of a data architecture is the data model, which is typically a graphical representation of data structure.
Thousands of data models are being built these days. But for every good data model, there are a dozen bad ones. Existing efforts focus on completing the task at hand (building an adequate model) rather than spending time validating the effort. Imagine a mission-critical system deployed without testing. If we agree that a robust data architecture, developed and maintained through data modeling efforts, is the backbone of the information enterprise, then why aren't we spending 25 to 30 percent of our effort on a quality assurance review? The impact of forgoing this step is grave.
What Constitutes a Quality Data Model?
Generally, quality is addressed relative to fitness for purpose. For quality data models, we talk about two concepts, completeness and correctness, from two perspectives, conceptual and technical. We document data models using what is essentially a language. Using its syntax correctly and making sure the model is an accurate and complete representation of the concept it depicts yields a quality result. These concepts and perspectives shape our quality framework:
- The conceptual perspective pertains to context or the meaning of the representation of the architecture relative to the organization.
- The technical perspective pertains to the adherence to syntax or internal integrity of the representation.
- The completeness concept is one of wholeness or comprehensive coverage.
- The correctness concept is one of accuracy or validity.
To ensure a data model is built right the first time, we have taken basic ideas about languages, syntax, semantics, models and quality and have joined them to form the five dimensions of data model quality that are depicted in Figure 1. Together these provide a framework for a comprehensive modeling approach.
Figure 1: The Five Dimensions
The Five Dimensions Defined
Conceptual correctness implies that the data architecture accurately reflects the business objects of interest for the enterprise and requires that the data structure needed to support all business processing is in place. Achieving conceptual correctness depends on the ability to translate information of interest in the business environment into a structured representation using a semantic language that forms a meaningful and accurate representation of the real world. Determining conceptual correctness is one of the most difficult aspects of assessing overall quality and the most challenging aspect of building a data model in the first place.
Conceptual completeness implies that the data model contains objects adequate to describe the full scope of the business domain that the model purports to represent. Our ability to judge the quality of a data model is closely tied to outside factors such as government and legal mandates, financial constraints and stakeholder requirements. You can't build representative data models by focusing exclusively on business data. You must take into account all of these factors to understand data and its interrelationships, or you will never build good data models.
Technical correctness implies that the objects contained in the data model do not violate any of the established syntax rules of the given language. Syntactic correctness means data model boxes, lines and symbols are used for their intended purposes and that the model adheres to generally accepted practices of the chosen methodology. Once rules are established they become part of the syntax and the models must be judged against those rules.
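One way to picture judging a model against established rules is to treat each rule as a small check that reports violations. The sketch below is illustrative only: the dictionary layout of the model and the two rules (unique entity names, relationships that point at defined entities) are assumptions chosen for the example, not rules taken from any particular methodology.

```python
# Hypothetical model layout, assumed for illustration:
# {"entities": [names], "relationships": [(parent, child), ...]}


def unique_entity_names(model):
    """Report every entity name that appears more than once."""
    seen, violations = set(), []
    for name in model["entities"]:
        if name in seen:
            violations.append(f"duplicate entity name: {name}")
        seen.add(name)
    return violations


def relationships_resolve(model):
    """Report relationships whose parent or child is not a defined entity."""
    known = set(model["entities"])
    return [
        f"relationship {parent} -> {child} references an undefined entity"
        for parent, child in model["relationships"]
        if parent not in known or child not in known
    ]


# Once a rule is adopted it joins this list, and every model is judged against it.
RULES = [unique_entity_names, relationships_resolve]


def syntax_violations(model, rules=RULES):
    """Run every established rule and collect the violations in rule order."""
    return [violation for rule in rules for violation in rule(model)]


model = {
    "entities": ["Customer", "Order", "Customer"],
    "relationships": [("Customer", "Order"), ("Order", "Invoice")],
}
for violation in syntax_violations(model):
    print(violation)
```

An empty result means the model breaks none of the rules in the list; it says nothing about whether the model is conceptually correct.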
Technical completeness implies that all the requisite data model objects, components, elements and details are captured at appropriate levels of detail for the purpose of the data architecture. As an example, assume we have adopted IDEF1X, a modeling notation used extensively by the federal government and by some commercial organizations, as our modeling technique. In IDEF1X, data models are supposed to be built in three distinct phases: an entity relationship modeling phase, followed by a key-based modeling phase and finished with a fully attributed modeling phase (NIST 1992). As the phase names imply, non-key attributes are not defined until late in the modeling process. Therefore, an IDEF1X model may be considered technically complete without any non-key attributes through the first two phases of modeling. At the end of the third phase, however, the model must contain non-key attributes while remaining technically sound.
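The phase-dependent nature of technical completeness can be sketched as a check whose expectations change with the modeling phase. This is a simplified illustration, not the IDEF1X standard itself: the `Phase`, `Entity` and `completeness_gaps` names are hypothetical, and requiring key attributes on every entity from the key-based phase onward is an assumption made for the example.

```python
from dataclasses import dataclass, field
from enum import Enum


class Phase(Enum):
    ENTITY_RELATIONSHIP = 1
    KEY_BASED = 2
    FULLY_ATTRIBUTED = 3


@dataclass
class Entity:
    name: str
    key_attributes: list = field(default_factory=list)
    non_key_attributes: list = field(default_factory=list)


def completeness_gaps(entities, phase):
    """Return the gaps that keep a model from being complete for this phase."""
    gaps = []
    # Assumption: from the key-based phase onward, every entity needs a key.
    if phase.value >= Phase.KEY_BASED.value:
        for entity in entities:
            if not entity.key_attributes:
                gaps.append(f"{entity.name}: no key attributes")
    # Non-key attributes are expected only at the end of the third phase.
    if phase is Phase.FULLY_ATTRIBUTED:
        if not any(entity.non_key_attributes for entity in entities):
            gaps.append("model: no non-key attributes defined")
    return gaps


customer = Entity("Customer", key_attributes=["customer_id"])
print(completeness_gaps([customer], Phase.KEY_BASED))
print(completeness_gaps([customer], Phase.FULLY_ATTRIBUTED))
```

The same model passes at the key-based phase and fails at the fully attributed phase, which is the point of the example: completeness is judged against the level of detail the current phase calls for.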
Enterprise integration implies the data model is balanced with the other elements of an enterprise architecture effort. It is linked and synchronized to the performance, business, service and technical components of the enterprise architecture.
By understanding each dimension and planning your modeling approach to address each one, you can significantly increase the likelihood that your data models will provide added value and a solid foundation for business analysis, information systems design and business operations across the enterprise. Each dimension contributes uniquely to the overall quality of the architecture. The sum of the parts makes the whole stronger; addressing all five dimensions ensures fitness for purpose of the data architecture.