The data model is a valuable enterprise tool. In addition to helping enterprises understand information assets and analyze data requirements, data models support decision-making, enable information sharing, guide data quality initiatives and form the cornerstones of an enterprise's data architecture. In fact, data models are the Holy Grail for the business and technical community.
A quality data architecture is the soul of an organization and must clearly depict how information flows through the enterprise and how it is used. The key component of a data architecture is the data model, which is typically a graphical representation of data structure.
Thousands of data models are being built these days. But for every good data model, there are a dozen bad ones. Existing efforts focus on completing the task at hand, building an adequate model, rather than spending time validating the effort. Imagine a mission-critical system deployed without testing. If we agree that a robust data architecture developed and maintained through data modeling efforts is the backbone of the information enterprise, then why aren't we spending 25 to 30 percent of our effort in a quality assurance review? The impact of forgoing this step is grave.
What Constitutes Data Model Quality?
Generally, quality is addressed relative to fitness for purpose. For quality data models, we talk about two concepts, completeness and correctness, from two perspectives, conceptual and technical. We document data models using what is essentially a language. Using its syntax correctly and making sure the model is an accurate and complete representation of the concept we are representing yields a quality result. These concepts and perspectives shape our quality framework:
- The conceptual perspective pertains to context or the meaning of the representation of the architecture relative to the organization.
- The technical perspective pertains to the adherence to syntax or internal integrity of the representation.
- The completeness concept is one of wholeness or comprehensive coverage.
- The correctness concept is one of accuracy, or freedom from error.
To ensure a data model is built right the first time, we have taken basic ideas about languages, syntax, semantics, models and quality and have joined them to form the five dimensions of data model quality that are depicted in Figure 1. Together these provide a framework for a comprehensive modeling approach.
Figure 1: The Five Dimensions
The Five Dimensions Defined
Conceptual correctness implies that the data architecture accurately reflects the business objects of interest for the enterprise and requires that the data structure needed to support all business processing is in place. Achieving conceptual correctness depends on the ability to translate information of interest in the business environment into a structured representation using a semantic language that forms a meaningful and accurate representation of the real world. Determining conceptual correctness is one of the most difficult aspects of assessing overall quality and the most challenging aspect of building a data model in the first place.
Conceptual completeness implies that the data model contains objects adequate to describe the full scope of the business domain that the model purports to represent. Our ability to judge the quality of a data model is closely tied to outside factors such as government and legal mandates, financial constraints and stakeholder requirements. You can't build representative data models by focusing exclusively on business data. You must take into account all of these factors to understand data and its interrelationships, or you will never build good data models.
Technical correctness implies that the objects contained in the data model do not violate any of the established syntax rules of the given language. Syntactic correctness means data model boxes, lines and symbols are used for their intended purposes and that the model adheres to generally accepted practices of the chosen methodology. Once rules are established, they become part of the syntax, and the models must be judged against those rules.
Technical completeness implies that all the requisite data model objects, components, elements and details are captured at appropriate levels of detail for the purpose of the data architecture. As an example, assume we have adopted IDEF1X, a modeling notation used extensively by the federal government and by some commercial organizations, as our modeling technique. In IDEF1X, data models are supposed to be built in three distinct phases: an entity relationship modeling phase, followed by a key-based modeling phase and finished with a fully attributed modeling phase (NIST 1992). As the names imply, non-key attributes are not defined until late in the modeling process. Therefore, an IDEF1X model may be considered technically complete without any non-key attributes through the first two phases of modeling. At the end of the third phase, however, the model must contain non-key attributes while remaining technically sound.
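The phase-dependent nature of technical completeness can be illustrated with a minimal Python sketch. The `Phase`, `Entity` and `completeness_issues` names are hypothetical, and the two rules shown are simplifying assumptions rather than the IDEF1X standard itself; in particular, some entities may legitimately carry no non-key attributes, so the findings are prompts for review, not verdicts.

```python
from dataclasses import dataclass, field
from enum import Enum


class Phase(Enum):
    ENTITY_RELATIONSHIP = 1
    KEY_BASED = 2
    FULLY_ATTRIBUTED = 3


@dataclass
class Entity:
    name: str
    key_attributes: list[str] = field(default_factory=list)
    non_key_attributes: list[str] = field(default_factory=list)


def completeness_issues(entities: list[Entity], phase: Phase) -> list[str]:
    """Return completeness findings appropriate to the modeling phase."""
    issues = []
    for entity in entities:
        # Assumption: keys are expected from the key-based phase onward.
        if phase.value >= Phase.KEY_BASED.value and not entity.key_attributes:
            issues.append(f"{entity.name}: no key attributes defined")
        # Non-key attributes are only expected once the model is fully attributed.
        if phase is Phase.FULLY_ATTRIBUTED and not entity.non_key_attributes:
            issues.append(f"{entity.name}: no non-key attributes defined")
    return issues


# Hypothetical two-entity model used only for illustration.
model = [
    Entity("CUSTOMER", key_attributes=["customer_id"]),
    Entity("ORDER", key_attributes=["order_id"], non_key_attributes=["order_date"]),
]
key_based_findings = completeness_issues(model, Phase.KEY_BASED)
fully_attributed_findings = completeness_issues(model, Phase.FULLY_ATTRIBUTED)
```

The same model passes at the key-based phase and raises a finding at the fully attributed phase, which is the point of judging completeness relative to the phase.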
Enterprise integration implies the data model is balanced with the other elements of an enterprise architecture effort. It is linked and synchronized to the performance, business, service and technical components of the enterprise architecture.
By understanding each dimension and planning your modeling approach to address each one, you can significantly increase the likelihood that your data models will provide added value and a solid foundation for business analysis, information systems design and business operations across the enterprise. Each dimension contributes uniquely to the overall quality of the architecture. The sum of the parts makes the whole stronger; addressing all five dimensions ensures fitness for purpose of the data architecture.
These five dimensions form a framework for assessing a data model's utility. This framework may also be applied to each component of the architecture independently to ensure seamless integration.
Figure 2: Data Architecture
Data Quality Review Process
Reviewing data models to determine quality can be challenging. Typically, data dictionaries and matrices are structured or presented in an alphabetical sequence. Unfortunately, this organizational paradigm has nothing to do with the context of the organization and, therefore, does not facilitate review from any of the five quality dimensions. Graphical data models are typically laid out to be aesthetically pleasing. While minimizing line crossing and grouping some closely associated objects can help with comprehension, such a layout still does not provide a repeatable structure for reviewing a large, complex model.
As part of the systemic evaluation process, it is extremely important to review the model in an orderly progression. Random selection of a starting point leads to difficulties in making correct assessments, and often leads back through the same parts of the model over and over again in an effort to trace primary key migrations and identify circular references. To avoid this, break the model into logically cohesive subsets, or subject areas, prior to starting the review. Each subject area should be laid out in data dependency sequence; in other words, ordered by the dependencies of the entities upon one another.
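A data dependency sequence is essentially a topological ordering of the entities. The sketch below uses Python's standard `graphlib` module (Python 3.9 or later) on a hypothetical four-entity subject area; the entity names and the `dependency_sequence` helper are illustrative assumptions, not part of any modeling tool.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical subject area: each entity maps to the parents whose keys it inherits.
parents = {
    "CUSTOMER": set(),
    "PRODUCT": set(),
    "ORDER": {"CUSTOMER"},
    "ORDER_LINE": {"ORDER", "PRODUCT"},
}


def dependency_sequence(parents: dict[str, set[str]]) -> list[str]:
    """Return entities ordered so that every parent precedes its children."""
    try:
        return list(TopologicalSorter(parents).static_order())
    except CycleError as err:
        # The second element of err.args lists the entities forming the cycle.
        raise ValueError(f"circular reference found: {err.args[1]}") from err


review_order = dependency_sequence(parents)
```

A side benefit of ordering the entities this way is that circular references surface immediately as an error instead of being discovered by repeatedly retracing key migrations.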
Figure 3: Systemic Evaluation Process
Review Technical Quality Dimensions
To confirm the technical completeness and correctness of the data model, conduct a review of each entity, including all of its elements, attributes, relationships and links, as documented in any components of the data architecture.
Independent entities. Begin by reviewing independent entities, those which do not inherit any foreign key. By starting the review at the opposite end of the family tree from cluster endpoints, you can more easily track (and evaluate) the migration of keys through the model and follow a general-to-specific path through the model, because dependent children inherit characteristics of the parent entities. This helps the reviewer understand how the model uses syntax and grasp the business concepts implicit in the model construct.
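Finding the starting points can be as simple as filtering for entities with no foreign keys. This is a minimal sketch in Python; the `foreign_keys` mapping and its contents are hypothetical stand-ins for whatever a modeling tool exports.

```python
# Hypothetical model: entity name -> foreign key attributes it inherits.
foreign_keys = {
    "CUSTOMER": [],
    "PRODUCT": [],
    "ORDER": ["customer_id"],
    "ORDER_LINE": ["order_id", "product_id"],
}


def independent_entities(foreign_keys: dict[str, list[str]]) -> list[str]:
    """Return entities that inherit no foreign key, sorted for repeatability."""
    return sorted(name for name, fks in foreign_keys.items() if not fks)


start_here = independent_entities(foreign_keys)
```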
Dependent entities. Next, shift focus to dependent entities, those with foreign keys (serving as either primary keys or non-primary foreign keys). Subtype, associative and attributive entity types are dependent entities. Trace the relationships down from independent entities, and then follow parent-child relationship paths through successive dependent entities until you reach the endpoint. You will be following multiple dependency paths, each originating with an independent entity and merging with other dependency paths. This progression allows you to continue tracking key migration and follow the natural flow of the model.
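The forward pass can be pictured as enumerating every path from an independent entity down to an endpoint. The following Python sketch assumes a hypothetical `children` mapping that records the direction of key migration, and it assumes the model has no circular references.

```python
# Hypothetical parent -> children relationships (direction of key migration).
children = {
    "CUSTOMER": ["ORDER"],
    "PRODUCT": ["ORDER_LINE"],
    "ORDER": ["ORDER_LINE"],
    "ORDER_LINE": [],
}


def dependency_paths(children: dict[str, list[str]]) -> list[list[str]]:
    """List every path from an independent entity to an endpoint.

    Assumes the relationships are acyclic; a cycle would recurse forever.
    """
    has_parent = {child for kids in children.values() for child in kids}
    roots = [name for name in children if name not in has_parent]
    paths: list[list[str]] = []

    def walk(entity: str, path: list[str]) -> None:
        path = path + [entity]
        if not children[entity]:  # endpoint: no dependent children
            paths.append(path)
        for child in children[entity]:
            walk(child, path)

    for root in roots:
        walk(root, [])
    return paths


paths = dependency_paths(children)
```

In this example both paths end at `ORDER_LINE`, which shows how separate dependency paths merge at a shared dependent entity.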
Backward pass. Retrace the same paths in the opposite direction, beginning with the endpoints and moving upstream to the independent entities. This progresses relatively quickly, because most of the issues related to the syntax and business concepts have already been captured. The objective is to ensure that nothing has been overlooked due to a single perspective review. To prevent reviewing the same objects more than once, mark each entity and attribute as it is reviewed on each pass. Be sure to record issues and make annotations as necessary as you move through the model.
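Marking reviewed entities on each pass can be tracked with a small log structure. The `ReviewLog` class below is a hypothetical sketch in Python, and the entity names, the forward order and the recorded issue are made up for illustration.

```python
from typing import Optional


class ReviewLog:
    """Tracks which entities were reviewed on each pass and any issues found."""

    def __init__(self) -> None:
        self.reviewed: dict[str, set[str]] = {"forward": set(), "backward": set()}
        self.issues: list[tuple[str, str]] = []

    def mark(self, pass_name: str, entity: str, issue: Optional[str] = None) -> bool:
        """Mark an entity as reviewed; return False if already done on this pass."""
        if entity in self.reviewed[pass_name]:
            return False
        self.reviewed[pass_name].add(entity)
        if issue:
            self.issues.append((entity, issue))
        return True


# Hypothetical forward order (parents before children); the backward pass reverses it.
forward_order = ["CUSTOMER", "PRODUCT", "ORDER", "ORDER_LINE"]

log = ReviewLog()
for entity in forward_order:
    finding = "optionality unclear on product_id" if entity == "ORDER_LINE" else None
    log.mark("forward", entity, finding)
for entity in reversed(forward_order):
    log.mark("backward", entity)

# A second visit on the same pass is rejected, so nothing is reviewed twice.
repeat_visit = log.mark("backward", "ORDER")
```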
Review Conceptual Quality Dimensions
The next step is to conduct a conceptual quality review of the data model to determine the conceptual correctness and completeness. This can be done from the perspective of either the performance architecture or the business architecture.
Such reviews are significantly more difficult to perform than checking syntax and modeling standards compliance in a technical quality review because they center on the successful transformation of business plans and rules into data architecture elements and relationships. The business concept review is used to evaluate the precision and accuracy with which the requirements within the model's scope have been addressed. This is where the majority of time is spent. Unlike syntax checks, it is impossible to automate the review of business concepts because these checks rely on human faculties of interpretation, comparison, translation, knowledge and judgment. A few questions may guide the review process.
1. Verify the data model supports all business statements or stated requirements from other sources. Ensure the data model provides traceability from the data back to the requirements that this data will support or represent.
2. Determine if the model accurately reflects the context of the business plans and rules. Relationships are generally a good place to start when looking for inconsistencies between plans and models.
- Does the relationship capture, support and implement a stated strategy or policy correctly?
- Does the business rule make sense in the context of the functional area or segment being represented, as well as in the full model? Watch for problems such as a reversed relationship or incorrect cardinality or optionality.
- Does the business rule supported by the relationships make sense when assessed in the context of other relationships involving each of the two entities connected by the relationship?
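While judging whether a relationship correctly expresses a business rule remains a human task, the bookkeeping side of traceability can be sketched in code. The Python example below assumes a hypothetical traceability matrix keyed by made-up requirement IDs; it only reports missing links and says nothing about whether an existing link is conceptually right.

```python
# Hypothetical traceability matrix: requirement ID -> supporting model elements.
trace = {
    "REQ-1": ["CUSTOMER", "ORDER"],
    "REQ-2": ["ORDER_LINE"],
    "REQ-3": [],
}
model_elements = {"CUSTOMER", "ORDER", "ORDER_LINE", "PRODUCT"}


def traceability_gaps(
    trace: dict[str, list[str]], model_elements: set[str]
) -> tuple[list[str], list[str], list[str]]:
    """Return (unsupported requirements, untraced elements, unknown elements)."""
    traced = {element for elements in trace.values() for element in elements}
    unsupported = sorted(req for req, elements in trace.items() if not elements)
    untraced = sorted(model_elements - traced)  # in the model, tied to no requirement
    unknown = sorted(traced - model_elements)   # cited by a requirement, not in the model
    return unsupported, untraced, unknown


unsupported, untraced, unknown = traceability_gaps(trace, model_elements)
```

Each gap is a question for the reviewer rather than an automatic defect: an untraced entity may reflect an undocumented requirement as easily as an unnecessary object.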
This process has been proven to determine data model quality quickly and effectively. The objective, structured, repeatable approach is reusable, which will further lead to consistency across an entire enterprise architecture. A high quality data model has the following characteristics:
- Understandable at all levels: A fellow data architect should have no difficulty understanding the data model you have created; however, a person with limited modeling knowledge should also be able to comprehend the represented model and processes.
- Flexible: The data model should be accurate and stable for the immediate purpose for which it is designed and also be robust enough to accommodate future data requirement additions and changes easily and without major modifications.
- Adherent to standards: Ensure all applicable standards have been followed, regardless of how the standards came into being (e.g., corporate mandate or government directive).
- Reusable: A data model that is complete and comprehensive can be leveraged by other modelers and utilized for constructing a model of similar scale and functionality.
Providing a quality review process to examine data architectures will help your organization answer the basic questions, "Am I done?" and "Is it any good?" It will help data architects and modelers achieve quality with their initiatives, which in turn will help information providers respond to changing needs and, ultimately, enable organizations to function more effectively and efficiently.