What Is a Data Model?
A high-level data model conveys the core concepts and principles of an organization simply, using concise descriptions. The advantage of developing a high-level model is that it helps participants arrive at common terminology and shared definitions for those concepts and principles.
Everyone Knows What a Customer Is, Right?
The definition of a customer may change depending on a person’s perspective. To the billing department, a customer may be someone who owns a product or service sold by the company and to whom an invoice is sent. To a salesperson, a customer may be someone who has not yet bought a product but to whom they hope to make a sale. There are more questions to clarify: Does a customer have to be a person, or can another company be considered a customer? Is someone who purchased a product from us in the past but has no active account, service agreement or support contract still a customer? Is there a difference between an active and an inactive customer, or between an existing customer and a prospective customer?
Strive to Align on Common Terminology and Definitions
An important goal is to align on common terminology, business definitions and business rules, and to create a diagram that describes the core concepts and principles of an organization and what they mean. The diagram can be as simple as a set of boxes with text in them. A high-level data model should always be simple and clear enough for a nontechnical person to understand. In fact, a high-level data model doesn’t even need to look like a traditional data model or be shown as a picture at all; the same information can be placed in a table or spreadsheet.
A data modeling purist would correctly point out that a valid model cannot contain two different definitions of customer. While strict rules must be followed in more detailed data models, the purpose of a high-level data model is communication: gaining consensus on core concepts so that the detailed diagrams created later rest on correct assumptions. It’s okay to bend data modeling rules as long as the focus remains on aligning terminology, definitions and business rules.
High-level data models are the result of an iterative process. Participants rarely agree on every definition when it is first documented, but a first draft can highlight differences of opinion so that discussions toward consensus can begin. Organizations and individuals may not even realize that different definitions exist until they are documented in this way. One technique for reaching consensus is to identify the audiences, business areas, projects, applications and other contexts that use each term. Because the meaning of a term can change with its context, it is critical to understand every context in which a term appears before consensus can be reached.
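One way to make the context check concrete is to record each term together with the context that uses it and that context’s definition, then list the terms that carry more than one definition. The following is a minimal Python sketch under that assumption; the `ENTRIES` data and the `conflicting_terms` function are hypothetical, with definitions paraphrased from the billing and sales examples above.

```python
from collections import defaultdict

# Hypothetical glossary entries: (term, context, definition).
# Contexts and wording are illustrative only; a real glossary would be
# gathered from the stakeholders who use each term.
ENTRIES = [
    ("Customer", "Billing", "Owns a product or service and receives an invoice"),
    ("Customer", "Sales", "Has not yet bought a product but is a sales target"),
    ("Invoice", "Billing", "Bill sent to a customer for products or services"),
]


def conflicting_terms(entries):
    """Return {term: {definition: [contexts]}} for terms with more than one definition."""
    by_term = defaultdict(lambda: defaultdict(list))
    for term, context, definition in entries:
        # Group the contexts that share exactly the same definition text.
        by_term[term][definition.strip()].append(context)
    # Keep only the terms whose contexts disagree on the definition.
    return {term: dict(defs) for term, defs in by_term.items() if len(defs) > 1}


for term, defs in conflicting_terms(ENTRIES).items():
    print(f"{term}: {len(defs)} definitions")
    for definition, contexts in defs.items():
        print(f"  {', '.join(contexts)}: {definition}")
```

A list like this is only a starting point for discussion: it shows where definitions diverge, not which one is right.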
Avoid a Siloed Approach
Identifying what data is used, and by whom, takes a lot of effort. It’s much easier to focus on your own project or department, or at least it seems that way on the surface. But this siloed approach can lead to systems that don’t work well together. By involving other groups, it’s possible to leverage work that has already been done instead of reinventing the wheel. Once the stakeholders (people, groups or organizations that can affect or be affected by an action or policy) are identified, it’s time to get them talking to each other. For example, in a bank, the banking and consumer credit departments might consider a customer to be a person who has an existing account with the bank, while the marketing department may also use the term customer for people who do not have an account with the bank.
Common terminology, definitions and rules across departments and projects can break through silos, enabling the organization to operate as a single, powerful unit. Master data management, customer data integration, enterprise architecture and data warehousing are all initiatives that attack the lack of integration from different angles, and all of them require an accurate data model to be successful.
A data model documents not only the definitions of, and context around, information but also the actual physical structure of the databases in which that information is stored. The same information can be stored under many different names, in different formats and on different software platforms, so after we have documented how the data should be, we still need to map how it is stored today. This leads us to the importance of standards and reuse.
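Mapping how the data is stored today can start as a simple lookup from each physical column to the logical attribute it holds. This Python sketch assumes such a mapping; the system, table and column names (`billing_db`, `CUST_ID`, `crm_app` and so on) and the `physical_locations` helper are invented for illustration.

```python
# Hypothetical mapping from each system's physical column to the logical
# attribute it represents in the agreed data model. All system, table and
# column names here are invented for illustration.
PHYSICAL_TO_LOGICAL = {
    ("billing_db", "CUST", "CUST_ID"): "Customer Identifier",
    ("billing_db", "CUST", "CUST_NM"): "Customer Name",
    ("crm_app", "accounts", "customer_identifier"): "Customer Identifier",
    ("crm_app", "accounts", "full_name"): "Customer Name",
}


def physical_locations(logical_attribute, mapping):
    """List every (system, table, column) that stores a given logical attribute."""
    return sorted(
        location
        for location, attribute in mapping.items()
        if attribute == logical_attribute
    )


for system, table, column in physical_locations("Customer Identifier", PHYSICAL_TO_LOGICAL):
    print(f"{system}.{table}.{column}")
```

Keeping the mapping keyed by physical location means each column points to exactly one logical attribute, while one attribute can appear in many systems.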
Once we know that two departments use the same definition of customer, we need to ensure that the information they use is truly the same. That is, the customer’s name, location and identifier should match. There might be subtle differences; for example, one department might use the term “Customer ID” while another spells it out as “Customer Identifier.” We have already seen that differences in the meaning of the word customer can have negative effects. Here, the meaning of the customer identifier field is exactly the same, but different terms are used to describe it. A human could easily determine that Customer ID and Customer Identifier are the same thing (although we should always verify this to be sure), but a computer cannot do so on its own. Thus, to truly integrate these systems, we need common naming standards so that information can be matched correctly. Again, this might seem like a simple task, but it can take months or years in a large organization with thousands or even millions of different pieces of information to organize.
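A naming standard can be partly enforced in code by expanding approved abbreviations into full words. The sketch below assumes a hypothetical abbreviation list (`ABBREVIATIONS`) and a hypothetical `standard_name` function; it is not the behavior of any particular tool. Under these assumptions, “Customer ID”, “CUST_ID” and “Customer Identifier” all reduce to the same standard name.

```python
import re

# Hypothetical abbreviation standard: each approved abbreviation maps to the
# full word used in standard names. These entries are illustrative only.
ABBREVIATIONS = {
    "id": "Identifier",
    "cust": "Customer",
    "nm": "Name",
    "addr": "Address",
}


def standard_name(raw_name):
    """Convert a field name to a standard form by expanding known abbreviations."""
    # Split on spaces, underscores, hyphens and other non-alphanumeric separators.
    words = [w for w in re.split(r"[^A-Za-z0-9]+", raw_name) if w]
    # Replace approved abbreviations; capitalize every other word.
    expanded = [ABBREVIATIONS.get(w.lower(), w.capitalize()) for w in words]
    return " ".join(expanded)


for name in ["Customer ID", "Customer Identifier", "CUST_ID", "cust-id"]:
    print(f"{name} -> {standard_name(name)}")
```

The lookup only encodes decisions people have already made: someone still has to confirm that two fields really mean the same thing before an abbreviation is added to the list.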