The Qualified Data Modeler
The data modeler is a key position in any business intelligence initiative. But what does it take to be a good data modeler? The skill requirements go well beyond knowledge about how to create and maintain a data model. A person with know-how can create a model that is structurally correct, but building one that is accurate from a business perspective takes more than basic modeling skills.
Modeling skills that are fundamental to the position include model environment establishment, entity relationship model development, data model maintenance, normalized and dimensional modeling and modeling tool proficiency. We'll look at these skills one at a time; analysis skills will be addressed in my next column.
Model environment establishment: Data models don't just happen. The data modeler needs to understand different types of models (e.g., contextual, business, system and technology) that are available, determine which ones should be developed and ensure that the methodology includes appropriate steps for doing so. In addition, for each of the models, standards such as naming, abbreviation, definitional and diagram need to be created to ensure consistency and adherence to sound practices.
Information flow modeling: Within the BI environment, the information flow model depicts the flow of data from the sources through the intermediary structures and onto the data warehouse and data marts. While the ETL designer often develops this model, the data modeler should be familiar with the process and should be able to either develop it or effectively validate it.
Entity relationship model development: The entity relationship (E-R) model is at the heart of most BI structures. This model depicts the major groupings of data, the relationships among these groupings and the data elements themselves. While the development approaches differ, both normalized and dimensional models (see below) are basically E-R models. Concepts such as referential integrity need to be addressed in all E-R models, and the way in which the referential integrity rules are to be enforced could impact the soundness of the resulting data structures.
Data model maintenance: Building the initial data model is one thing; maintaining it is quite another. Once the model is developed, each successive project is likely to require enhancements to the model. The data modeler needs to comprehend the new requirements, determine how to incorporate them into the existing data model, and most importantly, understand any downstream implications of model modifications. When the new requirements simply add entities, attributes and their relationships, the impact on existing structures is often minimal. However, when the new requirements result in structural changes (e.g., new attributes/columns in existing entities/tables or changes to the structure or relationships that already exist), the data modeler needs to address all of the downstream data model and structure implications.
Normalized and dimensional modeling: While both normalized and dimensional models are "relational," the processes for building them are different. Regardless of whether or not the data modeler's primary job is to design the normalized (e.g., data warehouse in a hub-and-spoke) or dimensional (e.g., OLAP data mart) structures, the person should be very familiar with both techniques. The normalized hub often feeds the dimensional data marts and the dimensional data marts are often fed by a normalized hub or normalized structures in the back office. By understanding both approaches, the data modeler can better anticipate the upstream and downstream implications of the data designs he or she generates.
Modeling tool proficiency: The data models should be developed using a data modeling tool, hence the modeler must be proficient in the use of that tool. He or she needs to understand how to import data structures that can be used as starting points, how to efficiently use the tool, how to leverage features in the tool and how to use the tool to aid in capturing, maintaining and disseminating metadata. If no data modeling tool exists, the data modeler should be aware of desirable features so that he or she can lead an effort for selecting the appropriate tool.
These are the basic skills every data modeler should have. In addition, familiarity with other modeling techniques such as use case and state transition are helpful, because they enrich the modeler's ability to perform the related analysis. I welcome your input (at email@example.com) if you think there are additional critical skills that should be added to this list.
These skills alone, however, are not sufficient. In next month's column, I will describe the analytical skills that the modeler must also possess to ensure that the model meets the business needs.