Making the Connections
One of the most frequently asked questions I have received this past year during my data modeling classes is "What's a good mapping tool?"
That is, what software tool would I recommend for capturing and displaying how a logical attribute is represented in one or more physical columns, or how a structure on a project data model "connects" to a structure on an enterprise data model, or how an entity on a conceptual data model maps to one or more entities on a logical data model?
Having a data model traceability tool store and manage these connections makes analysis, modeling and integration activities quicker to complete and usually provides greater accuracy than manually managing the mappings in a standalone spreadsheet. I asked the Design Challengers: What tool do you use in your organization to manage these mappings?
The four types of tools for maintaining these mappings are data modeling tools, metadata repositories, developer tools and spreadsheets. What surprised me about this challenge was the high number of our user community (more than 65 percent) who use data modeling tools for this purpose.
Data Modeling Tools
Data modeling tools are a great place to store mapping because in many organizations, the roles (people) who build the models are also responsible for mapping. In addition, the source and target data elements are already stored in the modeling tool. The downside is that not everyone responsible for these mappings has access to these tools or knows how to use them. In addition, these tools may not be designed for mapping and, therefore, may not be straightforward to get data in or produce reports in an easy-to-read spreadsheet format.
Chuck Wiberley, manager, data architecture, uses the user-defined properties feature of ERwin Data Modeler to manage the traceability from logical to physical data model. Forrest Carter, data architect, uses the Notes tab in ERwin to store lineage. Craig Ernewein, data architect, and several others use ER/Studio to manage these mappings. Several Design Challengers use PowerDesigner, which contains the lineage feature as part of its tool. One responder to this challenge said PowerDesigner supports the Zachman Framework, and therefore allows different models, diagrams and objects to link to each other as required.
Metadata repositories are great for storing lineage because these tools treat every piece of metadata as a separate object (which could relate to one or more other objects). In this fashion entities can relate to one or more models, data elements can relate to one or more entities, etc. The three big drawbacks to using these tools are their high price tag, steep learning curve and often the lack of a user-friendly presentation (such as a spreadsheet). Metadata repositories can be off-the-shelf or homegrown:
Off-the-shelf. There are several off-the-shelf tools such as BrownStone and Metabase. Mike Gorman, CTO, says in Metabase many types of mappings are possible: "Data elements can be independently mapped to database models that are logical ... Logical models can be mapped to physical."
Custom built. Thijs van der Feltz, information architect, is very satisfied with a custom metadata repository, and his organization has used it for more than 20 years. This tool maintains an enterprise data model (with more than 4,000 entity types). When a new conceptual model is presented, it is validated against this enterprise model. Based on matching concepts and relationships to existing entities, the new model is incrementally added to the enterprise model.
A number of organizations I have worked with capture this lineage in their developer tools. Because these tools are great at mapping data (such as extract, transform and load tools) they can often be used to map metadata. The added benefit of using tools such as Informatica, Altova MapForce and Microsoft Visual Studio to capture this mapping is that it increases the odds that the data model and accompanying mapping metadata will be used.
If the data model and lineage can be imported into the tool (or built directly into the tool), it will save the developers time and ensure a tight connection between analysis and development. There are three concerns with using developer tools to manage the mappings: these tools sometimes come with a steep learning curve, not everyone has access to these tools, and they may not support a spreadsheet-like display mechanism.
Spreadsheets are a popular choice for storing and displaying these mappings because the tools are intuitive and inexpensive. Norman Daoust, data modeling consultant, says, "You can make them as simple or complex as needed to suit the task; you can customize them as you see fit; anyone can read them and add comments, status, etc., without special software." The con with spreadsheets is the effort required to maintain them and the lack of a central repository to allow reuse.