Some people believe data modeling has become very passé these days. The belief is that because data modeling theory is more than 30 years old and, because some data modeling tools have been around for 10 to 20 years, somehow data modeling is no longer relevant. Nothing could be further from the truth. In fact, data modeling may now be more necessary than ever before.
While there are other modeling techniques and notations, such as business process modeling and Unified Modeling Language, the need to accurately capture business data requirements and transform them into a reliable database structural design is as paramount as ever. The key differentiator is that data modeling is the only technique and notation that focuses on the data at rest. All the others tend to focus more on data in motion. Put another way data modeling concentrates on issues that lead to a solid database design, while others approaches tend to focus more on issues that will result in better application design or things useful to programmers, such as data structures, objects, classes, methods and application code generation.
Case in point: Ive personally served as an expert witness in several court trials where plaintiffs sued defendants for serious financial remuneration when custom database applications had performance and/or data accuracy problems. In every case, there was a failure to data model the business requirements. Thus, the data effectiveness suffered. Moreover, ad hoc database design, or database design using more programmatic-oriented techniques and tools, often resulted in inefficient database design. No amount of coding could overcome the resulting bad database design. So, in every case, the plaintiff won.
The other reason data modeling has seen measurable resurgence is the data warehousing phenomenon. With cheap storage these days, most companies can afford, and benefit from, retaining historical aggregate and/or summary data for making significant strategic decisions. With the accumulation of numerous source legacy online transaction processing systems, there are two key ways to approach populating a data warehouse: directly from source to warehouse (as shown in Figure 1) or through an intermediary database often referred to as an operational data store (as shown in Figure 2).
Sufficient debate exists as to which approach is superior, but I wont address that here. Regardless of which approach is selected, the database design (i.e., the data at rest) is paramount because, in a data warehouse, the data itself - and the business information it contains - is the most relevant and valuable asset. Typical data warehouse queries and reports issued via business intelligence tools process that asset to yield strategic decision-making results.
The other key area where data modeling often supports the whole data warehousing and BI effort is the mapping of legacy data fields to their DW and BI counterparts. This metadata mapping about how frontline business data maps to the data warehouse helps with the design of both queries and/or reports, as well as with extract, transform and load programming efforts. Without such mapping, there would be no automatic tie to the dependent data warehousing information as OLTP legacy systems evolve. Hence, one would have to almost totally re-engineer rather than simply follow the OLTP source data ramifications and ripples downstream to the DW and BI endpoints.
For those not involved with data warehousing projects - perhaps those performing more traditional OLTP-type systems development - data modeling still is important. Often, however, people get so caught up in novel paradigms such as extreme programming, agile software development or scrum that they compromise data modeling, or even skip it entirely. The problem is that these new approaches dont always spell out exactly how data modeling should be incorporated, so people often forego it.
My belief is that no matter what latest and greatest approach you use, data modeling should be integrated into your development process wherever it makes sense. Figure 3 shows how both conceptual and physical data modeling should fit into an overall database design process - whether its for a totally new system or for one thats being updated or re-engineered.
There is one final reason why data modeling has been getting more attention these days. In many cases, organizations finally are requiring data models as a sign-off deliverable of the development process. I attribute this to their attempt to adhere to the Software Engineering Institutes Capability Maturity Model and Capability Maturity Model Integration concepts. The idea here is quite simple: to mature your development process regardless of technique, you need to develop in terms of both the processes and tools used to achieve the desired better end result. Both processes and tools can lead to maturity, helpig to reinvigorate many peoples interest in data modeling.
Now comes the hard part. Which data modeling tool should you use? That might seem like a tough or loaded question; there are numerous data modeling tools available. Plus, many enterprise modeling suites contain data modeling capabilities. Rather than advise any particular tool, Im going to outline some basic guidelines for things to avoid. I believe that any tool that meets some standard and minimal requirements will help you produce effective and efficient data models and, hence, the resulting databases.
Avoid drawing tools that aspire to be data modeling tools. A good data modeling tool supports defining tons of metadata with business relevance. Think of the diagram as just the tip of the iceberg where you dont see the 90 percent of the mass that is underwater. The same is true for data modeling. If you concentrate only on what the picture is, youll probably compromise the effectiveness of the resulting database.
Choose a tool that fits your needs. Often, people purchase a killer modeling tool that offers everything imaginable. But, if all you need or will use is the data modeling portion, why pay for more? The key concern here is that the more any tool does besides data modeling, the better the chance its data modeling capabilities may have been compromised to do everything else. Sometimes more is not better.
Data definition language. This is another case where more might not be better. It is better if your tool supports 100 percent accurate CREATE or ALTER scripts for a few databases important to you than all of them at a lesser level. But be very careful - the DDL generated by many tools, even those focusing on just a few databases, can often generate less than optimal DDL. You have to know what to look for; so, engage your database administrator in making the decision, just to be safe.
Verify that your data modeling tool provides robust model consistency and accuracy checking reports and/or utilities. As data models grow (and they will), it can be quite overwhelming to have to manually check everything. And you cannot expect the poor DBA to sanity check the thousands or tens of thousands of DDL lines a data modeling tool can quickly generate. Effectiveness is mostly on your shoulders, but efficiency can be aided by good data modeling checking utilities.
Data modeling has come a long way since its inception. Even though the heydays of CASE and software engineering passed with the 90s, the need for and usefulness of data models has not subsided. Data modeling can assist with any effort, regardless of development methodology or paradigm. So, dont pass on data modeling just because its a mature technique - you might be very sorry if you do.
Register or login for access to this item and much more
All Information Management content is archived after seven days.
Community members receive:
- All recent and archived articles
- Conference offers and updates
- A full menu of enewsletter options
- Web seminars, white papers, ebooks
Already have an account? Log In
Don't have an account? Register for Free Unlimited Access