Who Needs Metadata?
Data e.Quality
Information Management Online, June 11, 2001
A few weeks ago, I was asked to help oversee a client's business intelligence project that was running behind schedule. By the time I joined the project, the requirements and design had been completed, and the delivery team had finished most of its development work. The ETL process was working properly, and several reports had been developed. However, the project had been stalled for several weeks, having made little or no progress. The reason for the delay was that each of the reports that had been developed needed to be "certified."
Approximately 15 reports had been requested for the first release of the system. The requirements team had been told that most of these would replace reports that users were currently receiving. The team was provided with a set of the existing reports, which were mainly Microsoft Excel spreadsheets. The reports looked simple enough. Each column had a heading that identified the data it contained, and the team was told where the data was located. An estimate was made for how long the reports would take to develop. So what was the problem? In a word, metadata.
What became apparent later was that the tester certifying the reports was using a different set of business rules than the development team. Before certifying each report, this person would check with the user to confirm their understanding of the content of each field of the report. The development team did not know that, in some cases, a total might contain an unusual formula relying on supplementary data from a different part of the report. (See the total shown in the following example.)
Advertisement
| Order Description | Quantity |
| Week - 1 | |
| Product - 1 | 10 |
| Product - 2 | 15 |
| Product - 3 | 12 |
| Total | 42 |
In this example, the products ordered for Week 1 look like they should add up to 37 orders, but the total line shows 42. What's going on? Could it be a data quality problem?
Version 1 of the Business Rule
The development team used this business rule to calculate the total: "Sum up the order quantities for all of the products ordered that week." The summation of the three quantities (10+15+12) equals 37. But the report failed the certification process. (In our real life example, the number of orders and the complexity of the formulas were much more complex.) The developers checked and rechecked the formulas in their business objects report. They checked the reports, the universes and the source data for data quality issues. But based on their understanding of the business rule, the number came up 37 each time.
Version 2 of the Business Rule
In the meantime, the users and their managers could not figure out why it was taking so long to finish the report (and the project). Finally, the two groups decided to work together on report certification. A different business rule quickly surfaced for the example I've given, and the issue was resolved. That business rule was: "For the first week of the month, if the week begins on anything but a Monday, the user adds the orders from the days starting with the previous Monday forward, and the orders are actually shown in both sets of totals." (See the example below.)
| Order Description | Quantity | DOW |
| Last Week of Previous Month | ||
| Product - 5 | 5 | Monday |
| Total | 5 | |
| Week - 1 | ||
| Product - 1 | 10 | Tuesday |
| Product - 2 | 15 | Wednesday |
| Product - 3 | 12 | Friday |
| Total | 42 |
Does that seem like a strange rule? To IT, yes; but the user had been thinking in those terms for years. This is just one example from nearly 15 reports that had issues. Everyone involved now acknowledges subtle differences in the perceived business rules. During the investigation process, however, that was not apparent. Users and managers had begun to doubt the competency of the designers and developers, as well as the credibility of the data warehouse itself. Fortunately, all issues were resolved — but not until the late stages of the project.
What can be done to prevent this from happening again? One can make the case that better requirements gathering should have been performed, or that better specifications should have been written. That is true. However, what would those documents have contained that would have improved the result? The answer is metadata.
Development teams that take the time to collect metadata during a project's early stages do much to manage the risk associated with data quality and related issues. A metadata dictionary works best when a expert in the business subject matter prepares it — or at least approves it.
Each column that appears in the data warehouse database or on a report requires a definition. Some may think, what value is the metadata dictionary? Is the time really being spent well? The answer is yes. Such documents become a valuable resource to the development team — and to the users when they begin using the data themselves. You might be thinking: We could have created a metadata dictionary and still not have captured that rule. Maybe.
But at least you or the user would have made a deliberate effort to capture the correct rule. Right or wrong, the user-supplied rule (or user approved rule) would have been referenced by or copied into the report specifications. And the certification process would have used a consistent business rule. The certification would have confirmed that the report had indeed been built to specification. At worst, a parallel run of the report would have shown an inconsistency — but the credibility of the team would not be in question.
I am involved now with a project in which our team is gathering the requirements for a new Information warehouse. You can bet that we will make sure a metadata dictionary is created — and that it includes clear and comprehensive definitions.
Ron Forino is with Columbia University as the director of Enterprise Reporting. Forino has spent the last few years providing strategic planning for executive management, primarily in the areas of information management. Formerly a director of Data Warehousing and Information Management, he has developed methodologies supporting information management services in data warehousing, data integration, data quality and knowledge management. He has been in the industry for more than 20 years as a programmer, analyst, architect and project manager. Forino has been a conference speaker on the subjects of data warehousing, value-chain business intelligence and data quality and is a member of the executive board for DAMA-New Jersey. Forino can be reached at RonForino@aol.com.
For more information on related topics, visit the following channels:






