In my December 1997 column, I described several data stewardship roles. The designer establishes standards for the data element names and incorporates them into the system model. The builder, with responsibility for the physical schema, incorporates these elements into the physical databases and tables with consideration for the technical environment and associated standards. Ultimately, the end users populate the physical tables and are accountable for the quality of the data they enter. With a stewardship program in place delineating these responsibilities, the content of data elements in the data warehouse is likely to match the label. Within the legacy systems, however, this is not always the case.
To effectively address the problem of misleading labels, both short-term and long-term solutions are needed. The short-term solution is one that enables a data warehouse team, for example, to build the data warehouse without being bogged down with repairing the legacy environment. The long-term solution is one that prevents the recurrence of the problem in the future.
The short-term solution to misleading labels is for the data warehousing team to perform source system analysis so that the true meaning of the data in each field can be determined. This analysis requires more than just looking at a data dictionary (if one exists). It requires looking at the data and talking to users of the system so that any abuses of the element can also be unearthed. Once the analysis is complete, the element can be mapped to the appropriate field in the data warehouse model.
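As a rough illustration of "looking at the data," the sketch below profiles the values stored in a single legacy field and counts what kind of content each value appears to be. Everything in it is an assumption made for illustration: the field name `fax_number`, the sample values, and the simple patterns are hypothetical, and a real analysis would still need conversations with the system's users to interpret the results.

```python
import re
from collections import Counter

# Hypothetical patterns for the kinds of content a field might hold.
# "blank" is checked first so that whitespace-only values are not
# mistaken for phone numbers.
PATTERNS = {
    "blank": re.compile(r"^\s*$"),
    "phone": re.compile(r"^\+?[\d\-\s().]{7,}$"),
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
}

def classify(value: str) -> str:
    """Return the name of the first pattern the value matches, else 'other'."""
    for name, pattern in PATTERNS.items():
        if pattern.match(value):
            return name
    return "other"

def profile_field(values):
    """Count how many values fall into each content category."""
    return Counter(classify(v) for v in values)

# Hypothetical contents of a legacy field labeled "fax_number".
fax_number_values = [
    "555-0100",
    "jane@example.com",
    "",
    "CALL AFTER 5PM",
    "555-0199",
]

profile = profile_field(fax_number_values)
for category, count in profile.most_common():
    print(f"{category}: {count}")
```

A profile in which a large share of values falls outside the category the label implies suggests the field has been put to other uses, which is the cue to ask the users what those values actually mean.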
The source system analysis is likely to reveal some differences between the data content in the source systems and the content that is needed in the data warehouse. These differences may exist for a number of reasons, including the same field being used for different purposes, differences between the implemented edit rules and the governing business rules, and errors in the procedures used by data entry personnel. As each difference is discovered, cleansing and transformation actions are designed so that the data in the data warehouse conforms to the governing data warehouse model (see note 1). With the cleansing and transformation logic built, the data warehouse team may consider its job complete.
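A cleansing and transformation action of this kind might look like the following minimal sketch. It assumes a hypothetical legacy field `cust_type` whose codes were entered inconsistently, and a hypothetical warehouse model that allows only `RETAIL` and `WHOLESALE`; the mapping table and record layout are invented for illustration. Values that cannot be mapped are set aside with a reason rather than loaded, so they can be reported to the people responsible for the operational system.

```python
# Hypothetical mapping from legacy codes to the values the warehouse model allows.
LEGACY_TO_WAREHOUSE = {
    "R": "RETAIL",
    "RET": "RETAIL",
    "W": "WHOLESALE",
    "WHL": "WHOLESALE",
}

def cleanse_customer_type(raw):
    """Return (warehouse_value, issue); exactly one of the two is None."""
    code = (raw or "").strip().upper()
    if not code:
        return None, "missing value"
    if code in LEGACY_TO_WAREHOUSE:
        return LEGACY_TO_WAREHOUSE[code], None
    return None, f"unmapped code {code!r}"

def transform(records):
    """Split source records into rows to load and rows to reject."""
    loaded, rejected = [], []
    for rec in records:
        value, issue = cleanse_customer_type(rec.get("cust_type"))
        if issue is None:
            loaded.append({"customer_id": rec["id"], "customer_type": value})
        else:
            rejected.append({"customer_id": rec["id"], "issue": issue})
    return loaded, rejected

# Hypothetical source records.
source_records = [
    {"id": 1, "cust_type": "r"},
    {"id": 2, "cust_type": " WHL "},
    {"id": 3, "cust_type": "X9"},
    {"id": 4, "cust_type": None},
]

loaded, rejected = transform(source_records)
```

Keeping the rejected rows, together with the reason for each, gives the team concrete evidence to pass along instead of silently discarding or guessing at bad values.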
The source system analysis is likely to reveal some apparent causes of the problem. For example, it may reveal that programmers did not follow standards or document their work, or that data entry personnel did not follow instructions. The data warehouse team needs to provide this information to people responsible for the operational systems or the business processes that use them.
If the problem analysis stops at this level, solutions may be implemented in the form of edicts telling programmers to follow standards and document their work. Similarly, a bulletin may be posted in the business units emphasizing the importance of following instructions. These solutions may be effective, but only for a short time; the problem is likely to reappear. In the case of the programmers and data entry personnel, if the reward system (spelled M-O-N-E-Y) is oriented toward quick completion of the work, then, in the long run, it will prevail, and the memos will be forgotten.
To determine the long-term solution, the root cause of the problem must be determined. The root cause is a factor that contributes substantially to the problem and is actionable (see note 2). It is often found by repeatedly asking "why."
The data stewards, if they exist, or the people responsible for the operational system and business process--not the data warehouse team--need to perform the root-cause analysis. Unless this analysis is pursued and appropriate corrective measures implemented, similar problems are likely to reappear at some future date.
Quick-fix solutions help companies achieve immediate objectives. Achieving long-term objectives, however, requires solutions that address the root causes of the problems that organizations encounter. This principle applies to misleading labels. The data warehouse team may overcome the problem through source system analysis so that the warehouse may be developed with minimal delay. This solution does nothing, however, to prevent the recurrence of the problem. To truly solve the problem, root-cause analysis is needed. Once the root causes are identified, actions aimed at addressing these causes can be developed and implemented.
1 See "Data Quality in the Data Warehouse" by Claudia Imhoff and Jonathan G. Geiger, DM Review, April 1996, p. 55.
2 For additional information on root-cause analysis, see "Statistical Methods for Quality Improvement," edited by Hitoshi Kume, Association for Overseas Technical Scholarship, Japan, 1985.