Master Data Management: Cross-System Assessment
Information Management Magazine, January 2004
Technical literature abounds with facts and testimonials on data quality issues. There are many books, methodologies and tools on the market to facilitate data quality analysis and cleansing. However, most current techniques focus on the assessment of single systems in succession. What is needed instead is a data quality methodology that analyzes subject areas across the systems of an enterprise, so that organizations can assess entities such as customer, vendor or item across all relevant enterprise systems including sales, services, manufacturing and regional enterprise resource planning (ERP).
In many organizations' information systems, master data - data on vendors, materials/items/products and customers - has been identified differently across the company, either structurally or literally. Many of these companies do not have the ability, or time, to look at simple procurement or sales figures across the enterprise. This is especially true in companies that have a "decentralized" management culture.
Also, because of severe economic constraints, most organizations need to cut their costs of doing business to a bare-bones level. When data standards are decentralized, coalescing data from a variety of enterprise systems consumes valuable resources that cost-conscious companies just cannot afford.
With a cross-system data quality assessment and master data management strategy, companies can begin to scope and build solutions to achieve enterprise data standardization and cut costs. For example, one of the best methods to cut costs is renegotiation of company-wide purchasing contracts and/or discounts with vendors. However, because data is often not standardized throughout the organization, the process of identifying a single vendor or item/material across a single organization is often difficult, if not impossible.
An acquisition or a merger between two companies severely exacerbates this problem because there is no way to identify data correctly and consistently across the new, merged systems. On the other hand, if the newly merged organization had an enterprise-wide data standard and standardization method, information would be more readily available and the renegotiation processes would be possible.
Starting Out
In order to perform the type of data quality assessment required to standardize and remediate master data, organizations must develop an objective method of utilizing tools and techniques to assess enterprise systems holistically and aggregate the results into a meaningful summary. Following are the methods and techniques that can be used to accomplish these cross-system, subject-oriented data quality assessments.
There are five basic, well-accepted levels of data quality assessment:
Level 0 (L0): Domain Analysis determines the actual source system data domain values for data fields of type indicator, code, date and quantity.
Level 1 (L1): Completeness and Validity Assessment focuses on the data content of individual data fields in a data environment. This analysis discovers those records that do not have a significant or meaningful data value.
Level 2 (L2): Structural Integrity Assessment focuses on primary key and foreign key assessment.
Level 3 (L3): Business Rules Compliance Assessment evaluates the quality of data in terms of specific business rules involving multiple data fields within or across records (or rows) that are logically related.
Level 4 (L4): Transformation Rules Assessment focuses on simulating and testing the transformation of source data to target data and then using these results to streamline transformation designs.
Usually, these five steps are performed across the tables, fields and relationships within a single instance of a source system.
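As a rough illustration of two of these levels, the sketch below runs an L0 domain analysis and an L3 business rule check against a handful of vendor records. The records, field names and the business rule itself are hypothetical and exist only to show the shape of each check; a real assessment would read from the source system and use rules agreed with the business.

```python
from collections import Counter

# Hypothetical vendor records from one source system (illustrative data only).
vendors = [
    {"vendor_id": "V001", "status_code": "A", "payment_terms_days": 30, "discount_pct": 2.0},
    {"vendor_id": "V002", "status_code": "a", "payment_terms_days": 0, "discount_pct": 5.0},
    {"vendor_id": "V003", "status_code": "I", "payment_terms_days": 45, "discount_pct": 0.0},
    {"vendor_id": "V004", "status_code": "A", "payment_terms_days": 60, "discount_pct": 1.5},
]


def domain_values(records, field):
    """L0: tally the values actually present in a code or indicator field."""
    return Counter(r[field] for r in records)


def rule_violations(records, rule):
    """L3: return the records that fail a business rule spanning several fields."""
    return [r for r in records if not rule(r)]


def discount_requires_terms(record):
    """Assumed rule: a vendor that grants a discount must have payment terms above 0 days."""
    return record["discount_pct"] == 0 or record["payment_terms_days"] > 0


status_domain = domain_values(vendors, "status_code")
violations = rule_violations(vendors, discount_requires_terms)

print(dict(status_domain))
print([r["vendor_id"] for r in violations])
```

The domain tally exposes the inconsistent casing of the status code ("A" versus "a"), which is the kind of finding an L0 analysis is meant to surface before any cleansing rules are written.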
Next Steps - Expanding Data Analysis and Management to the Enterprise Level
These techniques can also be expanded and summarized across enterprise systems in a way that is intuitive to project sponsors as well as to the financial planners who have to budget for large expenditures. The first step in the initial cross-system data analysis is to perform the L0 through L3 analyses described previously, but on a cross-system, enterprise scale.
After the L0 through L3 analyses are complete, the results must be quantified to determine the next steps. Because a number of team members will be performing these processes, data analysts need an objective grading scale and method for quantifying the current state of the data quality. The grades can then be summarized and incorporated into a larger cross-system view of an entity or its attributes. A simple system with five options for grading a result set is provided in Figure 1. The higher the value, the worse the data quality result.

Figure 1: Grading System for a Result Set
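One way such a grading scale might be applied is sketched below. The cut-off percentages, the 1-to-5 numbering and the worst-case roll-up are all assumptions made for illustration, not the values from Figure 1; the only properties taken from the text are that there are five grades and that a higher value means worse data quality.

```python
# Hypothetical cut-offs: (minimum compliance %, grade). Higher grade = worse quality.
GRADE_BANDS = [(95.0, 1), (85.0, 2), (70.0, 3), (50.0, 4)]


def grade(compliance_pct):
    """Map a compliance percentage (0-100) to a grade from 1 (best) to 5 (worst)."""
    if not 0.0 <= compliance_pct <= 100.0:
        raise ValueError("compliance_pct must be between 0 and 100")
    for minimum, value in GRADE_BANDS:
        if compliance_pct >= minimum:
            return value
    return 5


def summarize(results):
    """Roll field-level results up to one worst-case grade per (entity, system)."""
    summary = {}
    for entity, system, compliance_pct in results:
        key = (entity, system)
        summary[key] = max(summary.get(key, 1), grade(compliance_pct))
    return summary


# Illustrative result sets: (entity, source system, compliance percentage).
results = [
    ("vendor", "regional ERP", 98.0),
    ("vendor", "regional ERP", 62.0),
    ("vendor", "manufacturing", 91.0),
    ("customer", "sales", 40.0),
]
summary = summarize(results)

for (entity, system), value in sorted(summary.items()):
    print(entity, system, value)
```

Taking the worst grade per entity and system is a deliberately conservative choice; an assessment team could equally use an average or a weighted score, as long as every analyst applies the same rule.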
The assessment team should determine the criteria for assigning a compliance grade to the underlying data prior to implementation, but there are some useful starting points by analysis level:
Level 0: Data Type Variance. Determine if actual data types meet the expectation. For example, if a field such as ZIP code is supposed to contain only integer values, but analysis reveals that only 60 percent of the records actually contain integers while the other 40 percent contain character values, there may be cleansing costs associated with standardizing the field.
Level 1: Data Completeness. Determine how much data is actually stored in a field as a percentage of the total records of the field. If a field such as "customer classification" is null 90 percent of the time, then it can be assumed that this data is not used and/or maintained in the source system.
Level 2: Data Structure. Determine the level of referential integrity within the underlying data structure. For example, check the primary key(s) or potential primary key(s) in the underlying tables. Also, determine if there is a defined, physical foreign key structure between the tables and fields within the underlying system. A system with poor referential integrity inevitably has poor data quality as well.
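The three starting points above can be expressed as simple measurements. The following is a minimal sketch over in-memory records; the customer and order tables, their field names and the sample values are hypothetical, and a real assessment would typically run equivalent queries against the source database.

```python
from collections import Counter


def pct(part, total):
    """Percentage helper that avoids dividing by zero on empty inputs."""
    return 100.0 * part / total if total else 0.0


def integer_share(values):
    """L0: percentage of non-blank values made up only of ASCII digits."""
    present = [str(v) for v in values if v is not None and str(v).strip() != ""]
    return pct(sum(s.isascii() and s.isdigit() for s in present), len(present))


def completeness(values):
    """L1: percentage of records holding a non-null, non-blank value."""
    filled = sum(1 for v in values if v is not None and str(v).strip() != "")
    return pct(filled, len(values))


def duplicate_keys(keys):
    """L2: key values that repeat and so cannot serve as a primary key."""
    return sorted(k for k, n in Counter(keys).items() if n > 1)


def orphans(child_keys, parent_keys):
    """L2: foreign-key values with no matching parent row."""
    parents = set(parent_keys)
    return sorted({k for k in child_keys if k not in parents})


# Hypothetical source tables (illustrative data only).
customers = [
    {"customer_id": "C1", "zip": "30301", "classification": None},
    {"customer_id": "C2", "zip": "ABCDE", "classification": ""},
    {"customer_id": "C3", "zip": "10001", "classification": "retail"},
    {"customer_id": "C3", "zip": "60601", "classification": None},
    {"customer_id": "C4", "zip": "N/A", "classification": None},
]
orders = [{"order_id": 1, "customer_id": "C1"}, {"order_id": 2, "customer_id": "C9"}]

zip_integer_pct = integer_share(c["zip"] for c in customers)
classification_pct = completeness([c["classification"] for c in customers])
repeated_ids = duplicate_keys(c["customer_id"] for c in customers)
orphan_orders = orphans((o["customer_id"] for o in orders),
                        (c["customer_id"] for c in customers))

print(zip_integer_pct, classification_pct, repeated_ids, orphan_orders)
```

Each function returns a plain percentage or a list of offending values, so the output can be handed directly to whatever grading scale the assessment team has agreed on.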