Technical literature abounds with facts and testimonials on data quality issues. There are many books, methodologies and tools on the market to facilitate data quality analysis and cleansing. However, most current techniques focus on the assessment of single systems in succession. What is needed instead is a data quality methodology that enables analysis of subject areas across the systems of an enterprise so that organizations can perform an analysis of entities such as customer, vendor or item across all relevant enterprise systems including sales, services, manufacturing and regional enterprise resource planning (ERP).
In many organizations' information systems, master data - data on vendors, materials/items/products and customers - has been identified differently across the company, either structurally or literally. Many of these companies do not have the ability, or time, to look at simple procurement or sales figures across the enterprise. This is especially true in companies that have a "decentralized" management culture.
Also, because of severe economic constraints, most organizations need to cut their costs of doing business to the bare-bones level. Coalescing data from a variety of decentralized enterprise systems, each with its own standards, consumes valuable economic resources that cost-conscious companies simply cannot afford.
With a cross-system data quality assessment and master data management strategy, companies can begin to scope and build solutions to achieve enterprise data standardization and cut costs. For example, one of the best methods to cut costs is renegotiation of company-wide purchasing contracts and/or discounts with vendors. However, because data is often not standardized throughout the organization, the process of identifying a single vendor or item/material across a single organization is often difficult, if not impossible.
An acquisition or a merger between two companies severely exacerbates this problem because there is no way to identify data correctly and consistently across the new, merged systems. On the other hand, if the newly merged organization had an enterprise-wide data standard and standardization method, information would be more readily available and the renegotiation processes would be possible.
In order to perform the type of data quality assessment required to standardize and remediate master data, organizations must develop an objective method of utilizing tools and techniques to assess enterprise systems holistically and aggregate the results into a meaningful summary. Following are the methods and techniques that can be used to accomplish these cross-system, subject-oriented data quality assessments.
There are five basic, well-accepted levels of data quality assessment:
Level 0 (L0): Domain Analysis determines the actual source system data domain values for data fields of type indicator, code, date and quantity.
Level 1 (L1): Completeness and Validity Assessment focuses on the data content of individual data fields in a data environment. This analysis discovers those records that do not have a significant or meaningful data value.
Level 2 (L2): Structural Integrity Assessment focuses on primary key and foreign key assessment.
Level 3 (L3): Business Rules Compliance Assessment evaluates the quality of data in terms of specific business rules involving multiple data fields within or across records (or rows) that are logically related.
Level 4 (L4): Transformation Rules Assessment focuses on simulating and testing the transformation of source data to target data and then using these results to streamline transformation designs.
Usually, these five steps are performed across the tables, fields and relationships within a single instance of a source system.
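As an illustration, an L0 domain analysis can be sketched in a few lines of Python. The field, its values and its documented domain below are hypothetical; a real assessment would profile every indicator, code, date and quantity field in the source system:

```python
from collections import Counter

def domain_profile(values):
    """Tally the distinct values actually present in a field (L0 domain analysis)."""
    counts = Counter(values)
    total = len(values)
    # Report each domain value with its share of all records.
    return {v: round(c / total, 3) for v, c in counts.items()}

# Hypothetical "order status" codes pulled from a source system.
statuses = ["A", "A", "C", "X", "A", "C", "?", "A"]
profile = domain_profile(statuses)

# "X" and "?" are not in the documented domain {A, C}; flag them for review.
unexpected = {v for v in profile if v not in {"A", "C"}}
```

Comparing the profiled domain against the documented domain immediately surfaces undocumented or invalid codes, which feed directly into the grading step described next.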
Next Steps - Expanding Data Analysis and Management to the Enterprise Level
These techniques can also be expanded and summarized across enterprise systems in a way that is intuitive to project sponsors as well as to the financial planners who must budget for large expenditures. The first step in the initial cross-system data analysis is to perform L0 through L3 analyses as described previously - but on a cross-system, enterprise scale.
After the L0 through L3 analyses are complete, the results must be quantified to determine the next steps. Because a number of team members will be performing these processes, the quantification process should incorporate an objective grading scale and method that will be used by data analysts to quantify the current state of the data quality so that it can be summarized and incorporated into a larger cross-system view of an entity or attributes. A simple system with five options for grading a result set is provided in Figure 1. The higher the value, the worse the data quality result.
Figure 1: Grading System for a Result Set
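Because Figure 1's exact cut-offs are not reproduced here, the following Python sketch shows one plausible way such a five-point grading function might be implemented. The thresholds are purely illustrative assumptions; the assessment team would substitute the scale it agreed on up front:

```python
def grade(failure_rate):
    """Map a result set's failure rate (0.0 to 1.0) to a grade of 1-5.
    Per the article, a higher grade means worse data quality.
    The threshold values below are illustrative, not Figure 1's actual scale."""
    if failure_rate < 0.01:
        return 1   # essentially clean
    if failure_rate < 0.05:
        return 2
    if failure_rate < 0.15:
        return 3
    if failure_rate < 0.40:
        return 4
    return 5       # severe quality problem

# Example: a field where 30 percent of records fail a validity check.
grade(0.30)
```

Fixing the thresholds in code, rather than leaving grading to each analyst's judgment, is what makes results from different team members comparable across systems.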
The key to assigning the compliance grade to the underlying data should be determined by the assessment team prior to implementation, but there are some key starting points by analysis level:
Level 0: Data Type Variance. Determine whether the actual data types meet expectations. For example, if a field is supposed to contain only numeric digits, as in a ZIP code field, but analysis reveals that only 60 percent of the values are numeric while the other 40 percent contain character values, there may be cleansing costs associated with standardizing the field.
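A minimal sketch of this type-variance check, echoing the ZIP-code example above (the sample values are hypothetical):

```python
def type_variance(values, caster=int):
    """Share of values that fail to parse as the expected type (L0 check)."""
    failures = 0
    for v in values:
        try:
            caster(v)       # attempt the expected-type conversion
        except (TypeError, ValueError):
            failures += 1
    return failures / len(values)

# 60 percent integers, 40 percent character junk, as in the ZIP-code example.
zips = ["30339", "10001", "60614", "N/A", "??"]
type_variance(zips)  # 0.4
```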
Level 1: Data Completeness. Determine how much data is actually stored in a field as a percentage of the total records of the field. If a field such as "customer classification" is null 90 percent of the time, then it can be assumed that this data is not used and/or maintained in the source system.
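The completeness measure can be sketched the same way; the "customer classification" sample below is hypothetical:

```python
def completeness(values):
    """Fraction of records holding a meaningful (non-null, non-blank) value (L1 check)."""
    filled = sum(1 for v in values if v is not None and str(v).strip() != "")
    return filled / len(values)

# A "customer classification" field that is almost never populated.
classifications = [None, "", "RETAIL", None, "", None, None, "", None, " "]
completeness(classifications)  # 0.1 -> the field is likely unused in the source system
```

Note that blanks and whitespace-only strings are counted as empty along with nulls; treating them as "data" would overstate completeness.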
Level 2: Data Structure. Determine the level of referential integrity within the underlying data structure. For example, check the primary key(s) or potential primary key(s) in the underlying tables. Also, determine if there is a defined, physical foreign key structure between the tables and fields within the underlying system. A system with poor referential integrity inevitably has poor data quality as well.
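One common L2 test is to look for child records whose foreign key has no matching parent. A sketch, using hypothetical vendor and invoice data:

```python
def orphaned_rows(child_rows, parent_keys, fk_field):
    """Return child records whose foreign key has no matching parent (L2 check)."""
    parent_set = set(parent_keys)
    return [r for r in child_rows if r[fk_field] not in parent_set]

vendors = ["V001", "V002", "V003"]                  # parent primary keys
invoices = [{"inv": 1, "vendor_id": "V001"},
            {"inv": 2, "vendor_id": "V999"},        # orphan: no such vendor
            {"inv": 3, "vendor_id": "V003"}]
orphans = orphaned_rows(invoices, vendors, "vendor_id")
```

In practice this check usually runs as a SQL anti-join inside the source system, but the logic is the same: every orphan found is evidence of the poor referential integrity described above.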
Level 3: Business Rules Compliance. This step begins the capture and validation of business rules. It is the most "hands on" of the data analysis process and is also one of the more time-consuming. Depending on the availability of resources and source system experts, it may be feasible to delay the L3 analysis until a standardization plan or a remediation plan is developed. It is often economical to perform business rules compliance testing only on entities and attributes that offer the greatest value when performing a cross-system analysis. Those entities typically include: customer, vendor, materials, products and chart of accounts.
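Once a business rule has been captured, compliance testing reduces to applying the rule record by record and collecting the failures. A sketch, using a hypothetical "ship date may not precede order date" rule:

```python
from datetime import date

def rule_violations(rows, rule):
    """Apply a multi-field business rule to each record; return the failures (L3 check)."""
    return [r for r in rows if not rule(r)]

# Hypothetical rule: an order's ship date may not precede its order date.
orders = [{"id": 1, "ordered": date(2004, 3, 1), "shipped": date(2004, 3, 5)},
          {"id": 2, "ordered": date(2004, 4, 2), "shipped": date(2004, 3, 30)}]
bad = rule_violations(orders, lambda r: r["shipped"] >= r["ordered"])
# bad contains order 2 only
```

Expressing each rule as a small predicate keeps the rule library reviewable by the source system experts whose time, as noted above, is the scarce resource in L3 work.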
Getting and Presenting the Results
The results of the cross-system analysis should be stored in a database with the appropriate size and structure. When performing a vendor analysis across fewer than 30 systems, a Microsoft Access database will do. However, as the number of entities for the cross-system analysis increases, a move to a bigger database such as SQL Server, Oracle or DB2 UDB may be warranted. The structure of the database's tables is flexible and is very much dependent on the requirements of the project. Include sufficient information to make the results clearly understandable to business sponsors.
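As a starting point, one possible results table might look like the following SQLite sketch. The column set is illustrative; real projects will add whatever detail their sponsors need:

```python
import sqlite3

# Illustrative results store; an Access, SQL Server, Oracle or DB2 UDB
# schema would carry the same information.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE dq_results (
        system       TEXT,     -- source system name, e.g. 'ERP-EU'
        subject_area TEXT,     -- e.g. 'Vendor'
        table_name   TEXT,
        field_name   TEXT,
        level        INTEGER,  -- assessment level 0-3
        grade        INTEGER,  -- 1 (best) to 5 (worst), per the grading scale
        note         TEXT      -- human-readable finding for business sponsors
    )""")
conn.execute("INSERT INTO dq_results VALUES (?,?,?,?,?,?,?)",
             ("ERP-EU", "Vendor", "VENDOR_MASTER", "NAME", 1, 2, "3% blank"))
row = conn.execute("SELECT grade FROM dq_results").fetchone()
```

Keeping system, subject area, table and field as separate columns is what later enables the drill-down reporting described below.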
After these analyses have been loaded into the database, they can be summarized and presented in a reporting format or tool familiar to the organization's business users. Sometimes spreadsheet or simple database reports are perfectly acceptable. Other times - depending on the organization's preferences - it may be preferable to use Web-based OLAP functionality with the results reported via the corporate portal to an international community.
The results can be presented in myriad ways. The following examples are the most common:
- System Data Quality (All Fields): This report shows the overall data quality grade for a system, independent of library, table/file and column/field. Users should be able to drill down to more details to investigate the results.
- System Data Quality (Mapped Fields): This report shows the overall data quality for mapped fields for a system, independent of library, table/file and column/field. Users should be able to drill down to more details such as "Subject Area."
- Master List, Cross-System: This report shows the data quality for a master attribute such as vendor name within the vendor subject area across systems, so that users can look at the cost to standardize and cleanse a specific subject area. This report will be very useful when proposing the most cost-effective data cleansing options.
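The cross-system master list, for example, is just an aggregation of the stored grades by system for one subject area. A sketch with hypothetical result rows:

```python
from statistics import mean

# Illustrative per-field grades collected across systems (schema is hypothetical).
results = [
    {"system": "ERP-US", "subject": "Vendor", "field": "NAME",  "grade": 2},
    {"system": "ERP-US", "subject": "Vendor", "field": "TERMS", "grade": 4},
    {"system": "ERP-JP", "subject": "Vendor", "field": "NAME",  "grade": 5},
]

def master_list(results, subject):
    """Average grade per system for one subject area (the cross-system report)."""
    by_system = {}
    for r in results:
        if r["subject"] == subject:
            by_system.setdefault(r["system"], []).append(r["grade"])
    return {s: mean(grades) for s, grades in by_system.items()}

master_list(results, "Vendor")
```

Because higher grades mean worse quality, this report ranks the systems where vendor standardization will cost the most, which is exactly the input the cleansing-cost proposal needs.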
These reports provide the ability to examine meaningful "objective" statistics about the current state of the organization's master data before attempts are made at performing a standardization, cleansing or data integration project.
Communicating the Results to the Organization
The next step in the master data management strategy is to discuss data quality results with business owners. Start by working with the business owners to determine the optimum state of the data for a subject area and how the strategy would reap the company benefits in cost savings or expanded revenue. Next, perform a gap analysis to determine how the current state compares to the optimum state.
Based on the data quality assessment, the organization will be able to closely estimate the cost associated with standardizing a subject area within a system. It will have an idea of the process, tools, resources and manual/automated effort required per system. Based on this cost, the organization can then determine the most appropriate and cost-effective order of system standardization.
It's more than numbers and data, however. The master data management strategy needs to be "sold" to the business owners. Generally, highly technical discussions bore people in management. Data is abstract, which makes it difficult to visualize. Therefore, many businesspeople are either confused about how to implement a master data management strategy, or they are pessimistic about its value.
For these reasons, it is absolutely critical to develop a business action plan (BAP). A BAP lists the major events within the standardization process and the time and resources that the technical team will require from the business owners. It's a good idea to prepare a workbook that the business owners and the source system owners can use to visualize the flow of the project. Sample sections would include:
- Overall Orientation
- Project Management and Milestone Identification
- Role Identification
- Business Rule Gathering
- Data Prioritization: Entity, Attribute, Volumes
- Initial Cleansing Process (Automated and Manual)
- Maintenance of Cleansed Data (Automated and Manual)
- Milestone Evaluation
Answering the Tough Questions
When presenting the BAP, the technical team should be prepared to answer the following questions.
How will this improve my ability to do business (operations/decision making)?

A first step is to cite well-known white papers or case studies that show the overall business value in standardizing a specific master data subject area. Depending on the subject area, many operational and decision-making components of the business can be vastly improved. It's a good idea to tie master data improvement to corporate initiatives such as reducing expenses or extending market share.
How much will it cost me to deploy this strategy in terms of time, tools and resources?

In the past, standardizing or "cleansing" data has been considered a black hole. Data analysts have been buried in the details, and management has been less than willing to invest in something for which they could not determine the cost of implementation.
However, with a master data management strategy, costs and time lines can be determined more easily, and they can be extrapolated to the enterprise. One example is assigning payment terms to vendors. For example, if in Japan it takes 10 minutes per vendor and there are 60 vendors, the organization will need 10 hours of manual labor support from the sales organization and finance department. If there are 1,000 products that need to be manually categorized by engineers and it takes six minutes per item, this translates to 100 person-hours of work for a group of engineers.
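The arithmetic behind these estimates is simple enough to capture in a reusable helper, which makes it easy to extrapolate per-system figures to the enterprise:

```python
def manual_effort_hours(record_count, minutes_per_record):
    """Person-hours of manual cleansing work, as in the article's examples."""
    return record_count * minutes_per_record / 60

# Japan vendor payment terms: 60 vendors at 10 minutes each.
manual_effort_hours(60, 10)     # 10.0 hours
# Product categorization by engineers: 1,000 items at 6 minutes each.
manual_effort_hours(1000, 6)    # 100.0 hours
```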
How long will it take to see value in the master data management strategy?

All implementation should be deployed in a phased approach. Standardize one entity or system at a time until the team feels comfortable with the work effort and deployment process. Afterward, teams can work on performing data standardizations in parallel throughout the company. In general, phases or work time lines should be standard, but the length of the phase is based on the number of subject areas being standardized. A typical standardization phase should last two to four months, and value can be seen in as few as 60 days.
How will the system be monitored?

The last thing that management wants to do is invest money in an initiative that no one wants to maintain. Therefore, the source system owner should assume data management responsibilities. It's best to develop data quality or data "compliance" reports that are calculated on a monthly or quarterly basis, and that indicate how well a source system owner is maintaining the developed data standards. The report results should be presented in business language that puts a dollar value next to the maintenance effort. This will keep master data quality in the minds of the operationally focused entities.
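A compliance report of the kind described above boils down to a score plus a cost figure. In this sketch, the per-record remediation cost is a hypothetical input the team would calibrate from its own cleansing experience:

```python
def compliance_report(total_records, compliant_records, cost_per_fix):
    """Monthly compliance score plus the dollar cost to remediate the rest.
    cost_per_fix is a hypothetical per-record cleansing cost, supplied by the team."""
    noncompliant = total_records - compliant_records
    return {"compliance_pct": round(100 * compliant_records / total_records, 1),
            "remediation_cost": noncompliant * cost_per_fix}

# 5,000 vendor records, 4,850 of them meeting the standard, $2.50 per fix.
compliance_report(5000, 4850, 2.50)
```

Attaching the dollar figure to each source system owner's report is what translates an abstract quality score into the business language management expects.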
The Bottom Line
Gone are the days of careless spending and huge investment dollars. Therefore, any information technology (IT) project decisions must be justified in business terms before executive management will sponsor them. Master data management and assessment projects are excellent IT initiatives to undertake in this climate because master data management strategy and implementation can be justified to business sponsors in terms of costs and benefits to the organization. The benefits of extending data management holistically across the organization are clear: redundancy is all but eliminated, and maintenance costs are reduced by distributing responsibilities among source system owners. Also, through the reporting strategies outlined previously, the results of the project can be clearly communicated - in business terms - to executive management. That's a win/win situation for any organization.
All information provided is of a general nature and is not intended to address the circumstances of any particular individual or entity. Although we endeavor to provide accurate and timely information, there can be no guarantee that such information is accurate as of the date it is received or that it will continue to be accurate in the future. No one should act upon such information without appropriate professional advice after a thorough examination of the particular situation. The views and opinions are those of the author and do not necessarily represent the views and opinions of BearingPoint, Inc.