We’ve all heard about the importance of data quality in our IT systems and how the data that flows through our applications is the fuel of the business processes. Yet, surprisingly few organizations have a structured approach to measuring the quality of their data. Some may have a few custom reports or manually compiled Excel sheets that show a few aspects of data quality, but if information is truly an enterprise asset, shouldn’t we be measuring and monitoring it like we do with all the other assets of the organization? Like most other things, data quality can only be managed properly if it is measured and monitored.The main reasons for implementing a data quality monitoring concept are to ensure that you identify:
- Trends in data quality,
- Data quality issues before they impact critical business processes and
- Areas where process improvements are needed.
- A structured and methodological approach to measuring and monitoring the quality of data should be part of a larger master data management or information management strategy.
The Dimensions of Data Quality
Looking at the offerings of many data profiling and data quality tool vendors, one could be led to think that data quality is just about checking if all fields have a value and if it’s valid compared to the domain that is defined for that specific field. While these are certainly relevant aspects of data quality, they only represent two out of the six common dimensions of data quality.When defining a data quality measuring concept, you should aim to focus on the dimensions that are meaningful and relevant for the business without spending too many resources, i.e., you need to justify the business relevance of what you measure. On the other hand, measuring all the different dimensions of data quality gives you the most complete picture. For instance, what is it worth to have a valid address for a customer if it is not correct because the customer has moved?A challenge that organizations face as they attempt to define data quality key performance indicators is that completeness, validity and integrity may be relatively easy to measure, but measuring consistency, accuracy and timeliness is a whole other story. In general, we would argue that the relationship illustrated in Figure 2 exists between the difficulty of measurement and the business impact.Because accuracy, timeliness and consistency are the more complex dimensions to measure, here are a few hints on how to address them.
The options to check accuracy are either manual sampling or verification against a trusted external source. For name and address data, there are several external sources available such as D&B (Dun & Bradstreet). Another common form of external data verification is bank account verification that is offered by several companies. This is a good way to ensure data as critical as bank account information from a vendor is correct before you make any payments.
As part of the Patriot Act, U.S. companies are restricted or prohibited from exporting or providing services of any kind to any party listed on any of a number of restricted party lists maintained by the U.S. Department of Commerce, Department of Treasury, U.S. Customs Service, Defense Department and others. This goes beyond the classic view of ensuring accuracy of data, but it’s really about ensuring that the organizations you do business with are really who you think they are. There are several companies (e.g., JP Morgan Vastera and Management Dynamics) that can help you screen your new business partners.
A good approach here is to measure the process from the request of a new master data element to the point it is available for use. This is especially relevant for large complex master data objects such as a new customer or product, where multiple people in different departments are typically involved in the data creation process. Depending on your information management architecture, it could also be relevant to measure the time from data is entered in an operational system to when it is available for reporting in your business intelligence environment.
The aspect of consistency can overall be split into duplicate records and cross-system consistency. Duplicate records are a very common problem and have two typical sources:
- They are created by mistake, simply because the user was not aware that the record existed already, or
- Duplication is due to system limitations. Common problems include systems that cannot store different payment terms or different currencies for the same vendor, and then the same vendor is created multiple times as a workaround.
In both cases, the duplicate records for the same customer or vendor will result in incorrect reporting and affect the business directly. You should ideally try to catch duplicate records at the point of data entry, but as it is difficult to completely avoid the creation of duplicate records, you should also define KPIs that capture and present potential duplication issues. This way you can handle and at least map duplicates to the master record in order to deliver trustable reporting. Consistency is typically focused on ensuring that data synchronizing across systems, is consistent. If somehow the synchronization fails between two systems it’s very important to identify and correct the error, otherwise the business will experience inconsistency between the systems. A cross-system consistency KPI could either represent entire records or individual fields.
Defining good KPIs
Building a data quality monitoring concept involves the following four basic steps:
- Define master data objects of importance (e.g., customer data).
- For each master data object, define a set of data quality KPIs.
- For each KPI, define measure details.
- Define procedures for follow-up on data quality issues.
Basic column analysis (number of blanks, max, min, uniqueness, etc.) and integrity analysis is a great place to start when defining data quality KPIs, but in order to give a more complete picture, you should also collect input from the sources such as BI and ask about data quality pain points. We recommend gathering candidate measures like illustrated in Figure 3:Every output from the data assessment exercise is not a KPI. But there are likely a couple of measures that you analyze during the assessment that you would like to monitor on a continuous basis. These are selected to be KPIs.Keep in mind that data quality KPIs should express the important characteristics of data quality of a particular area of a data object.Measures are typically percentages, ratios or number of occurrences, and here is an important point: for consistency reasons, aim to harmonize the measures. If, for instance, one measure is “number of customers without a postal code” while another is “percentage of customers with a valid VAT-number,” a list of KPIs will look confusing, since the first KPI should be as low as possible, and the other as close to 100 as possible.A good approach is to have all data quality measures as percentages, with 100 percent indicating the highest level of quality. This way, a manager who wants to quickly review the quality of a particular data object covered by several KPIs can easily interpret the data quality report.Be careful not to define too many KPIs, as this will just make the organizational implementation more difficult. If someone is presented with a report containing 50 or 100 data quality KPIs about customer data, they are very unlikely to look at each one. Choose a few good and relevant KPIs to start with, and then you can always add more later on.Also pay attention to controlling fields, because they are very important in order to reflect the correct measure. Controlling fields prevent KPIs from the need to look at all records in a table and will, for instance determine if a field should be filled or not. Common SAP examples include the country code field that defines if a value in the state field is required, or a material type that defines if a weight is required for a material record.Indirect Measures In some cases, there might be critical fields (e.g., MRP type or weight for a product) where the correct value is of utmost importance, but at the same time, it’s close to impossible to define the rules to check if a new value entered is correct. In these cases, one approach is to measure indirectly by, for instance, reporting what users have changed these values for which products over the last 24 hours, week, or whatever is appropriate in your organization.PreventionAn interesting point is that in a packaged solution like SAP ECC, a lot of the things like completeness and validity should be enforced by the system at the time of data entry, and if it isn’t, you should consider implementing a data input validation rule rather than allowing bad data to be entered and then measure it.However, there are cases, where the business logic of a field is too ambiguous to be enforced by a simple input validation rule. In addition, data may enter you SAP system through interfaces where some input validation could be ignored. Another common source of data quality issues is data migration efforts, where data is migrated without sufficient validation and verification prior to loading.
KPIs must be properly documented for implementation and to be able to follow up. So, we recommend that for each defined KPI, you document the KPIs listed in Figure 4.
Building and Presenting KPIs
Building the technical architecture to analyze and present KPIs can be done in various ways. Essentially, you need to extract your data from your ERP system, analyze the data, calculate KPIs and finally present it to the business community. The extract and analysis can be done using an ETL tool, a data quality tool and/or a data profiling tool. In some cases, you can also reach quite far using simple SQL. Figure 5 is a simplified illustration of a typical architecture.Presenting the KPIs to your organization in the right way is critical. The reports need to be easy to access and understand for the business users, and, therefore, it’s very often a help if you can leverage already existing reporting tools that business users are accustomed to. By presenting data quality reports in the same framework as other business reports, you also send the message that data quality is an important part of managing the business. We have had great success building complete data quality monitoring solutions this way.Another approach is to use a packaged data quality monitoring solution such as Data Insight XI from Business Objects or DataDialysis from Back Office Associates. Both of these tools offer a rapid implementation of a data quality monitoring solution, yet for more complex KPIs, you’ll probably need some custom extract, transform and load/SQL.
A Word About Process and Governance
For a data quality monitoring effort to be effective, clearly defined targets for data quality and followup mechanisms are required. Making sure there is a robust process for monitoring data quality and following up on issues is important. Having the business feel a sense of ownership and responsibility is critical.An example of a simple but effective process for monitoring and addressing issues related to data quality is an iterative process as illustrated in Figure 7.Starting with publish KPI, the first three boxes are rather straightforward, but we want to add some comments to Evaluate root cause and Plan corrective actions.Evaluate Root CauseIt is of key importance that the root cause of the problem is identified. We need to identify the originating process and find out where in the process the problem is created, and why. In this step, the focus should be on the process rather than on the person. You don’t want to create a blame culture.Common root causes include:
- Unclear definitions for how to enter data
- Lack of training for data maintenance employees
- Data entered by the wrong person (someone without the required knowledge or someone without a sense of ownership),
- Interface problems (causing inconsistent data),
- Workload too heavy for some individuals and
- Speed of data entry prioritized over quality of data entered.
Plan Corrective ActionsPrepare the steps required to fix the data quality issue. And remember there are two aspects to this:
- Fix the wrong data, and
- Fix the bad process to prevent the problem from reoccurring.
In some cases, changes to security setup can also be part of the solution. Common improvements include: more training for data maintenance employees, more automated validations, additional approval steps, and change of person that enters the data. In planning corrective actions, the impact of the data quality issue must be assessed, as this is a key parameter in deciding the type of preventive actions that are warranted.