Fundamentally, the goal of IT is to deliver accurate, complete and relevant information in a secure fashion to people and processes on demand. Information about the parties you do business with is a critical asset. As organizations grow over time both organically and through acquisition, data about customers is stored in many places in the enterprise. Each data store is defined differently, used by different business processes and updated by different business applications. The keys for and links between the data that describes customers get out of alignment with the characteristics of customers in the real world. Customers themselves change frequently, as do the business processes that manage customer information, the business logic in applications and the metadata associated with the data stores. The difference between people and legal entities in the real world and the information we have about them is called customer data disorder (CDD). When the condition is advanced, business performance suffers. This article explores the outward signs and symptoms of CDD and an approach you can use to understand the current state of business.


Business Symptoms of Advanced CDD


Customer relationship management (CRM) is the first place to look to determine if your organization suffers from CDD. CRM, whether homegrown or packaged software, sustains the business functions that market, sell, fulfill and support your goods and services - the relationships with your customers. Poor data quality is the number one issue cited in the failure of CRM. Business symptoms of CDD include:

  • Customer satisfaction scores and churn rates do not move substantially,
  • Increased conversion rates at point of contact and lower acquisition costs do not materialize and
  • The cost to market to and serve customers continues to increase.

The front office depends upon information and analysis from the back office. The "warehouse" is often used to describe a collection of functions that include basic reporting, segmentation, what-if analysis, predictive models, risk management and reporting to comply with regulations and legislation like Sarbanes-Oxley (SOX). Business symptoms of CDD in the warehouse include:

  • Inability to understand the number of products owned by a customer and the patterns of adoption,
  • Inability to understand customer lifetime value,
  • Difficulty predicting the next best offer and likelihood to accept and
  • Conflicting reporting results for the same customer from different sources.

Between the front office and the back office, most organizations implement a cross-reference of customer-to-account information. These solutions go by many acronyms, such as CIF (customer information file), CDB (customer database) and MDB (marketing database). In many cases, the only legitimate customer-to-account cross-reference is managed by a marketing service bureau. Over the last few years, the umbrella term CDI (customer data integration) has been used to describe software and services to implement these solutions. Lately, CDI has given way to MDM (master data management) to make room for other important enterprise domains such as product and chart of accounts. Regardless of what you call it, business symptoms that all is not well include:

  • The business cannot agree on definitions for customer, relationships and hierarchies across functions and brands,
  • It takes too long to bring party data sources into the repository and
  • There is an inability to bind legacy systems to the repository at the transaction level.


Quality of the Information Entity


As discussed in part one of this series (October 2008 DM Review), party data is the information entity that contains information about the party entity in the real world. There is the entity itself (the party in the real world), the information entity (the data maintained about the party) and the metadata about the information entity (the data about how the information entity is defined and organized). CDD means there is significant variance between information about the customer party in the information entity and the party itself in the real world.


"Fit for use" is the most commonly used term to characterize the quality of data of the information entity. It includes a subjective measure, "possess desired features," and an objective measure, "free from defects." "Possess desired features" measures whether the data is accessible, relevant and secure. It's a subjective measure because it is always a function of the business context in which the data is used. The same set of party data may possess all the desired features for marketing but fall far short of what's necessary for finance or service. On the other hand, "free from defects" is an objective measure of the data itself, regardless of use. The set of attributes, keys and relationships in a particular party data set can be objectively measured as complete and accurate. For example, the attribute values are or are not correct, with an explicit number of defects, at a point in time. The set of party keys has a known level of duplication.

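To make the objective measure concrete, here is a minimal Python sketch of two defect counts: the completeness of an attribute and the level of duplication in a set of party keys. The field names (party_key, resolved_id, email) and the sample records are hypothetical. The sketch also assumes that resolved_id was assigned by a matching process that has already decided which records describe the same party in the real world.

```python
from collections import defaultdict

def completeness(records, attribute):
    """Fraction of records that carry a non-empty value for the attribute."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if str(r.get(attribute) or "").strip())
    return filled / len(records)

def duplicate_key_rate(records, key, identity):
    """Fraction of party keys that are surplus, i.e. more than one key
    points at the same real-world party (as labeled by `identity`)."""
    keys_by_identity = defaultdict(set)
    for r in records:
        keys_by_identity[r[identity]].add(r[key])
    total_keys = sum(len(keys) for keys in keys_by_identity.values())
    if total_keys == 0:
        return 0.0
    surplus = sum(len(keys) - 1 for keys in keys_by_identity.values())
    return surplus / total_keys

# Hypothetical sample: P1 and P3 are two keys for the same person (R1).
party_records = [
    {"party_key": "P1", "resolved_id": "R1", "email": "ann@example.com"},
    {"party_key": "P2", "resolved_id": "R2", "email": ""},
    {"party_key": "P3", "resolved_id": "R1", "email": None},
    {"party_key": "P4", "resolved_id": "R3", "email": "cy@example.com"},
]

email_completeness = completeness(party_records, "email")                       # 0.5
key_duplication = duplicate_key_rate(party_records, "party_key", "resolved_id")  # 0.25
```

Both numbers describe the data set at a point in time, regardless of which business function uses it.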

Governance for Party Data Quality


Beyond the quality of party data and the extent to which it is fit for use for a particular business purpose, it's critical to look at an organization's capacity to manage data quality when exploring CDD. This is necessary because the condition extends beyond any single application, business function or database. According to IDC, the average company has 49 applications that operate on 14 different customer databases and, on average, no more than 20 percent of customer data resides in a single place [1]. Understanding the capacity of an enterprise to manage party data quality becomes a critical component for a complete diagnosis of the current state. There are five dimensions to consider:

  • Executive commitment and organization,
  • Policy and procedure,
  • Business analysis and remediation,
  • Measurement and service level management and
  • Architecture and tools.

Managing Party Data Quality


In order to complete the background for diagnosis, let's review the functional components necessary to manage party data.


Standardize: The beginning of the process to reconcile party data from disparate sources without common definition or keys is to rigorously process all the attributes that describe people and legal entities. Standardize means to strongly type or assign correct labels to all the piece parts of name, location, contact methods and other attributes associated with people and legal entities.
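
As an illustration only, the sketch below labels the piece parts of a personal name and reduces a phone number to digits. The title and suffix lists, the function names and the parsing rules are assumptions made for this example. Production standardization typically relies on much larger reference tables and locale-specific rules.

```python
import re

# Small, hypothetical lookup sets; real ones are far larger.
TITLES = {"mr", "mrs", "ms", "dr"}
SUFFIXES = {"jr", "sr", "ii", "iii"}

def standardize_name(raw):
    """Split a free-form personal name into labeled, lower-cased parts."""
    tokens = [t.strip(".,").lower() for t in raw.split()]
    tokens = [t for t in tokens if t]
    parts = {"title": "", "given": "", "middle": "", "family": "", "suffix": ""}
    if tokens and tokens[0] in TITLES:
        parts["title"] = tokens.pop(0)
    if tokens and tokens[-1] in SUFFIXES:
        parts["suffix"] = tokens.pop()
    if tokens:
        parts["given"] = tokens.pop(0)
    if tokens:
        parts["family"] = tokens.pop()
    parts["middle"] = " ".join(tokens)  # whatever is left between given and family
    return parts

def standardize_phone(raw):
    """Keep only the digits of a phone number."""
    return re.sub(r"\D", "", raw)

name_parts = standardize_name("Dr. Ann  Q. LEE, Jr.")
# {'title': 'dr', 'given': 'ann', 'middle': 'q', 'family': 'lee', 'suffix': 'jr'}
phone = standardize_phone("(555) 010-2000")  # '5550102000'
```

Once every source labels its attributes the same way, records from different systems can be compared field by field.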


Match: The assignment of a group key to more than one instance of a person or legal entity found in source data is accomplished through a matching and record linkage process. This is a well-understood statistical problem when working with data where there is no reliable key available to link records from different sources that describe the same entity in the real world.
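
The sketch below shows the general shape of the idea in Python. It scores every pair of records and gives linked records a shared group key. Note that it swaps in a simple weighted string similarity for a true statistical record linkage model, and that the weights, the threshold, the field names and the sample records are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations

def similarity(a, b):
    """Weighted score in [0, 1]: fuzzy name similarity plus exact postal match."""
    name_score = SequenceMatcher(None, a["name"], b["name"]).ratio()
    postal_score = 1.0 if a["postal"] == b["postal"] else 0.0
    return 0.7 * name_score + 0.3 * postal_score  # hypothetical weights

def assign_group_keys(records, threshold=0.85):
    """Link every pair scoring at or above the threshold and return one
    group key per record (the index of the group's root record)."""
    parent = list(range(len(records)))

    def find(i):
        # Follow parent links to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(records)), 2):
        if similarity(records[i], records[j]) >= threshold:
            parent[find(j)] = find(i)
    return [find(i) for i in range(len(records))]

# Hypothetical, already standardized records from different sources.
sample = [
    {"name": "ann lee", "postal": "10001"},
    {"name": "anne lee", "postal": "10001"},
    {"name": "bob ray", "postal": "94105"},
]
group_keys = assign_group_keys(sample)  # the first two records share a key
```

Comparing every pair is only practical for small sets. Larger volumes usually need some way to limit which pairs are compared.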


Enrich and survive: The set or group of records that all describe the same person or legal entity then needs to be processed to best characterize the entity. Enrichment extends the view of the person or business by appending data from a reference source. For example, D&B and Acxiom are often used to extend the view of legal entities and individuals. Survivorship is the process to combine the data from a group of records into a single best representation for a particular business purpose.
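
Here is one possible survivorship rule sketched in Python: for each attribute, keep the most recently updated non-empty value in the group. The rule itself, the field names and the sample data are assumptions, since the right rule depends on the business purpose. The reference_row argument is a hypothetical stand-in for data appended from a reference source and does not represent any vendor's actual interface.

```python
from datetime import date

def survive(group, attributes):
    """Build a single best record: newest non-empty value per attribute."""
    newest_first = sorted(group, key=lambda r: r["updated"], reverse=True)
    golden = {}
    for attr in attributes:
        golden[attr] = next((r[attr] for r in newest_first if r.get(attr)), None)
    return golden

def enrich(record, reference_row):
    """Append reference attributes without overwriting values already present."""
    enriched = dict(record)
    for attr, value in reference_row.items():
        if not enriched.get(attr):
            enriched[attr] = value
    return enriched

# Hypothetical group of records that matching assigned to the same person.
group = [
    {"name": "Ann Lee", "email": "ann@old.example", "phone": "",
     "updated": date(2007, 3, 1)},
    {"name": "Ann Lee-Smith", "email": "", "phone": "5550102000",
     "updated": date(2008, 6, 15)},
]

golden = survive(group, ["name", "email", "phone"])
# {'name': 'Ann Lee-Smith', 'email': 'ann@old.example', 'phone': '5550102000'}
enriched = enrich(golden, {"segment": "retail", "phone": "0000000000"})
# phone is kept from the golden record; segment is appended
```

A different business purpose could call for a different rule, such as preferring one trusted source over recency.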


Manage hierarchies and matching exceptions: The matching process has three outcomes from the comparison of two records. The two records describe the same entity in the real world (match), or the records describe different entities (no match). The third case is the gray area in between, often referred to as clerical review. These are cases where there is enough evidence that the two records may represent the same party, and the process is to have a businessperson review the information and make a decision. The use of clerical review is often driven by business context - not necessary for marketing promotions, but essential to consolidating financial accounts or patient records. Another area for active business participation is the management of hierarchies. Hierarchies are special types of relationships between parties that enforce parent-child relationships between records with a designated ultimate parent at the top.
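
The three outcomes and the parent-child structure can be sketched as follows. The threshold values, function names and company names are hypothetical. The match score is assumed to be a number between 0 and 1 produced by whatever matching process is in place.

```python
def classify(score, lower=0.6, upper=0.85):
    """Map a match score to one of the three outcomes."""
    if score >= upper:
        return "match"
    if score < lower:
        return "no match"
    return "clerical review"  # gray area: a businessperson decides

def ultimate_parent(party, parent_of):
    """Walk parent links upward until a party with no parent is reached."""
    seen = set()
    while parent_of.get(party) is not None:
        if party in seen:
            raise ValueError("cycle in hierarchy")
        seen.add(party)
        party = parent_of[party]
    return party

# Hypothetical legal-entity hierarchy: child -> parent.
parent_of = {
    "Acme Online": "Acme Retail",
    "Acme Retail": "Acme Holdings",
    "Acme Holdings": None,
}

outcome = classify(0.70)                         # 'clerical review'
top = ultimate_parent("Acme Online", parent_of)  # 'Acme Holdings'

# Setting both thresholds to the same value removes the review band,
# which may suit a context such as marketing promotions.
marketing_outcome = classify(0.70, lower=0.70, upper=0.70)  # 'match'
```

Widening the band between the two thresholds sends more pairs to review, which is the trade-off to make when consolidating financial accounts or patient records.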


Process updates: The output of this continuous process to manage party data quality needs to be posted to the database management system that stores all the data about the information entity. Process updates include the creation, update and deletion of data related to parties.


Report and manage quality: This includes formal reporting and the governance processes to understand, improve and monitor the quality of party data.




Party Data Quality in CRM


Poor data quality is often cited as a problem that prevents organizations from getting the full value from their investment in CRM. In all cases, the best place to start with diagnosis is to determine if any formal way to measure quality is in place.

  • Are there regularly applied measurements for completeness and accuracy?
  • Are reports compiled and distributed?
  • Is there a defined process to prioritize issues and manage resolution?

The next area to examine is relationship detection and matching.

  • Is there a well-understood definition of what constitutes a customer?
  • Are there definitions for customer types and relationships between parties?
  • Are the rules for the matching process defined? Can the rules for the matching process be accessed and changed?
  • Are there measurements that describe the accuracy of the matching process?
  • Is there a written definition of how data about the same party is merged together from different sources?
  • When external reference data is appended, is information available about the accuracy of the enrichment process?
  • Is there an interface for businesspeople to review matches and formally manage relationships between parties, including hierarchies?
  • Do the people who use the party data believe the data is relevant and accessible?
  • Does the CRM business function participate in enterprise data governance?

Party Data Quality with BI Applications and the Warehouse


While CRM applications often have the formal responsibility to manage the party data object itself, the party data management process is often invisible to the warehouse. But at the same time, reporting, business intelligence (BI) applications and the warehouse environment depend upon party data for primary dimensions. Dollars, units, geography and time are always denominated by some dimension of party. The customer, member, vendor, distributor, supplier or other party dimension is made up of aggregations based on group keys assigned to sets of attributes that describe instances of parties. Does an inaccurate party dimension impact the quality of the report, the model and the analysis of information from the warehouse? Often, no one understands the quality of the party dimension. There may be no formal standardization, matching or survivorship. Are there quality measurements? Is there reporting on quality and content? Is there exception processing? Is there a connection to a governance process?



Party Data Quality in the Enterprise


Diagnosing CDD means understanding the horizontal alignment of the critical components required to manage party data quality. Remember, the fundamental root cause is the large number of party data sets serving different business functions and, often, different brands and geographies. A framework that calls for alignment across sources provides the background necessary to create a meaningful diagnosis.


Executive sponsorship is a must-have for the funding, business ownership and governance necessary for success. Focusing on the party data sets themselves offers a very concrete roadmap for diagnosis and often reveals the information necessary to gain executive sponsorship. Just start with a simple question, such as "In how many places do we store information about the people and legal entities we do business with?" Focus on a handful of sources, and then diagnose the current state for each party data set.

  • Look for formal mechanisms to gauge or report on the quality of the data or record linkage process.
  • Interview the people who use the data and find out whether they think it is fit for use and possesses the desired features.
  • Identify where the standardization and matching logic is actually executed and determine the degree to which the logic is exposed and configurable.
  • Look at the business process responsible for managing the party data and confirm the formal definition of customer, customer relationships and hierarchies.
  • Look into how data is formally managed for create, read, update and delete and the logical model that should be enforced.
  • Identify the processes for handling exceptions and the established methodologies and processes for governance.
  • Cross-reference evaluations across sources.
  • Identify gaps and speculate on benefits.

With a clear understanding of the current state, you'll be in an excellent position to develop and prioritize treatment plans.




1. John F. Gantz. "The Expanding Digital Universe: A Forecast of Worldwide Information Growth through 2010." IDC, March 2007.
