Data quality is a critical issue on the path to successful completion of any customer relationship management (CRM) project. Let's begin by defining CRM and reviewing common CRM initiatives. Modern CRM initiatives involve communicating with the customer on a one-to-one basis for marketing, support and other interactions. To do this, customer information and transactional behavior must be collected and stored in one centralized data mart or warehouse for analysis and reporting.

Call center systems, Web analysis tools, ETL tools and reporting tools are often marketed as complete CRM solutions; however, the majority are not standalone solutions. The CRM warehouse is usually an online analytical processing (OLAP) schema that pulls data from multiple operational online transaction processing (OLTP) systems. The number of customer touchpoints a company has will determine the systems from which a full CRM implementation must collect data. These can include phones, e-mail, Web sites, direct mail and brick-and-mortar locations. The systems that support these touchpoints are usually diverse and not integrated within the company. You may have all of these touchpoints with separate systems supporting each functional division of your organization and, therefore, much redundancy and potential for error.

CRM initiatives can face many data quality issues including:

  • Legacy data. A lot of the data on legacy systems does not have referential integrity or other constraints common to more modern relational databases.
  • Identifying the customer. How easy is it for you to identify customers and their behaviors? Do you have a loyalty program? Do you collect e-mail addresses? Do you use cookies on your Web site?
  • Privacy issues. How well do you respect your customers' privacy and wishes on the marketing offers they opted out of?
  • Linking customer behavior across channels or products. How well do you link customers' Web site behaviors with service calls?
  • Differing granularities. Do you receive some data in weekly totals and other data monthly? Do you obtain some information based on Social Security numbers and other data based on account numbers?
  • Fields with similar names that do not use similar business rules or domains. Does your organization label or tag fields consistently (e.g., a "do not solicit" flag may be stored as 1/0 in one system and Y/N in another)?
  • Fields don't actually contain data.


The more trouble you have identifying customers and collating information about how they behave, the greater the risk that your profiling and segmentation will be misguided, your marketing campaigns will miss their targets and the return you get from your investment in the CRM system will not meet projections.

The following is an example. On average, Debbie Smith purchases one widget from you per month; however, last month she purchased three widgets, making three separate transactions. She was identified as Debbie, Deborah and Deb Smith throughout the three transactions. If, as a company, you failed to identify this, then you have at least two pieces of misinformation in your database. First, Debbie will look like an average customer rather than a profitable customer. Second, you will have two additional customer records in your database. These kinds of errors can lead to bad business decisions about Debbie and those like her, and your customer segmentation may become blurred.
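
To make the Debbie example concrete, here is a minimal Python sketch of one way to collapse name variants before counting purchases. The nickname table, the function names and the transaction layout are all assumptions made for illustration; they are not taken from any particular CRM product.

```python
from collections import defaultdict

# Hypothetical nickname table; a real one would be far larger.
NICKNAMES = {"deb": "deborah", "debbie": "deborah"}


def customer_key(first_name, last_name):
    """Build a matching key from a normalized first and last name."""
    first = first_name.strip().lower()
    first = NICKNAMES.get(first, first)  # map known nicknames to one form
    return (first, last_name.strip().lower())


def widgets_per_customer(transactions):
    """Sum widget purchases per matching key."""
    totals = defaultdict(int)
    for first, last, quantity in transactions:
        totals[customer_key(first, last)] += quantity
    return dict(totals)


# Last month's three transactions, as captured by the source systems.
transactions = [
    ("Debbie", "Smith", 1),
    ("Deborah", "Smith", 1),
    ("Deb", "Smith", 1),
]
totals = widgets_per_customer(transactions)
print(totals)  # one customer with three widgets, not three customers with one
```

Matching on name alone would also merge two different people who are both named Deborah Smith, so in practice you would likely combine a key like this with an address, e-mail address or loyalty number.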

Some of the best ways to address these issues can't be covered in a short article; however, the following are a few ideas to get you started.

Data Quality Review

For almost any CRM project, you should analyze, in an automated way, the exact makeup of the data you are collecting. You should do this on full data sets and not just a subset of the data. It is not accuracy you are interested in here; it is consistency of format (representation), completeness and domain integrity. This review should take place before you rely on, and/or place any faith in, the data. It will also allow for generation of criteria for properly segmenting the customer base and determining what percentiles, deciles, quintiles, etc., will be most appropriate for each attribute depending on how its values are actually distributed throughout the data.
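
As a rough illustration of what such an automated review might compute, the Python sketch below profiles a single field for completeness, format consistency and domain integrity. The function name, the report keys and the sample values are assumptions chosen for the example; a real review would run this over every field of the full data set.

```python
import re
from collections import Counter


def profile_field(values, pattern=None, domain=None):
    """Summarize completeness, format consistency and domain integrity."""
    total = len(values)
    # Treat None and blank strings as missing.
    filled = [v for v in values if v is not None and str(v).strip() != ""]
    report = {
        "total": total,
        "completeness": len(filled) / total if total else 0.0,
        "distinct": len(set(filled)),
        "top_values": Counter(filled).most_common(3),
    }
    if pattern is not None:
        # Format check: the whole value must match the expected pattern.
        regex = re.compile(pattern)
        report["bad_format"] = [v for v in filled if not regex.fullmatch(str(v))]
    if domain is not None:
        # Domain check: the value must be one of the allowed codes.
        report["out_of_domain"] = [v for v in filled if v not in domain]
    return report


# Hypothetical sample data.
zip_codes = ["30301", "30301", "", "3030", None, "94105"]
zip_report = profile_field(zip_codes, pattern=r"\d{5}")

flags = ["Y", "N", "y", "1", "N"]
flag_report = profile_field(flags, domain={"Y", "N"})

print(zip_report["completeness"], zip_report["bad_format"])
print(flag_report["out_of_domain"])
```

Note that nothing here checks whether a ZIP code is the customer's correct one. The report only tells you how often the field is filled in and whether the values look the way the documentation says they should.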

Identifying the Customer

There is usually no single customer identifier across a corporation, and many systems do a poor job of preventing duplicate records for the same customer. Different transactional systems for CRM data usually have their own customer tables or mechanisms for identifying customers (e.g., brick-and-mortar stores use loyalty cards, Web sites use cookies, phone centers use caller IDs and online communications use e-mail IDs). Many vendors of data quality solutions also provide name and address cleansing and householding (grouping customers who share a household).
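
One possible way to tie these channel-specific identifiers together is to treat any two identifiers that appear in the same record as belonging to the same customer, and then group them transitively. The Python sketch below does this with a small union-find structure. The record layout, the identifier labels and the function name are hypothetical, and the approach assumes that a shared identifier really does mean a shared customer, which will not always hold (a family may share one e-mail address, for example).

```python
def link_identifiers(records):
    """Group identifiers that co-occur in any record (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for record in records:
        ids = [i for i in record if i is not None]
        if len(ids) == 1:
            find(ids[0])  # register an identifier seen on its own
        for other in ids[1:]:
            union(ids[0], other)

    groups = {}
    for identifier in list(parent):
        groups.setdefault(find(identifier), set()).add(identifier)
    return list(groups.values())


# Hypothetical records from three touchpoints: (kind, value) pairs.
records = [
    (("loyalty", "L-1001"), ("email", "deb@example.com")),  # store sign-up
    (("cookie", "c-77"), ("email", "deb@example.com")),     # Web login
    (("phone", "555-0100"),),                               # call center
]
groups = link_identifiers(records)
for group in groups:
    print(sorted(group))
```

Here the loyalty card and the cookie end up in one group because both were seen with the same e-mail address, while the phone number stays on its own until some record connects it to the others.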

Legacy Data

Most legacy data is provided in fixed-width or delimited flat files, which usually lack metadata. The initial file definition must be entered manually into your extract, transform and load (ETL) tool, and coding changes are necessary whenever the file formats change. Often, changes are made on source systems that are not expected to impact the data feed. However, when these changes are made without regression testing, they can alter the feed and break the data load.
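
The Python sketch below shows what a manually entered file definition can look like and how a simple length check catches an unannounced format change instead of silently loading shifted fields. The layout, field names and sample records are invented for the example.

```python
# Hypothetical layout: (field name, start offset, end offset), end exclusive.
LAYOUT = [
    ("customer_id", 0, 6),
    ("last_name", 6, 16),
    ("do_not_solicit", 16, 17),
]
RECORD_LENGTH = 17


def parse_line(line, layout=LAYOUT, record_length=RECORD_LENGTH):
    """Slice one fixed-width record into a dict, rejecting unexpected lengths."""
    line = line.rstrip("\r\n")  # drop the line ending but keep padding spaces
    if len(line) != record_length:
        raise ValueError(f"expected {record_length} characters, got {len(line)}")
    return {name: line[start:end].strip() for name, start, end in layout}


def load(lines):
    """Parse all lines, collecting bad records instead of loading them."""
    good, bad = [], []
    for number, line in enumerate(lines, start=1):
        try:
            good.append(parse_line(line))
        except ValueError as error:
            bad.append((number, str(error)))
    return good, bad


lines = [
    "000123Smith     Y\n",
    "000124Jones     N\n",
    "000125Brown       N\n",  # the source system widened the name field
]
good, bad = load(lines)
print(good)
print(bad)
```

Without the length check, the third record would still parse, but its "do not solicit" flag would be read from a padding space and come back empty, which is exactly the kind of error that is hard to notice later.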

Fields with Similar Names Are Not Using Similar Business Rules

By conducting a data quality analysis, you will be able to identify these inconsistencies before they cause problems downstream. To properly explore and address them, a business process model and an enterprise data model should be created with the help of data users.
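
Once the differing encodings are known, they can be mapped to a single representation during the load. The Python sketch below normalizes a "do not solicit" flag that arrives as 1/0 from one system and Y/N from another. The accepted spellings and the function name are assumptions; the real list should come from your data quality analysis and your data users.

```python
# Assumed spellings; extend these sets to match what profiling actually finds.
TRUE_VALUES = {"1", "y", "yes", "true"}
FALSE_VALUES = {"0", "n", "no", "false"}


def normalize_flag(raw):
    """Map a source-system flag to True/False, or None when unrecognized."""
    if raw is None:
        return None
    text = str(raw).strip().lower()
    if text in TRUE_VALUES:
        return True
    if text in FALSE_VALUES:
        return False
    return None  # unknown value: surface it rather than guess


print(normalize_flag(1), normalize_flag("Y"))    # both mean "do not solicit"
print(normalize_flag("0"), normalize_flag("n"))  # both mean solicitation is allowed
print(normalize_flag("maybe"))                   # unrecognized
```

Returning None for unrecognized values is a deliberate choice in this sketch. For a privacy-related flag, you may prefer to treat anything unknown as "do not solicit" until the value has been reviewed.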

Your project team may think they are familiar with your data files, having worked with them for a while and having done some spelunking. Your team may have documented, in excruciating detail, the data format and domain of values that should be in any given field. Additionally, to your knowledge, all errors and issues have been found and corrected. Speaking from experience, I can tell you that you are only aware of a small fraction of issues that have contaminated your data. Don't wait until the testing phase to discover that your data is not coming together as you had hoped.
