Data quality is a critical issue on the path to successful completion of any customer relationship management (CRM) project. Let's begin by defining CRM and reviewing common CRM initiatives. Modern CRM initiatives involve communicating with the customer on a one-to-one basis for marketing, support and other interactions. In order to do this, customer information and transactional behavior must be collected and stored in one centralized data mart or warehouse for analysis and reporting.
Call center systems, Web analysis tools, ETL tools and reporting tools are often marketed as complete CRM solutions; however, the majority are not standalone solutions. The CRM warehouse is usually an online analytical processing (OLAP) schema that pulls data from multiple operational online transaction processing (OLTP) systems. The number of customer touchpoints a company has will determine the systems from which a full CRM implementation must collect data. These can include phones, e-mail, Web sites, direct mail and brick-and-mortar locations. The systems that support these touchpoints are usually diverse and not integrated within the company. You may have all of these touchpoints with separate systems supporting each functional division of your organization and, therefore, much redundancy and potential for error.
CRM initiatives can face many data quality issues including:
- Legacy data. Much of the data on legacy systems lacks referential integrity and other constraints common to modern relational databases.
- Privacy issues. How well do you respect your customers' privacy and honor the marketing offers they opted out of?
- Linking customer behavior across channels or products. How well do you link customers' Web site behavior with their service calls?
- Differing granularities. Do you receive some data in weekly totals and other data monthly? Do you obtain some information keyed by social security numbers and other data keyed by account numbers?
- Fields with similar names that do not use similar business rules or domains. Does your organization label or tag fields consistently (e.g., a "do not solicit" flag may be stored as 1/0 in one system and Y/N in another)?
- Fields that do not actually contain data (defined in the schema but never populated).
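As a concrete illustration of the inconsistent-flag problem in the list above, here is a minimal, hypothetical sketch (the system names and encodings are invented) of mapping each source's "do not solicit" representation to one canonical value in the warehouse:

```python
# Hypothetical sketch: normalizing a "do not solicit" flag that one
# source system stores as 1/0 and another as Y/N. System names and
# encodings are illustrative, not from any real implementation.
FLAG_ENCODINGS = {
    "billing": {"1": True, "0": False},
    "callctr": {"Y": True, "N": False},
}

def normalize_flag(system: str, raw: str) -> bool:
    """Translate a raw flag value into the warehouse's canonical form."""
    try:
        return FLAG_ENCODINGS[system][raw.strip().upper()]
    except KeyError:
        # An unmapped value is a data quality finding, not a silent default.
        raise ValueError(f"unmapped flag value {raw!r} from system {system!r}")

print(normalize_flag("billing", "1"))   # True
print(normalize_flag("callctr", "n"))   # False
```

Failing loudly on unmapped values, rather than defaulting, is deliberate: a silent default would hide exactly the kind of domain drift this list warns about.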
The more trouble you have identifying customers and collating information about how they behave, the greater the risk that your profiling and segmentation will be misguided, your marketing campaigns will miss their targets and the return you get from your investment in the CRM system will not meet projections.
The following is an example. On average, Debbie Smith purchases one widget from you per month; however, last month she purchased three widgets, making three separate transactions. She was identified as Debbie, Deborah and Deb Smith throughout the three transactions. If, as a company, you failed to identify this, then you have at least two pieces of misinformation in your database. First, Debbie will look like an average customer rather than a profitable customer. Second, you will have two additional customer records in your database. These kinds of errors can lead to bad business decisions about Debbie and those like her, and your customer segmentation may become blurred.
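A sketch of how the Debbie Smith scenario might be caught, assuming an illustrative nickname lookup table (real customer matching typically also draws on addresses, account numbers and fuzzy matching, none of which is shown here):

```python
# Hypothetical sketch of the Debbie Smith scenario: collapsing name
# variants to one customer key before aggregating purchases.
# The nickname table and transactions are illustrative, not real data.
NICKNAMES = {"debbie": "deborah", "deb": "deborah"}  # variant -> canonical

def customer_key(first: str, last: str) -> str:
    """Build a matching key with the first name mapped to its canonical form."""
    f = first.strip().lower()
    return f"{NICKNAMES.get(f, f)} {last.strip().lower()}"

transactions = [
    ("Debbie", "Smith", 1),
    ("Deborah", "Smith", 1),
    ("Deb", "Smith", 1),
]

widgets_by_customer = {}
for first, last, qty in transactions:
    key = customer_key(first, last)
    widgets_by_customer[key] = widgets_by_customer.get(key, 0) + qty

print(widgets_by_customer)  # {'deborah smith': 3} -- one customer, not three
```

Without the canonicalization step, the same loop would report three one-widget customers, which is precisely the segmentation blur described above.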
Some of the best ways to address these issues simply can't be covered in a short article; however, following are a few ideas to get you started.
Data Quality Review
For almost any CRM project, you should analyze, in an automated way, the exact makeup of the data you are collecting. You should do this on full data sets and not just a subset of the data. It is not accuracy you are interested in here; it is consistency of format (representation), completeness and domain integrity. This review should take place before you rely on, and/or place any faith in, the data. It will also allow for generation of criteria for properly segmenting the customer base and determining what percentiles, deciles, quintiles, etc., will be most appropriate for each attribute depending on how its values are actually distributed throughout the data.
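A toy profiling pass along these lines might look like the following; the field names, format rule and domain are assumptions chosen for illustration, and a real review would run over the full data set rather than two rows:

```python
# Minimal data quality profiling sketch: for each field, measure
# completeness, format consistency and domain integrity, and tally the
# actual value distribution. Field names and rules are illustrative.
import re
from collections import Counter

ZIP_PATTERN = re.compile(r"^\d{5}$")   # assumed format rule for US ZIPs
SOLICIT_DOMAIN = {"Y", "N"}            # assumed valid domain for the flag

def profile(records):
    report = {
        "rows": len(records),
        "email_missing": 0,          # completeness
        "zip_bad_format": 0,         # format consistency
        "solicit_out_of_domain": 0,  # domain integrity
        "solicit_values": Counter(), # observed value distribution
    }
    for r in records:
        if not r.get("email"):
            report["email_missing"] += 1
        if not ZIP_PATTERN.match(r.get("zip", "")):
            report["zip_bad_format"] += 1
        val = r.get("solicit", "")
        report["solicit_values"][val] += 1
        if val not in SOLICIT_DOMAIN:
            report["solicit_out_of_domain"] += 1
    return report

rows = [
    {"email": "a@x.com", "zip": "60601", "solicit": "Y"},
    {"email": "",        "zip": "6061",  "solicit": "1"},
]
print(profile(rows))
```

The observed value distribution (`solicit_values` here) is what feeds the segmentation question above: you can only choose sensible percentiles, deciles or quintiles for an attribute after seeing how its values are actually spread across the data.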
Identifying the Customer
Most legacy data is provided in fixed-width or delimited flat files, which usually lack metadata. The initial file definition must be entered manually into your extract, transform and load (ETL) tool, and coding changes are necessary whenever the file format changes. Often, changes are made on source systems that should not impact the data feed; however, when these changes are made without regression testing at the time of the change, they can break the data load.
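For illustration, a hand-keyed layout for a fixed-width extract might look like the following sketch (the field names and offsets are hypothetical); note that nothing in the file itself tells you when the source format has drifted, which is why an undocumented source change silently corrupts the load:

```python
# Hypothetical sketch: a manually entered layout for a fixed-width legacy
# extract. Real layouts come from the source system's file spec or
# copybook; these offsets are invented for illustration.
LAYOUT = [               # (field, start, end) -- hand-keyed, so a
    ("cust_id", 0, 8),   # source-side format change silently breaks it
    ("last",    8, 23),
    ("first",  23, 35),
]

def parse_line(line: str) -> dict:
    """Slice one fixed-width record into named, trimmed fields."""
    return {name: line[a:b].strip() for name, a, b in LAYOUT}

record = parse_line("00001234Smith          Deborah     ")
print(record)  # {'cust_id': '00001234', 'last': 'Smith', 'first': 'Deborah'}
```

If the source team widens `cust_id` to ten characters without telling the warehouse team, this parser still runs without error; it just slices every field at the wrong offsets, which is why regression testing on format changes matters.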
Fields with Similar Names Are Not Using Similar Business Rules
By conducting a data quality analysis, you will be able to identify these issues before they surface in production. To explore and address them properly, a business process model and an enterprise data model should be created with the help of the people who actually use the data.
Your project team may think they are familiar with your data files, having worked with them for a while and having done some spelunking. Your team may have documented, in excruciating detail, the data format and domain of values that should be in any given field. Additionally, to your knowledge, all errors and issues have been found and corrected. Speaking from experience, I can tell you that you are only aware of a small fraction of issues that have contaminated your data. Don't wait until the testing phase to discover that your data is not coming together as you had hoped.