Everyone agrees that data transformation is important for creating and maintaining a high-value data warehouse. In simple terms, data transformation is the process of collecting, cleansing, reformatting, validating and consolidating data, and it is crucial for business users. But data transformation is merely the process, the means to an end. The information it produces must be high quality, because only high-quality data will be used by its intended users. They must trust the data enough to use it for making high-value business decisions. Trust in the data is critical.

Consider the multi-divisional company that developed a data warehouse to provide integrated cross-divisional customer information so that a new customer relationship initiative could be launched. Eighteen months and a couple of million dollars later, the warehouse is complete, the applications start using it and the first customer analysis reports are produced. The company's CEO is particularly interested in a report that summarizes total revenue, across all divisions, by customer, by year. Armed and eager, the CEO visits one of his top 10 customers. He greets the customer and thanks him for the $7 million in business they did last year. The customer appreciates the kudos but points out that it was $12 million. The embarrassed CEO returns to his company and demands that this error, and all errors like it, be found and corrected. The warehouse project team spends another year correcting data errors and regaining trust. Poor-quality, "load-and-go" data transformation was the culprit.

Despite its recognized importance, data transformation continues to be partially or wholly neglected in businesses today. The lack of an integrated, enterprise-wide approach to collecting, structuring and maintaining data is often the cause of poor data transformation and, in turn, the barrier to building a credible and successful data warehouse.

Any data transformation program that aims to create usable, trustworthy data must therefore involve every part of the business that comes in contact with customer data.

An Integrated Approach

To effectively create an integrated strategy, businesses must first identify where data inaccuracies begin and examine the ways the data is currently stored and utilized.

Data Collection. Errors and inconsistent information start in the data collection process. Businesses gather data by interviewing customers and recording their answers or by collecting information supplied by customers through a survey or response-type form. Either way, disparate information is constantly being compiled based on how information is provided by the customer and translated by the business.

Consider the scenario of a typical telemarketing center conducting a customer survey for a major retail firm. The goal of the program is to compile personal information on customers who recently purchased a new children's toy from the company's catalog. The company wants to gather birth date and household information to uncover buying trends among its customers so it can effectively target special incentive offers to them.

During the calling, some customers provide their actual birth dates, others understate their age because they don't want to reveal it, and others provide no information at all because they consider the question a violation of their privacy. The customers in this example were trying to understand the context of the question (What do they really want? Why do they want it? Am I in danger by revealing this information?). They then communicated the content: the data they thought the toy company wanted, limited to what they were comfortable disclosing. This instant decision can cause problems in the accurate collection of customer data.

Belief in confidentiality is a prerequisite for trust and self-disclosure. To maintain a trust relationship, a company should obtain only information directly related to its purposes and use the least invasive method of recording whenever possible. Lack of trust in the customer/company relationship can cause customers to taint the data they reveal. Customers decide what data to disclose. They understand intuitively that divulging personal information is a release of power that increases their vulnerability. If they are unsure about the ultimate uses of the information they are revealing, they will resist, both actively and passively.

Taking it a step further, the telemarketers are encouraged to enter a substantial number of customer records into the database to achieve quotas. In their quest to meet quotas, the quantity of calls becomes their top priority. The quality of the data, even if given accurately, takes a back seat and often becomes a casualty of typos, misspellings and misinterpretation.

Pinpointing where data inaccuracies originate, therefore, is the first step in creating a successful data transformation process. The second is examining the ways data is stored and utilized to maximize the value of the data.

Data Structure. Even if a customer provides accurate, complete information and a data entry operator correctly enters the data, it won't be useful unless it is broken down to its most finite (most granular) level, then standardized and formatted.

Many companies don't have a set structure for storing customer data. Oftentimes, the data is recorded at the "group" level, not the "sub-element" or most finite level. For instance, there may only be name, address, city, state and ZIP code fields set up in the database. Separate fields for first name, middle name, surname, etc., don't exist.

The data is also frequently stored in a "free-form" style with no consistent format between record files. This lack of structure makes it impossible for companies to perform finite-level analysis utilizing data elements such as city, state, surname, ZIP code, phone number, and so on. Because the data is not finite or in a standardized format, cross-validation of the data cannot be performed to consolidate customer information to effectively build a data warehouse.
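
To see why free-form storage blocks consolidation, consider how three divisions might each spell the same customer differently. The sketch below is illustrative only: `match_key`, `revenue_by_customer`, the list of legal-form words and the sample rows are all assumptions made for this example, and real matching tools use far more sophisticated rules. It shows that revenue can only be totaled per customer once the name variants collapse onto one standardized key.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical list of legal-form words to ignore when matching company names.
LEGAL_FORMS = {"INC", "CORP", "CO", "LLC", "LTD"}


def match_key(name: str) -> str:
    """Build a crude matching key: uppercase, no punctuation, no legal-form words."""
    words = re.sub(r"[^A-Za-z0-9 ]", " ", name).upper().split()
    return " ".join(word for word in words if word not in LEGAL_FORMS)


def revenue_by_customer(rows: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Sum revenue per customer after collapsing name variants onto one key."""
    totals: Dict[str, float] = defaultdict(float)
    for name, revenue in rows:
        totals[match_key(name)] += revenue
    return dict(totals)


if __name__ == "__main__":
    # Made-up rows from three divisions, in millions of dollars.
    divisional_rows = [("Acme Corp.", 7.0), ("ACME, Inc", 3.0), ("acme", 2.0)]
    print(revenue_by_customer(divisional_rows))
```

Grouped on the raw, free-form names, these rows would look like three separate customers. Grouped on the standardized key, they roll up into one.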

While accurate, standardized and formatted customer data is essential for a successful data transformation process, these efforts will be wasted if the data is not kept current. The information will quickly lose its value if the third step--data maintenance--is not employed.

Data Maintenance. As soon as customer data is captured, it begins to age and becomes increasingly useless, further complicating the data transformation process.

Data becomes inaccurate or outdated as a result of lifestyle changes, population shifts and postal changes. Marriages, divorces, births of children, retirements and new jobs, for example, take place on a regular basis. Moves occur daily. According to the U.S. Postal Service, 17 percent of the United States population moves at some time during a single year. In a carelessly maintained customer data warehouse, that means roughly 17 percent of address records can go stale each year. ZIP codes, area codes and phone numbers also change continually, further aging the information in the database.

Because of the extent of these events, many company databases are riddled with inaccurate or outdated information.
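
The effect of aging compounds over time. As a rough back-of-the-envelope sketch, assume the 17 percent annual move rate cited above stays constant and applies independently each year (a simplifying assumption, since some people move repeatedly and others never do). The hypothetical function `fraction_still_current` then estimates the share of unmaintained address records that are still valid after a given number of years.

```python
def fraction_still_current(annual_move_rate: float, years: int) -> float:
    """Estimate the share of address records still valid after `years`.

    Assumes a constant move rate applied independently each year,
    which is a simplification rather than a measured model.
    """
    if not 0.0 <= annual_move_rate <= 1.0:
        raise ValueError("annual_move_rate must be between 0 and 1")
    if years < 0:
        raise ValueError("years must not be negative")
    return (1.0 - annual_move_rate) ** years


if __name__ == "__main__":
    for years in range(1, 6):
        share = fraction_still_current(0.17, years)
        print(f"after {years} year(s): {share:.1%} of addresses still current")
```

Under these assumptions, only about 57 percent of addresses would still be current after three years without maintenance.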

Understanding the problems that surface in each step of the data transformation process is key to maximizing the value of the data. Once the origins of data inaccuracies, data structure and data maintenance are understood, strategies for successful data transformation can be implemented.

How to Achieve High-Value Customer Data Transformations

The three distinct processes a company must focus on to achieve high-value customer data transformations are the data collection procedure, the data structuring procedure and the data content validation and maintenance procedures.

To ensure data is accurate from the start, you must have an effective data collection method in place. Here are some ways to ensure you collect high-quality customer data:

  • Pay attention to the "interview." You need to carefully construct questions that are clear and concise to help customers provide desired responses. The questions should be tested and modified until they effectively prompt customers for the right information.
  • Pay attention to any written form you ask a customer to complete. Be sure your form's questions are clear. Be sure it's quick to complete--be sensitive to your customer's time. Above all, be sure it collects the desired responses. And test the form!
  • To gain your customers' trust, explain why questions are asked and how the information will be used. Reassure customers that information will be kept confidential and not sold unless permission is received from them. Nothing hurts customer intimacy more than a company selling its customers' information without permission. (By the way, a future trend will be royalty payments to customers who agree to the distribution of their personal information. The days of the free data gravy train will come to an end.)
  • Train the data entry operators. They are critical to the success of high-quality data collection, so strengthen their ability to establish customer confidence and obtain the desired information.

Next, you'll need to ensure that your data is structured correctly and consistently. These efforts will support finite-level data analysis and data consolidation projects when developing comprehensive customer profiles. For operational databases, technology exists that can help reengineer existing group-level customer data. One warning here is that the actual content of the data is not being validated; rather, the data is scrubbed and reformatted to achieve greater consistency and integration.

  • Take the customer data down to its lowest atomic level. This means the data must be structured at the sub-element level, not the group level. There are many products on the market today that can help you reengineer existing group-type customer data.
  • Standardize and format the data. Name fields should be broken out into first name, middle name, surname, salutation and suffix (such as "Jr."). Address fields should include the street name, directional (such as "N" or "SW"), building numbers, apartment and suite numbers and, of course, city, state and ZIP code (ZIP+4, preferably). Phone numbers should consist of area code, prefix and unique four-digit identifier.
  • Look for data that doesn't belong. Banks often store account information in name fields. While this is business-critical data, it is not the customer's name. Also, look for commentary in the data, such as "Call after 5" in the phone number field. Create new elements to store this data.
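
The structuring steps above can be sketched in a few lines. This is a minimal illustration, not a substitute for a commercial reengineering product: the names `split_name` and `split_phone`, the tiny salutation and suffix lists, and the assumption of 10-digit North American phone numbers are all hypothetical choices made for the example. It breaks a group-level name into sub-elements and separates stray commentary from a phone number field.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Illustrative lookup sets; real tools rely on much larger reference tables.
SALUTATIONS = {"mr", "mrs", "ms", "dr"}
SUFFIXES = {"jr", "sr", "ii", "iii"}

# Assumes a 10-digit North American number: area code, prefix, four-digit line.
PHONE_RE = re.compile(r"\(?(\d{3})\)?[\s.-]*(\d{3})[\s.-]*(\d{4})")


@dataclass
class Name:
    salutation: str = ""
    first: str = ""
    middle: str = ""
    surname: str = ""
    suffix: str = ""


@dataclass
class Phone:
    area_code: str
    prefix: str
    line_number: str
    comment: str = ""  # commentary found in the field, stored separately


def split_name(raw: str) -> Name:
    """Split a 'Salutation First Middle Surname, Suffix' string into sub-elements."""
    tokens = raw.replace(",", " ").split()
    name = Name()
    if tokens and tokens[0].rstrip(".").lower() in SALUTATIONS:
        name.salutation = tokens.pop(0)
    if tokens and tokens[-1].rstrip(".").lower() in SUFFIXES:
        name.suffix = tokens.pop()
    if tokens:
        name.first = tokens.pop(0)
    if tokens:
        name.surname = tokens.pop()
    name.middle = " ".join(tokens)  # whatever remains between first and surname
    return name


def split_phone(raw: str) -> Optional[Phone]:
    """Extract the number's parts and keep any leftover text as a comment."""
    match = PHONE_RE.search(raw)
    if match is None:
        return None  # leave unparseable values for manual review
    leftover = raw[:match.start()] + " " + raw[match.end():]
    return Phone(*match.groups(), comment=leftover.strip(" ,;-"))


if __name__ == "__main__":
    print(split_name("Dr. John Q. Public, Jr."))
    print(split_phone("(612) 555-0142 Call after 5"))
```

In this sketch, "(612) 555-0142 Call after 5" yields area code 612, prefix 555, line number 0142 and the comment "Call after 5" in its own element. Note that nothing here checks whether the name or number is actually correct; it only restructures what was entered.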

Finally, here's how to maintain the accuracy of your customer data and ensure that it is usable in advanced customer relationship management applications.

  • Validate and maintain the basic address information. There are many products that can correct address information. While this doesn't solve the data aging problems or the name, phone number and birth date problems, you can at least get the address correct. This comes in handy for mass marketing mailings and some direct marketing campaigns. Keep in mind that this is not really customer data validation; all you're validating is that a postal delivery point is correct as defined by the post office. Many times a marketing piece will be addressed to a person who no longer lives at that residence, but having some of the content correct is better than nothing.
  • Validate and maintain customer information using third-party data. There are a variety of data quality technology solutions for updating and correcting the content of customer data. Such technology uses third-party database information to correct and update name information, verify residency and track moves, and verify important personal information such as marriages, births of children, income level, phone numbers, and so on. These solutions provide the level of quality required for the advanced marketing and customer retention strategies that customer relationship management depends on.
  • Provide your data collection operators with data validation tools. The data collection operators can also use customer information management and address management tools at the point of collection. Combining robust data collection methods with third-party, point-of-entry data quality tools is the most powerful way to achieve effective data transformation.
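
As a rough idea of what a point-of-entry check might look like, the sketch below flags obvious problems before a record is saved. Everything in it is an assumption made for illustration: the function name `validate_at_entry`, the field names, the 120-year age ceiling and the U.S. ZIP code pattern. It checks format and plausibility only; confirming that the content is true still requires the address and third-party validation tools described above.

```python
import re
from datetime import date
from typing import Dict, List

ZIP_RE = re.compile(r"\d{5}(-\d{4})?")  # 5-digit ZIP or ZIP+4
MAX_AGE_YEARS = 120  # hypothetical plausibility ceiling


def validate_at_entry(record: Dict[str, object], today: date) -> List[str]:
    """Return a list of problems the operator can fix while the customer is present."""
    problems: List[str] = []

    surname = str(record.get("surname", "")).strip()
    if not surname:
        problems.append("surname is missing")

    zip_code = str(record.get("zip", "")).strip()
    if not ZIP_RE.fullmatch(zip_code):
        problems.append("ZIP code is not in 5-digit or ZIP+4 form")

    birth = record.get("birth_date")
    if isinstance(birth, date):
        if birth > today:
            problems.append("birth date is in the future")
        elif today.year - birth.year > MAX_AGE_YEARS:
            problems.append("birth date implies an implausible age")

    return problems


if __name__ == "__main__":
    entry = {"surname": "Public", "zip": "5540", "birth_date": date(1970, 5, 17)}
    for problem in validate_at_entry(entry, today=date.today()):
        print("Please re-check:", problem)
```

Returning a list of messages rather than rejecting the record outright lets the operator correct the entry during the call, when the customer can still confirm the right value.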

The value proposition for high-quality data transformation is powerful as well. For example, consider one large retail company that, after initiating a comprehensive customer data transformation process, realized an additional $50 million in new revenue for every 1.5 million customers it processed. And these are customers the company already does business with. The additional revenue resulted from simply knowing its customers better and leveraging this information to provide them with more goods and services. High-quality customer data provides reliable information that can be used for any number of advanced customer relationship management applications.

Enterprises that want to manage customer relationships need an integrated approach to collecting, structuring and maintaining customer data. Businesses must develop processes to effectively collect data at the point of entry, record the data at its most finite level, then standardize and format it and, lastly, maintain the data for accuracy. Those companies that make data transformation a relentless priority will build the critical foundation necessary for a credible and successful data warehouse and, ultimately, prosperous relationships with customers.
