Some of the feedback I received on the April column showed that you were working with the ideas presented. Capturing point-in-time quartile rankings in a ranking dimension was a good start to understanding customers, but obviously this could go a lot farther.

The ranking dimension contains customer and household quantiles in various categories that are important to your ongoing customer segmentation needs. If you followed the post-load calculation scenario presented in the April column, you would populate this dimension, rather than the customer dimension, with the quartile data and keep the quartiles over time so that the all-important changes in various overall category and channel spending and profitability levels could be tracked. This, combined with the detailed transaction fact information, may ultimately yield most "best, good, poor" segmentations for customer relationship purposes from an internal perspective.
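
To make the idea of keeping quartiles over time concrete, here is a minimal Python sketch. It is an illustration only, not the method from the April column: the row layout (`customer_id`, `category`, `as_of`, `quartile`), the function names, the sample customers and the snapshot dates are all assumptions made for this example.

```python
from datetime import date

def quartile_ranks(spend_by_customer):
    """Assign quartile 1 (highest spend) through 4 (lowest) by rank order."""
    ranked = sorted(spend_by_customer, key=spend_by_customer.get, reverse=True)
    n = len(ranked)
    return {cust: (i * 4) // n + 1 for i, cust in enumerate(ranked)}

def snapshot_rows(spend_by_customer, category, as_of):
    """Build ranking-dimension rows: one row per customer per snapshot date."""
    ranks = quartile_ranks(spend_by_customer)
    return [
        {"customer_id": c, "category": category, "as_of": as_of, "quartile": q}
        for c, q in sorted(ranks.items())
    ]

def quartile_history(rows, customer_id, category):
    """Return (snapshot date, quartile) pairs for one customer and category."""
    return [
        (r["as_of"], r["quartile"])
        for r in rows
        if r["customer_id"] == customer_id and r["category"] == category
    ]

# Hypothetical data: rows are appended at each load, never overwritten,
# so movement between quartiles stays visible.
ranking_dimension = []
ranking_dimension += snapshot_rows(
    {"C1": 900, "C2": 400, "C3": 150, "C4": 50}, "total_spend", date(2000, 3, 31)
)
ranking_dimension += snapshot_rows(
    {"C1": 100, "C2": 700, "C3": 300, "C4": 60}, "total_spend", date(2000, 4, 30)
)

history = quartile_history(ranking_dimension, "C1", "total_spend")
```

Because each load adds rows instead of updating them, a customer who slides from the top quartile toward the bottom shows up as a sequence of snapshots rather than as a single current value.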

However, to round out the CRM-readiness of your customer data, you need to know your customers as individuals beyond their spending with your company. What age bracket are they in? Hobbies? Income? Attitude toward spending? Do they own a home? Where? How long?

In order to gain this kind of knowledge, you must either collect it directly from the customer or purchase it from outside suppliers. These days, we don't need to bother our customers to gather information from them. We can get it on the back end through the reverse-append. An increasing percentage of data warehousing users will be external to the company in a few years. Just as data warehousing is turning its focus to customers and supply chain partners as the customers of the data warehouse, so too data warehousing must turn its focus to input that comes from outside of the company.

Many of our clients have a majority of their data coming from external sources. Some data warehouses are nearly 95 percent populated with data from external sources. Some industries have a consortium that clears most of the purchasing behavior to the upstream constituents. If you are in consumer goods, such consortiums are essential to gain this knowledge unless your downstream distributors and retailers directly capture and share this data with your company.

You don't want to make business moves based on the assumption that the external data is 100 percent correct. If I don't really own a dog, even though empirical evidence suggests I might (I bought some dog food for a neighbor and/or I'm in the age, income and neighborhood bracket of likely dog owners), a targeted dog food promotion would be wasted on me.

Generalizations to balance the breadth and quality of the data are made throughout the process of obtaining the data, making this an imperfect science. Consider the tradeoffs that the data vendor has made in their consolidations (in breadth, depth and accuracy) in terms of your intended use of the data.

Breadth is the match rate on your data. For example, how many of your current customers and prospects will the vendor have data on? How many of the transactions for your products will the vendor have? Depth is the degree of data or amount of information that can be matched against your data. For example, how much data on your current customer set is available? Can the vendor tell you about their household, spending, age, lifestyle preferences, Web behavior, etc.?
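
Both measures can be estimated on a sample file before you commit to a vendor. The sketch below is one possible reading, not a standard definition: breadth is computed as the share of your customers the vendor can match, and depth as the share of matched customers for whom each attribute is actually filled in. The function names, attribute names and sample records are hypothetical.

```python
def breadth(customer_ids, vendor_records):
    """Match rate: share of our customers the vendor has any record for."""
    if not customer_ids:
        return 0.0
    matched = sum(1 for c in customer_ids if c in vendor_records)
    return matched / len(customer_ids)

def depth(customer_ids, vendor_records, attributes):
    """Fill rate per attribute, measured among matched customers only."""
    matched = [vendor_records[c] for c in customer_ids if c in vendor_records]
    if not matched:
        return {a: 0.0 for a in attributes}
    return {
        a: sum(1 for rec in matched if rec.get(a) is not None) / len(matched)
        for a in attributes
    }

# Hypothetical sample: our customer keys and a vendor file keyed the same way.
our_customers = ["C1", "C2", "C3", "C4"]
vendor_file = {
    "C1": {"age_bracket": "35-44", "homeowner": True},
    "C2": {"age_bracket": None, "homeowner": False},
    "C4": {"age_bracket": "25-34"},
}

match_rate = breadth(our_customers, vendor_file)
fill_rates = depth(our_customers, vendor_file, ["age_bracket", "homeowner"])
```

In this sample the vendor matches three of the four customers, and each attribute is populated for two of those three, which is the kind of gap that a headline match rate alone would hide.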

Accuracy has to do with how the data is compiled. Is it compiled from loose cues such as a single purchase or from solid, long-term evidence? Any model may be appropriate for your use; just be sure you know how the data is obtained and how frequently it is updated.

Other tips for success in an external data world include:

  • Ethical and privacy issues sometimes cause external data to be summarized above the transactional level into generalizations. Make sure the generalizations are detailed enough to be actionable.
  • Set up a service level agreement for file delivery from the vendor. Similar to what would occur for all data warehouse feeds, the external feeds should be provided in a predetermined format at a predetermined (usually FTP) location at a particular time.
  • Have a plan in case the file doesn't show up. If its load will impact the data warehouse measurably, don't allow it to happen during your query window. Wait for a more quiescent time to load the data.
  • Map the data into the data warehouse architecture. Although the data is often most relevant to a particular business department, you will make the most of it by placing it where it can be leveraged broadly and enriched with other corporate data.
  • If you're using multiple external data providers for the same dimension, you will need to deduplicate the various vendor feeds. This can be a very time-consuming step. Consider the manageability issues associated with this. Though no vendor's data is perfect from a quality standpoint, many provide adequate depth for the customer dimension.
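
As a rough illustration of the deduplication step in the last tip, the Python sketch below merges several vendor feeds into one record per customer. Everything in it is an assumption made for the example: the match key (normalized name plus postal code), the vendor-priority rule for conflicting values, and the field names. Real matching usually needs far more careful name and address standardization.

```python
def match_key(record):
    """Hypothetical match key: lowercased, whitespace-collapsed name plus postal code."""
    name = " ".join(record["name"].lower().split())
    return (name, record["postal_code"].strip())

def deduplicate(feeds, vendor_priority):
    """Merge vendor feeds into one record per match key.

    Vendors earlier in vendor_priority win when values conflict;
    later vendors only fill attributes that are still missing.
    """
    merged = {}
    for vendor in vendor_priority:
        for record in feeds.get(vendor, []):
            target = merged.setdefault(match_key(record), {})
            for field, value in record.items():
                if value is not None and target.get(field) is None:
                    target[field] = value
    return list(merged.values())

# Hypothetical feeds describing an overlapping set of customers.
feeds = {
    "vendor_a": [
        {"name": "Pat  Lee", "postal_code": "75001", "income_band": "B", "homeowner": None},
    ],
    "vendor_b": [
        {"name": "pat lee", "postal_code": "75001 ", "income_band": "C", "homeowner": True},
        {"name": "Sam Roy", "postal_code": "10001", "income_band": "A", "homeowner": False},
    ],
}

customers = deduplicate(feeds, ["vendor_a", "vendor_b"])
```

Here the two "Pat Lee" rows collapse into one record that keeps the first vendor's income band and takes the homeowner flag from the second. A fixed priority order is only one way to resolve conflicts; choosing the most recently updated value is another, and either rule adds to the manageability burden mentioned above.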

Next month, I'll discuss how to evaluate the external data that you are considering for your CRM-ready data warehouse.
