In part 1 of this series (November 2002), I advocated rolling out data mining to the average, casual user and described some high-level techniques whose day-to-day value is easy to understand. This month, I'll revisit those techniques and demonstrate more specifically how they can be used in different business situations.

Deviation: This algorithm uses a statistical method to highlight interesting data anomalies (deviations from expected values).

What if you knew that, in a cluster of high-spending block groups, one block group was severely underperforming on a per-capita basis? You might market to that group more heavily to see what potential it has. You might look for statistically deviant characteristics of that group. Or, at some point, you might give up on marketing to it and save your money.
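The article doesn't say which statistical test the deviation algorithm uses. As a stand-in, a simple z-score check illustrates the idea: flag any block group whose per-capita spending sits far from its cluster's mean. This is a minimal sketch; the block-group IDs, spending figures and the 2.0 threshold are all hypothetical.

```python
from statistics import mean, stdev


def find_deviations(values, threshold=2.0):
    """Return the members whose z-score (distance from the mean, measured in
    standard deviations) is at least `threshold` in magnitude."""
    data = list(values.values())
    mu = mean(data)
    sigma = stdev(data)  # sample standard deviation; needs at least 2 values
    if sigma == 0:
        return {}  # every member has the same value, so nothing deviates
    scores = {member: (v - mu) / sigma for member, v in values.items()}
    return {member: z for member, z in scores.items() if abs(z) >= threshold}


# Hypothetical per-capita spending for the block groups in one high-spending cluster.
per_capita = {
    "BG-01": 410.0, "BG-02": 395.0, "BG-03": 420.0, "BG-04": 405.0,
    "BG-05": 398.0, "BG-06": 415.0, "BG-07": 402.0, "BG-08": 120.0,
}
outliers = find_deviations(per_capita)
print(outliers)
```

In small clusters, a single extreme value also inflates the standard deviation. A more robust measure, such as distance from the median, may therefore flag anomalies more reliably.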

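Summary: Summary sets aside relativity (unlike deviation, it does not compare values with what is expected) and returns the extreme-valued combinations of selected dimensions. Examples include the top 10 (or top N), the bottom N, and combinations greater than X or less than Y.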

For example, which combination of sales dimensions (at any level of a dimensional hierarchy) yields the five largest profit values? Suppose you knew that vegetables at store 123 in February 2002 was the top product category-store-month combination of the last three years. You would want to understand why, and then try to repeat that success for other product categories in other stores in other months.
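Here is a minimal sketch of the summary technique. It assumes the sales facts are available as plain Python dictionaries rather than an OLAP cube. The code totals profit for each combination of the chosen dimensions, then ranks or filters those totals. The field names and figures are hypothetical. True hierarchy handling (for example, month rolling up to quarter and year) is reduced here to grouping on fewer attributes.

```python
from collections import defaultdict
from heapq import nlargest, nsmallest


def totals_by(rows, dims, measure="profit"):
    """Sum `measure` for each combination of members of the chosen dimensions."""
    totals = defaultdict(float)
    for row in rows:
        totals[tuple(row[d] for d in dims)] += row[measure]
    return dict(totals)


def top_n(totals, n=5):
    """The n combinations with the largest totals, largest first."""
    return nlargest(n, totals.items(), key=lambda item: item[1])


def bottom_n(totals, n=5):
    """The n combinations with the smallest totals, smallest first."""
    return nsmallest(n, totals.items(), key=lambda item: item[1])


def greater_than(totals, x):
    """Every combination whose total exceeds x (flip the comparison for 'less than Y')."""
    return {combo: value for combo, value in totals.items() if value > x}


# Hypothetical fact rows: one row per sale, tagged with its dimension members.
sales = [
    {"category": "Vegetables", "store": 123, "month": "2002-02", "profit": 9800.0},
    {"category": "Vegetables", "store": 123, "month": "2002-02", "profit": 1200.0},
    {"category": "Dairy", "store": 123, "month": "2002-02", "profit": 4100.0},
    {"category": "Vegetables", "store": 456, "month": "2002-03", "profit": 3900.0},
    {"category": "Bakery", "store": 456, "month": "2002-02", "profit": -250.0},
]

by_cat_store_month = totals_by(sales, ("category", "store", "month"))
print(top_n(by_cat_store_month, 1))
print(bottom_n(by_cat_store_month, 1))
print(greater_than(by_cat_store_month, 4000.0))

# Moving up a hierarchy level amounts to grouping on fewer (or coarser) attributes.
print(top_n(totals_by(sales, ("category",)), 3))
```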

With deviation and summary, Compaq (now HP) was able to identify patterns in product improvement opportunities and focus its development on systemic fixes to problem areas. The company mined a large swath of data, including customer satisfaction, market research, call center, manufacturing and warranty repair information. This solution enabled Compaq to cut its reaction time to field issues from seven weeks to four, increase service event satisfaction by 20 points within the first 12 weeks, and help eliminate millions of dollars in service and warranty expenses.

Market Basket: This is one of the most familiar types of marketing analysis; its purpose is to determine which products customers purchase together. A store could use this information to decide where to place products physically. Products with a strong affinity can be placed apart. This creates a walking route past products with a weaker affinity and, ideally, tempts shoppers to buy those weaker-affinity products. Products with a weaker affinity can be placed nearby, because a shopper is unlikely to cross the store for a weaker-affinity item.

Alternatively, market basket analysis can be used simply to create the most convenient shopping experience possible. In that case, the store ignores the marketing aspects and places products with a strong affinity right next to each other, a practice of Kohl's department stores.
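The article doesn't define how affinity is measured. This sketch uses lift, one common choice. Lift measures how much more often two products share a basket than they would if purchases were independent. A lift above 1 suggests a strong affinity, and a lift below 1 suggests a weak one. The baskets are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_lift(baskets):
    """Return {(product_a, product_b): lift} for every pair bought together at least once.

    lift = P(a and b) / (P(a) * P(b)), which simplifies to
    together * n / (count_a * count_b) for n baskets.
    """
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # dedupe and sort so each pair has one canonical order
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    return {
        (a, b): together * n / (item_counts[a] * item_counts[b])
        for (a, b), together in pair_counts.items()
    }


# Hypothetical checkout baskets.
baskets = [
    ["chips", "salsa", "soda"],
    ["chips", "salsa"],
    ["chips", "salsa", "bread"],
    ["bread", "milk"],
    ["milk", "soda"],
    ["bread", "milk", "chips"],
]
lift = pair_lift(baskets)
for pair, value in sorted(lift.items(), key=lambda kv: kv[1], reverse=True):
    print(pair, round(value, 2))
```

Pairs at the top of this list are candidates for the strong-affinity placement strategies described above. A minimum co-occurrence count is usually applied as well, because lift computed from only one or two baskets is unreliable.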

Association Rules: This algorithm automatically discovers patterns of associations between items from different dimensions (multidimensional rules). That is, it finds which members from different dimensions appear together at the same time in the database with some degree of support and confidence.

For example, with respect to sales: if a customer's yearly income is $10k to $30k, then the customer's education level is partial high school, with 20 percent support and 93 percent confidence. Support means that 20 percent of the sales are made to customers who have a yearly income of $10k to $30k and an education level of partial high school. Confidence means that in 93 percent of the sales to customers with a yearly income of $10k to $30k, the customer has a partial high-school education. This rule could be significant if I determined that the partial high-school customer profile had strong profit potential and I wanted to do some target marketing. Suppose, however, that I don't have any demographic information on education level but do have demographics related to income. I can then use the affinity (predictability) between income and education to target partial high-school customers indirectly, on the basis of income rather than education level.
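The support and confidence definitions above translate directly into code. This sketch has two parts. First, it scores one rule over a list of sales records. Second, it uses brute force to list every rule between two different dimensions that meets minimum thresholds, where each rule has a single antecedent and a single consequent. The records are hypothetical. They produce 20 percent support but only about 67 percent confidence for the income/education rule, not the 93 percent in the example. Production tools use more efficient algorithms, such as Apriori, rather than enumerating every pair.

```python
from itertools import permutations


def rule_stats(sales, antecedent, consequent):
    """Support and confidence for the rule antecedent -> consequent.

    Each side is a (dimension, member) pair, e.g. ("income", "$10k-$30k").
    support    = share of all sales that match both sides
    confidence = share of sales matching the antecedent that also match the consequent
    """
    a_dim, a_val = antecedent
    c_dim, c_val = consequent
    with_a = [s for s in sales if s[a_dim] == a_val]
    with_both = [s for s in with_a if s[c_dim] == c_val]
    support = len(with_both) / len(sales)
    confidence = len(with_both) / len(with_a) if with_a else 0.0
    return support, confidence


def mine_rules(sales, dims, min_support, min_confidence):
    """Brute-force every one-to-one rule between members of two different dimensions."""
    found = []
    for a_dim, c_dim in permutations(dims, 2):
        for a_val in sorted({s[a_dim] for s in sales}):
            for c_val in sorted({s[c_dim] for s in sales}):
                sup, conf = rule_stats(sales, (a_dim, a_val), (c_dim, c_val))
                if sup >= min_support and conf >= min_confidence:
                    found.append(((a_dim, a_val), (c_dim, c_val), sup, conf))
    return found


# Hypothetical sales records carrying two customer dimensions.
sales = [
    {"income": "$10k-$30k", "education": "Partial high school"},
    {"income": "$10k-$30k", "education": "Partial high school"},
    {"income": "$10k-$30k", "education": "High school"},
    {"income": "$30k-$50k", "education": "High school"},
    {"income": "$30k-$50k", "education": "High school"},
    {"income": "$30k-$50k", "education": "Bachelor's"},
    {"income": "$50k+", "education": "Bachelor's"},
    {"income": "$50k+", "education": "Bachelor's"},
    {"income": "$50k+", "education": "Graduate"},
    {"income": "$50k+", "education": "Graduate"},
]

support, confidence = rule_stats(
    sales, ("income", "$10k-$30k"), ("education", "Partial high school")
)
print(f"support={support:.0%} confidence={confidence:.0%}")

rules = mine_rules(sales, ("income", "education"), min_support=0.2, min_confidence=0.6)
for rule in rules:
    print(rule)
```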

Probability Rules: Probability rules allow you to make logical "reaches" with assumptions about your data. They allow you to "impute" missing data into the data warehouse and therefore extend the range and reach of your knowledge.

For example, if you are missing automobile preference for a set of newly acquired third-party prospect data and you have age and gender, by mining your current data you could create a rule with high probability as follows: If customer age = 35 and customer gender = female, then automobile preference = SUV. You can then impute that preference onto those customers when preparing promotions for the SUV preference profile.
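Here is a minimal sketch of the idea, assuming customer records are held in plain dictionaries. It proceeds in three steps:

1. Estimate, from the existing data, how often each age-gender combination maps to each automobile preference.
2. Keep only the combinations where one preference is highly probable.
3. Impute that preference onto prospects that lack it.

The field names, the 0.8 probability cutoff and the minimum of three supporting customers are hypothetical choices, not values from the article.

```python
from collections import Counter, defaultdict


def learn_rules(customers, inputs, target, min_probability=0.8, min_count=3):
    """Map each combination of input values to its most likely target value,
    keeping only combinations seen at least `min_count` times whose top value
    reaches `min_probability`."""
    outcomes = defaultdict(Counter)
    for c in customers:
        outcomes[tuple(c[k] for k in inputs)][c[target]] += 1
    rules = {}
    for key, counts in outcomes.items():
        value, hits = counts.most_common(1)[0]
        total = sum(counts.values())
        if total >= min_count and hits / total >= min_probability:
            rules[key] = (value, hits / total)
    return rules


def impute(prospects, rules, inputs, target):
    """Return copies of the prospects, filling a missing target only when a rule covers them."""
    filled = []
    for p in prospects:
        p = dict(p)  # copy so the original prospect records are not modified
        if p.get(target) is None:
            rule = rules.get(tuple(p[k] for k in inputs))
            if rule is not None:
                p[target] = rule[0]
        filled.append(p)
    return filled


# Hypothetical existing customers with a known automobile preference.
customers = (
    [{"age": 35, "gender": "F", "auto": "SUV"}] * 4
    + [{"age": 35, "gender": "F", "auto": "Sedan"}]
    + [{"age": 35, "gender": "M", "auto": "Truck"}] * 2
    + [{"age": 35, "gender": "M", "auto": "SUV"}]
)
# Hypothetical third-party prospects that lack that preference.
prospects = [
    {"age": 35, "gender": "F", "auto": None},
    {"age": 35, "gender": "M", "auto": None},
    {"age": 50, "gender": "F", "auto": None},
]

rules = learn_rules(customers, ("age", "gender"), "auto")
filled = impute(prospects, rules, ("age", "gender"), "auto")
print(rules)
for p in filled:
    print(p)
```

Only the 35-year-old female prospect receives a preference here. The male combination falls below the probability cutoff, and no existing customer is 50. In practice, ages would normally be grouped into bands rather than matched exactly.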

The technique naming standards used in this two-part series have been adopted from PolyVista, a leading vendor for mining with SQL Server Analysis Services and another advocate of bringing data mining to the front line. This is not an extensive treatise on mining algorithms. Still, I hope it has provided some food for thought as you consider how your data warehouse is used and how easy it may be to provide access to deeper levels of analysis.
