Application of Data Mining Techniques

Published
  • August 01 2003, 1:00am EDT

Data mining, as I understand the term, implies a process that is somewhat unfocused. It involves the application of algorithms to large quantities of data. Thousands, perhaps millions, of variable combinations are examined to discover (it is hoped) some specific patterns in the data that are useful. (These are often called "nuggets" of information, to extend the mining analogy.)

In this process, the analyst takes a neutral stance regarding where the useful patterns will be found. Instead of testing specific variable combinations, the analyst searches the entire data set systematically, ensuring that nothing is missed. Similarly, a miner might dig up a whole mountainside looking for gold, not knowing exactly where the nuggets will be found. (Fortunately, the analogy between data mining and real mining breaks down when considering the ecological impact.)

Thus, applying any specific data mining technique to a data set is, by nature, "unfocused." However, in at least two important ways, data mining is a highly directed process that requires judgement and experience on the part of the analyst.

First, the analyst must decide what data mining technique should be used on which data. If done correctly, it is useful to mine for gold in South Africa and to drill for oil in Saudi Arabia – but not vice versa. Based on the marketing objectives of his or her analysis, the analyst must collect the best data and choose the analyses most relevant to those objectives. Occasionally an analyst will be asked to perform some very general task, such as: Take all my data and see if there is anything interesting in it. But it is more common to be working with a more specific objective, such as: See if my data can tell me what products I should offer to my current customers.

Second, among all the data patterns examined by the data mining algorithm, the analyst must specify how to identify those that are most useful. This is accomplished by specifying statistical criteria that are related to usefulness. Often, this is combined with an element of judgement applied by the analyst or marketing executive.

Let’s look at an example related to the objective mentioned above: Can my data tell me what products I should offer to my current customers? Assume that this question has been asked by an office supply store called Paper Clips Plus.

First, it seems fairly clear that the best data for this analysis would be product purchase histories of individual consumers. This information will probably only be available for purchases made at a Paper Clips Plus store. Nevertheless, it seems reasonable that by examining past purchase histories it will be possible to find some purchase patterns that suggest a way to target product offers to customers.

One data mining technique used for this purpose is market basket analysis. Briefly, this looks at combinations of products and determines whether there is a tendency for certain products to be purchased together. The usefulness of this analysis rests on the following hypothesis: 1) if many customers buy both products A and B, then 2) some customers who buy only product A may also benefit from product B and, therefore, 3) these customers will be more likely to respond to an offer of product B.
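
As a rough illustration of the first step, the sketch below counts how many customers bought each pair of products. The purchase histories, customer IDs and the helper name count_pairs are made up for this example; they are not Paper Clips Plus data, and a real analysis would read from the store's own records.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase histories: one set of products per customer.
purchases = {
    "cust1": {"stapler", "staples", "paper"},
    "cust2": {"computer", "computer desk"},
    "cust3": {"stapler", "staples"},
    "cust4": {"computer", "paper"},
    "cust5": {"paper", "staples"},
}


def count_pairs(histories):
    """Count the customers who bought each unordered pair of products."""
    pair_counts = Counter()
    for products in histories.values():
        # sorted() gives every pair a single canonical ordering,
        # so ("paper", "staples") and ("staples", "paper") are not split.
        for pair in combinations(sorted(products), 2):
            pair_counts[pair] += 1
    return pair_counts


pair_counts = count_pairs(purchases)
for pair, n_customers in pair_counts.most_common():
    print(pair, n_customers)
```

Raw counts like these only show which products turn up together. Whether a pairing is meaningful depends on how often each product is bought on its own, which is what the statistics discussed next are meant to capture.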

A typical market basket analysis would look at all combinations of products, and characterize them based on several statistics. The most common are:

  • Support – The percent of customers who buy product A. (Some tools instead report the percent of customers who buy both A and B.)
  • Confidence – The percent of customers buying product A who also buy product B.
  • Lift – The amount by which the probability of purchasing product B increases if the customer buys product A (expressed as a ratio).
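
To make the three statistics concrete, here is a minimal sketch that computes them for a single product pair. The ten purchase histories and the function name basket_stats are assumptions made for illustration. Support follows the definition given above (the share of customers who buy product A), and lift is taken as confidence divided by the overall share of customers who buy product B.

```python
# Hypothetical purchase histories: one set of products per customer.
purchases = [
    {"computer", "computer desk", "paper"},
    {"computer", "computer desk"},
    {"computer", "paper"},
    {"computer"},
    {"computer desk", "staples"},
    {"paper", "staples"},
    {"paper"},
    {"staples"},
    {"paper", "stapler"},
    {"stapler", "staples"},
]


def basket_stats(histories, a, b):
    """Return (support, confidence, lift) for 'buyers of a also buy b'."""
    n = len(histories)
    n_a = sum(1 for h in histories if a in h)
    n_b = sum(1 for h in histories if b in h)
    n_ab = sum(1 for h in histories if a in h and b in h)
    if n_a == 0 or n_b == 0:
        raise ValueError("both products must appear in at least one history")

    support = n_a / n            # share of customers who buy A
    confidence = n_ab / n_a      # share of A buyers who also buy B
    lift = confidence / (n_b / n)  # confidence relative to B's base rate
    return support, confidence, lift


support, confidence, lift = basket_stats(purchases, "computer", "computer desk")
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
# prints: support=0.40 confidence=0.50 lift=1.67
```

In this toy data, 3 of 10 customers buy a computer desk overall, but 2 of the 4 computer buyers do, so the lift is above 1. A lift near 1 would mean that buying a computer tells us little about whether the customer also buys a desk.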

A high lift indicates that an affinity exists between products A and B. High levels of support and confidence indicate that the affinity applies to enough customers to (perhaps) make an offer worthwhile. How to combine the values of support, confidence and lift to rank the top product combinations is a matter of judgement.

Judgement is also important in deciding whether the nature of any specific product affinity is likely to lead to a useful offer. For example, suppose it is found that customers who buy staplers are very likely to buy staples. Of course, it makes sense for a salesperson when helping a customer pick out a stapler to mention, "Don’t forget to buy staples with that." But this affinity is so obvious that your marketing client is unlikely to feel that you have added any value when you present them with this result.

On the other hand, suppose it is found that customers who buy a computer are more likely to buy a computer desk. Again, this is not surprising – some might even say it is obvious. But getting a desk to go with a computer is not required in the way that a stapler requires staples. Thus, a specific offer (say, a direct mail piece targeted at recent computer purchasers) may influence customers to make a purchase they otherwise would have put off.

Data mining, then, is not a technique that can be applied without discretion to solve multitudes of problems. It requires a thoughtful application of the right technique to the right data.
