Data Mining – If Only It Really Were about Beer and Diapers


At my job, we use data mining tools to figure out what the heck is really going on. Data mining has been around for quite some time now. About 10 years ago, many BI vendors even considered it the "next big thing" after ad hoc querying and OLAP tools. Who has not heard the beautiful example of the supermarket that wanted to know which product it sold most often together with diapers? It mined the database that stored all its customer transactions and, to its great surprise, found that beer was the product most often sold with diapers. On top of that, these purchases were made mainly on Friday afternoons by men between the ages of 25 and 35. After some serious thinking, the supermarket worked out the rationale: because diapers are bulky, the wife, who in most cases did the household shopping, left the diaper purchase to her husband, who had the car. The husband and father, usually between 25 and 35 years old, typically bought the diapers at the end of the working week. With the weekend approaching, beer often becomes a priority, and so beer became the product most often associated with the sale of diapers.

What did the supermarket do as a consequence? It put the beer display next to the diapers. The fathers who bought diapers and usually also bought beer now bought even more beer, since it was so conveniently placed. Those who had not bought beer before began to purchase it, now that it was so visible and handy, right next to the diapers. Beer sales skyrocketed.

This story exists in several versions: sometimes it is about 7-Eleven, sometimes about Wal-Mart. Sometimes it is not even about data mining but about the benefits of data warehousing. It is a nice story for promoting data mining, but, at the risk of disappointing many data mining fans, it would seem that it is not true. I have yet to hear it from someone who was actually there rather than from someone who heard it from someone who knows someone who seems to have been there.

Even if the beer and diapers example is not true, it is somewhat surprising that data mining has not really taken off as predicted. The science is mature. Some of the data mining algorithms commonly used today were created 30 years ago, and data mining software has been around for quite some time; in other words, relatively stable products are available. Also, some of the solutions on offer no longer demand that the end user hold a Ph.D. in advanced mathematics in order to use them (and to understand why many men like beer). So why has data mining not had its breakthrough in the BI market? I mean, look at it: who does not want automated solutions that can tell you what is actually going on? So what if data preparation is a major issue, or if you need some skill to handle a data mining tool? The effort of implementing a data warehouse is far bigger. And users who can efficiently handle ad hoc querying tools or OLAP solutions do not exist in abundance either.

You could figure out that beer is the product most often bought with diapers, or whatever, with reporting tools alone. In that case, however, the user needs to know in advance which relations to look for. Data mining can automate all this by searching for such relations without being told where to look. (Who does not want a convenient life where someone or something else does the job? Who would not be lazy if only it were possible?)
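To make the difference from reporting concrete, here is a minimal sketch of one common technique for this kind of question, market basket analysis with simple association rules. It assumes transactions are available as lists of product names; the function name `pair_rules`, the thresholds and the basket data are all hypothetical and invented purely for illustration, not taken from any real supermarket or product. The point is that the code is never told to look at beer and diapers: it scores every pair of products and reports the ones that co-occur more often than chance.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.3, min_confidence=0.5):
    """Mine simple one-item-to-one-item association rules from shopping baskets.

    Hypothetical helper for illustration. Returns a list of
    (antecedent, consequent, support, confidence, lift) tuples,
    highest lift first.
    """
    n = len(baskets)
    item_counts = Counter()  # number of baskets containing each item
    pair_counts = Counter()  # number of baskets containing each unordered pair
    for basket in baskets:
        items = sorted(set(basket))  # dedupe and fix an order so pairs are canonical
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of all baskets containing both items
        if support < min_support:
            continue
        # Evaluate the rule in both directions: a -> b and b -> a.
        for x, y in ((a, b), (b, a)):
            confidence = count / item_counts[x]  # share of x-baskets that also contain y
            if confidence >= min_confidence:
                lift = confidence / (item_counts[y] / n)  # > 1 means more often than chance
                rules.append((x, y, support, confidence, lift))
    rules.sort(key=lambda r: r[4], reverse=True)
    return rules


# Invented toy transactions, purely for illustration.
baskets = [
    ["diapers", "beer"],
    ["diapers", "beer", "chips"],
    ["diapers", "beer", "milk"],
    ["milk", "bread"],
    ["chips", "bread"],
    ["milk", "chips"],
]

rules = pair_rules(baskets)
for x, y, support, confidence, lift in rules:
    print(f"{x} -> {y}: support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

On this made-up data, the only pairs that pass the thresholds are beer -> diapers and diapers -> beer, each with support 0.50, confidence 1.00 and lift 2.00. Counting pairs exhaustively like this only works for small catalogs; production tools typically use algorithms such as Apriori or FP-growth to handle larger itemsets efficiently.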

At the same time, organizations that actually use data mining appear to be reaping huge benefits. These companies usually operate in highly competitive markets, such as telecommunications, high-volume retail or banking. Just imagine what hidden relations could be uncovered and used to improve the business. What if a mobile phone operator found an increase in calls from its married customers to other married customers at very odd hours? This could be translated into some really interesting and innovative business opportunities, such as an offer to hide such dialed numbers from the itemized phone bill. You know, even things that some might consider immoral do sell. If you do not believe this, a few data mining analyses could prove the point and convince you.

Even though data mining will not find every truth and business opportunity, it can and does find connections similar to the beer and diapers one. Even if the story may not be true (just think about it: which supermarket has actually put its beer shelf next to the diapers?), maybe supermarkets really should start marketing beer and diapers together. That would make a truly good story come true.
