welcomes Guy Creese as a monthly columnist. He will share his more than 25 years of expertise in this Volume Analytics column, which focuses on ways to optimize your Web sites via best practices in content management, search, personalization and Web analytics.

As standalone capabilities, the pattern-finding technologies of data mining and text mining have been around for years. However, it is only recently that enterprises have started to use the two in tandem - and have discovered that it is a combination that is worth more than the sum of its parts.

First of all, what are data mining and text mining? They are similar in that they both "mine" large amounts of data, looking for meaningful patterns. However, what they analyze is quite different.

Data Mining

Data mining looks for patterns within structured data, that is, databases. The underlying technologies are based on statistics and artificial intelligence, littering the field with buzzwords such as classification and regression trees (CART), chi-squared automatic interaction detection (CHAID), neural networks and genetic algorithms. As a process, data mining is not for the uninitiated. Typically, a statistician selects the appropriate algorithm(s) for the business problem, prepares the data for analysis and then fine-tunes the model based on the results.

Even though the process is labor-intensive, it can have significant payoffs. For example, enterprises can use data mining to understand what "clusters" their customers fall into (and plan accordingly) as well as save money by sending catalogs only to those customers with a high propensity to buy.
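To make the catalog example concrete, here is a minimal Python sketch. It assumes a data mining model has already produced a purchase propensity for each customer; the customer IDs, scores, catalog cost and order margin are all hypothetical figures chosen for illustration, not numbers from any real campaign.

```python
# Hypothetical figures: none of these numbers come from a real campaign.
COST_PER_CATALOG = 1.50    # assumed cost to print and mail one catalog
MARGIN_PER_ORDER = 20.00   # assumed average profit when a customer orders

# Customer id -> modeled probability of buying (assumed model output).
propensity = {"A": 0.30, "B": 0.02, "C": 0.10, "D": 0.05}

def should_mail(p):
    """Mail only when the expected profit exceeds the mailing cost."""
    return p * MARGIN_PER_ORDER > COST_PER_CATALOG

# Keep only the customers worth mailing, in a stable order.
mailing_list = sorted(c for c, p in propensity.items() if should_mail(p))
skipped = len(propensity) - len(mailing_list)

print("Mail to:", mailing_list)
print("Catalogs not sent:", skipped)
```

The cutoff here is a simple break-even rule; a real campaign would likely tune it against budget and response data.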

Text Mining

Text mining looks for patterns in unstructured data - memos and documents. Consequently, it often uses language-based techniques, such as semantic analysis and taxonomies, as well as leveraging statistics and artificial intelligence. As with data mining, you don't just press a button and have magic happen. Depending on the technology used, sometimes documents need to be "tagged" - an editor may need to manually note what the document is about. At other times, a text mining system may need to be "trained" to recognize a certain type of document. In this case, a person familiar with the content would need to collect a representative set of documents to feed into the system.
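As a rough illustration of what "training" on a representative set can mean, the sketch below builds a tiny word-count (naive Bayes) classifier in Python. The four labeled documents and the two labels are invented for the example, and commercial text mining products may work quite differently.

```python
import math
from collections import Counter, defaultdict

# Hypothetical labeled documents standing in for the "representative set".
training = [
    ("app crashes when saving file", "bug"),
    ("error message on login screen", "bug"),
    ("refund not received for order", "complaint"),
    ("delivery was late and support was rude", "complaint"),
]

def train(examples):
    """Count words per label and documents per label."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    for text, label in examples:
        doc_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, doc_counts

def classify(text, word_counts, doc_counts):
    """Return the label with the highest smoothed log-probability."""
    vocab = {w for counts in word_counts.values() for w in counts}
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)
        for w in text.lower().split():
            # Add-one smoothing so unseen words do not zero out the score.
            score += math.log((word_counts[label][w] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

word_counts, doc_counts = train(training)
print(classify("login error", word_counts, doc_counts))
print(classify("late refund", word_counts, doc_counts))
```

The quality of such a classifier depends almost entirely on how representative the collected documents are, which is why someone familiar with the content needs to pick them.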

Also similar to data mining, text mining can discern patterns that have significant business value. Companies can use text mining to find overall trends in their trove of bug reports or customer complaints, for example.

Put Them Together and You Get High Value

Recently, vendors such as Intelligent Results, SAS and SPSS have started to recommend to their customers that they combine data and text mining. And the results have been interesting, to say the least.

This is not surprising, for two reasons. First, the enterprise has vastly expanded the universe in which to find patterns - always a good thing. Second, a pattern in data or text can amplify or clarify patterns in its counterpart. In both cases, there is a multiplier effect going on.

But rather than being theoretical, let's be specific. Collections and recovery departments in banks and credit card companies have used duo-mining to good effect. Using data mining to look at repayment trends, these enterprises have a good idea of who is going to default on a loan, for example. When logs from the collection agents are added to the mix, the understanding gets even better. For example, text mining can understand the difference in intent between "I will pay," "I won't pay" and "I paid," and generate a propensity-to-pay score - which, in turn, can be data mined. To take another example, if a customer says, "I can't pay because a tree fell on my house," all of a sudden it is clear that it's not a "bad" delinquency - but rather a sales opportunity for a home loan.
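The collections example can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the phrase table, the scores, the neutral default and the account records are all hypothetical, and a real text mining engine would do far more than match substrings.

```python
# Assumed phrase-to-score table, checked in order; scores are made up.
INTENT_SCORES = [
    ("won't pay", 0.1),
    ("can't pay", 0.3),
    ("will pay", 0.8),
    ("paid", 1.0),
]

def propensity_to_pay(note):
    """Turn a free-text agent note into a numeric score."""
    text = note.lower()
    for phrase, score in INTENT_SCORES:
        if phrase in text:
            return score
    return 0.5  # assumed neutral score when no known phrase appears

# Structured account data (hypothetical) alongside the agents' call notes.
accounts = [
    {"id": 1, "days_late": 45, "note": "Customer said: I will pay on Friday"},
    {"id": 2, "days_late": 60, "note": "I won't pay, stop calling"},
    {"id": 3, "days_late": 30, "note": "Says: I paid last week"},
]

# The text-derived score becomes one more column for data mining.
for acct in accounts:
    acct["pay_score"] = propensity_to_pay(acct["note"])

for acct in accounts:
    print(acct["id"], acct["days_late"], acct["pay_score"])
```

The design point is the last loop: once the text is reduced to a score, it sits in the same record as the repayment data and can be mined with it.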

By using data mining and text mining in tandem, enterprises have improved "lift" by an average of around 20 percent over using just one technology, with the improvement ranging from 5 to 50 percent. Other areas where duo-mining has paid off include analyzing product wish lists, open-ended survey questions and customer attrition patterns at cell phone companies.
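For readers unfamiliar with the term, lift is commonly computed as the response rate in the group a model selects divided by the response rate of the whole population. The Python sketch below uses that common definition with invented counts, picked only so that the arithmetic is easy to follow; it is an assumption that the figures above were measured this way.

```python
def lift(targeted_hits, targeted_total, overall_hits, overall_total):
    """Response rate in the targeted group divided by the overall rate."""
    targeted_rate = targeted_hits / targeted_total
    overall_rate = overall_hits / overall_total
    return targeted_rate / overall_rate

# Hypothetical population: 1,000 accounts, 100 of which repay.
# Each model picks its 100 most promising accounts.
data_only_lift = lift(25, 100, 100, 1000)  # data mining alone finds 25 payers
duo_lift = lift(30, 100, 100, 1000)        # data plus text mining finds 30

# Relative improvement of the combined approach over data mining alone.
improvement = (duo_lift - data_only_lift) / data_only_lift

print(f"data only: {data_only_lift:.2f}")
print(f"duo-mining: {duo_lift:.2f}")
print(f"improvement: {improvement:.0%}")
```

With these invented counts the combined model's lift is about 20 percent higher than the data-only model's, which is one way to read an average improvement of that size.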

Some Practical Hints

Companies looking to do duo-mining in such applications need to be wary of several things, especially with regard to text mining. First, some text mining technologies need large amounts of text to analyze - several-page memos, for example - while call logs are sometimes just snippets in comparison. Second, "stemming," a popular technique in text analysis in which various forms of a word are distilled into one word - "pay," "paid," "will pay," "won't pay" = "pay" - may need to be turned off. To take the collections example, stemming would prevent the enterprise from understanding the customer's intent. Therefore, companies need to ensure that the technology they're using is tuned to the problem at hand.
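The stemming pitfall is easy to demonstrate. The Python sketch below uses a toy normalizer, not a real stemming algorithm: the stem table and the list of dropped words are assumptions made to mirror the "pay" example, where an aggressive pipeline reduces every phrase to the same token.

```python
# Toy normalizer for illustration only; not a real stemming algorithm.
STEMS = {"paid": "pay", "pays": "pay", "paying": "pay"}
DROPPED = {"i", "will", "won't"}  # assumed words an aggressive pipeline discards

def normalize(phrase, stemming=True):
    """Tokenize a phrase, optionally collapsing word forms to a stem."""
    tokens = phrase.lower().split()
    if not stemming:
        return tokens
    return [STEMS.get(t, t) for t in tokens if t not in DROPPED]

phrases = ["I will pay", "I won't pay", "I paid"]

# With stemming on, three different intents become one identical token list.
stemmed = {tuple(normalize(p)) for p in phrases}
# With stemming off, the three phrases stay distinguishable.
raw = {tuple(normalize(p, stemming=False)) for p in phrases}

print(len(stemmed), "distinct form(s) with stemming")
print(len(raw), "distinct form(s) without stemming")
```

Once the three phrases collapse into one form, no downstream model can recover which customer promised to pay and which refused, which is why the option may need to be switched off for this kind of application.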

In addition, some companies' solutions are more toolkit-oriented (SAS and SPSS) while others are more application-oriented (Intelligent Results). Which is more appropriate depends on what the company wants to do and the level of in-house expertise.

With those caveats in mind, enterprises should investigate duo-mining. It's a combination of two time-tested technologies that can lead to big payoffs.
