How to Choose a Data Mining Suite

Information Management Special Reports, March 2004

Robert A. Nisbet

Choosing a data mining suite is not an easy task. This article provides a brief outline of some considerations that could affect your decision. Contrary to common opinion, the best tool suite for you may not be the most advanced tool, the one with the most data mining algorithms, or the one that gives the greatest accuracy in prediction. More important than any of these is identifying the tool suite that:

  • Is easy to use,
  • Provides acceptable accuracy (even if not the highest accuracy available),
  • Can perform all the common tasks in a data mining project.

Ease of Use

Some traditional (and heavily advertised) data mining tools may provide a rich variety of data processing and modeling capabilities, but require a legion of "priests" to use them. These "priests" of data mining typically emerge only after many years of practice and a long climb up the learning curve of the tool's capabilities. Rather than being heavily procedural (programmed with a scripting language), the user interface to data mining technology should be like the interface to automobile technology. The automobile succeeded because it brings the benefits of sophisticated engineering technology down to a level of use appropriate to the ordinary man and woman. You don't need to be an expert in internal combustion technology or understand the complex relationships between gear ratios and acceleration to use a car effectively. All you have to do is get behind the wheel, turn on the ignition, step on the gas, and steer or use the brakes at the appropriate times and, voilà, you are an expert user of the automobile! Using a data mining tool should be like that. You might be surprised to learn that several modern data mining tool suites approach that ease of use.

Accuracy: How High?

Suppose you could buy a tool for a fraction of the cost of the priestly tool (maybe 20 percent or less) that permitted ordinary business analysts and statisticians to create models 80 percent as good as those the priest could create. Would you choose to buy the priestly tool, or the 80 percent tool? I think for most companies, the answer is the 80 percent tool. Here too, several data mining tool suites available today can provide this functionality. For other purposes, the best tool might be the most accurate tool.
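The trade-off above can be put in rough numbers. The sketch below is only an illustration: it assumes that "value" can be summarized as relative model quality divided by relative cost, which is a simplification introduced here and not a measure the article defines. The function name `value_per_cost` is hypothetical.

```python
def value_per_cost(relative_quality_pct, relative_cost_pct):
    """Relative model quality obtained per unit of relative cost.

    Both arguments are percentages measured against the "priestly" tool,
    which is treated as the 100 percent baseline (an assumption of this sketch).
    """
    return relative_quality_pct / relative_cost_pct


# Baseline: the priestly tool delivers 100% of the quality at 100% of the cost.
priestly = value_per_cost(100, 100)

# The "80 percent tool": 80% of the quality at 20% of the cost.
affordable = value_per_cost(80, 20)

print(f"priestly tool:   {priestly:.1f}")
print(f"80 percent tool: {affordable:.1f}")
```

Under this simplified measure the 80 percent tool returns four times as much model quality per unit of cost. The measure ignores cases where the last few points of accuracy are worth far more than the rest, which is exactly when the most accurate tool may still be the better choice.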

Ability to Perform All Common Data Mining Tasks

Most data miners will tell you that 70-90 percent of the time required to perform a data mining project is spent in data preparation for modeling. Reasons for this include:

  • Most data mining algorithms require clean and complete data records as input. No data mining tool in the world can analyze data that does not exist (missing data in some fields).
  • Most data in commercial databases was collected from transactional systems to serve query and reporting purposes, not analytical purposes.
  • Most data in commercial databases is rather "dirty." That is, databases often contain inappropriate data, training data, improperly input data or just plain garbage data. Even if the data appears to be clean, historical data records may reflect changes in coding and aggregation rules at various times in the past, which must be reconciled. In addition, data formats may not be consistent across the databases used as data sources. Finally, data may require transformation to different ranges or different expressions (letters changed to numbers), or new variables may be needed that are combinations of existing variables.

A good data mining suite will provide tools for performing all of these operations. Some data mining suites are better at it than others.
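To make these preparation tasks concrete, here is a minimal sketch in plain Python of four of them: filling in missing values, reconciling inconsistent codes, changing letters to numbers, and rescaling a variable to a different range, plus one derived variable. Everything in it is an assumption made for illustration: the records, the field names, the `SEGMENT_CODES` mapping, and the choice of median imputation and min-max scaling. It does not describe how any of the reviewed suites works.

```python
from statistics import median

# Hypothetical raw records; None marks a missing value.
records = [
    {"age": 34, "income": 52000.0, "segment": "A"},
    {"age": None, "income": 61000.0, "segment": "B"},
    {"age": 51, "income": None, "segment": "a "},  # inconsistent coding of "A"
    {"age": 29, "income": 38000.0, "segment": "C"},
]

# Assumed letter-to-number recoding.
SEGMENT_CODES = {"A": 1, "B": 2, "C": 3}


def impute_median(rows, field):
    """Fill missing values of `field` with the median of the observed values."""
    observed = [r[field] for r in rows if r[field] is not None]
    fill = median(observed)
    for r in rows:
        if r[field] is None:
            r[field] = fill


def min_max_scale(rows, field):
    """Add `<field>_scaled`, the value of `field` rescaled to the range [0, 1]."""
    values = [r[field] for r in rows]
    lo, hi = min(values), max(values)
    span = hi - lo
    for r in rows:
        r[field + "_scaled"] = (r[field] - lo) / span if span else 0.0


def prepare(rows):
    """Return cleaned copies of `rows`; the input records are left untouched."""
    rows = [dict(r) for r in rows]
    for r in rows:
        # Reconcile inconsistent coding, then change letters to numbers.
        r["segment_code"] = SEGMENT_CODES[r["segment"].strip().upper()]
    impute_median(rows, "age")
    impute_median(rows, "income")
    min_max_scale(rows, "income")
    for r in rows:
        # A new variable built as a combination of existing variables.
        r["income_per_year_of_age"] = r["income"] / r["age"]
    return rows


prepared = prepare(records)
for row in prepared:
    print(row)
```

Even this toy version needs a decision at every step (which fill value, which range, which code table), which helps explain why preparation dominates project time.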

This article reviews five of the most useful and powerful data mining suites available: STATISTICA Data Miner, SPSS Clementine, Affinium Model, Insightful Miner and KXEN. These tools illustrate how you can evaluate whether a data mining tool suite is suitable for your use. Let's cut to the chase right at the beginning.

There is No Best Tool Overall

There is no best tool overall. Are you surprised? Well, competition in the marketplace almost guarantees this. If a particular tool is successful enough to make it into the mainstream of data mining use, it must serve at least a moderate segment of business needs well. Each tool suite has its strengths and weaknesses; each tool suite may be the best for particular needs in particular companies. Each of the five tool suites will be reviewed and classified according to its best uses. From this evaluation, you can gain enough information to take the first step in choosing the data mining tool suite that is right for you.

The first step is to look at the features and functions of the data mining tool suite. Although this is the first step in the decision-making process, it may not be the most important consideration for you. Figure 1 shows a weighted comparison of the features and functions of the tool suites. Comparing the relatively moderate cost and the weighted score across all features and functions, you will notice that STATISTICA Data Miner is the clear winner. This does not mean that this tool is best for you. Your needs may not require (or your budget may not permit) the rich variety of capabilities provided by STATISTICA Data Miner; Insightful Miner (with its great ease of use and affordability) may be just the right tool for you, regardless of its relatively low score in Figure 1. Or, you may want a fully automatic data mining engine that can generate models of the very highest accuracy, to which you are willing to submit data in a suitable format. If so, KXEN is the right tool suite for you, provided the cost is acceptable. The Clementine and Affinium Model tool suites provide intermediate solutions between those of KXEN and Insightful Miner in terms of functionality and cost.
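A weighted comparison of this kind can be sketched in a few lines. The example below is not the scoring behind Figure 1: the criteria, the weights, the 1-to-5 ratings and the placeholder names "Tool A" and "Tool B" are all hypothetical. It only shows how the ranking can flip when the weights reflect a different buyer's priorities.

```python
# Hypothetical 1-5 ratings for two placeholder tools (not real product data).
ratings = {
    "Tool A": {"ease_of_use": 3, "accuracy": 5, "data_preparation": 4, "algorithms": 5},
    "Tool B": {"ease_of_use": 5, "accuracy": 3, "data_preparation": 4, "algorithms": 2},
}

# Two hypothetical sets of weights; each set sums to 1.0.
equal_weights = {
    "ease_of_use": 0.25, "accuracy": 0.25, "data_preparation": 0.25, "algorithms": 0.25,
}
ease_first_weights = {
    "ease_of_use": 0.4, "accuracy": 0.2, "data_preparation": 0.3, "algorithms": 0.1,
}


def weighted_score(tool_ratings, weights):
    """Sum of each criterion's rating multiplied by that criterion's weight."""
    return sum(weights[criterion] * tool_ratings[criterion] for criterion in weights)


def rank(all_ratings, weights):
    """Return (tool, score) pairs sorted from highest to lowest weighted score."""
    scores = {tool: weighted_score(r, weights) for tool, r in all_ratings.items()}
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


print("equal weights:     ", rank(ratings, equal_weights))
print("ease of use first: ", rank(ratings, ease_first_weights))
```

With equal weights the feature-rich "Tool A" comes out ahead; once ease of use carries most of the weight, "Tool B" does. A published weighted score is therefore a starting point, and repeating the exercise with your own weights is the more useful test.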
