MAR 1, 2006 1:00am ET

Data Mining Tools: Which One is Best for CRM? Part 3

Part I and Part II of this series presented the proper business and knowledge management context in which to evaluate data mining tools. In this article, I will evaluate six data mining tools for CRM data mining according to capabilities drawn from that context. The usual approach to evaluating data mining tools is to compare them according to various model performance criteria. This evaluation will take a different path: an analysis of the ability of each tool to facilitate the data mining process for various customer relationship management (CRM) models for retention, cross-sell/up-sell and customer acquisition.

Performance comparisons among data mining tools may lead evaluators astray; they speak only to the relative ability of the algorithm to build a model with the data input to it. But where does the data come from? Ah, that is the rub. It has been said many times by many people that data mining is as much art as science. Most of this "art" lies in preparing the data for input to the algorithm rather than in tuning the algorithm itself. So much data preparation is needed before data mining because most databases and warehouses are not structured in the manner needed to provide that data quickly and efficiently. Business database management has turned almost exclusively to relational database management systems (e.g., Access, IBM DB2, Oracle, Teradata).

Relational data structures are optimized to provide efficient storage and retrieval of information in report format. These data structures (e.g., third normal form) scatter data all over a system of storage media accessible via SQL. This architecture works very well for transactional systems but not very well for analytical systems. Multidimensional data structures are far more effective in providing data for analytical operations. But, even these systems require significant processing to provide analytical data in a timely fashion.

Proper support of ongoing data mining operations in relational or even multidimensional data environments requires a variety of data extraction, integration and transformation capabilities. The most useful data mining tool may be the one that provides most of these capabilities, rather than the tool that builds the most predictive model.
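To make the preparation step concrete, here is a minimal sketch of extraction, integration and transformation against a normalized relational store. It uses Python's built-in sqlite3 module purely for illustration; the `customers` and `orders` tables, their columns and the sample rows are hypothetical and are not drawn from any of the tools evaluated here.

```python
import sqlite3

# Hypothetical normalized schema: table names, columns and rows are
# illustrative assumptions, not data from the article.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(customer_id),
        amount REAL
    );
    INSERT INTO customers VALUES (1, 'East'), (2, 'West'), (3, 'East');
    INSERT INTO orders VALUES (10, 1, 50.0), (11, 1, 25.0), (12, 2, 40.0);
""")

# Extraction and integration: join the tables that normalization scattered.
# Transformation: aggregate transactions down to one row per customer,
# the flat layout most mining algorithms expect as input.
rows = conn.execute("""
    SELECT c.customer_id,
           c.region,
           COUNT(o.order_id)          AS n_orders,
           COALESCE(SUM(o.amount), 0) AS total_spend
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.customer_id
    GROUP BY c.customer_id, c.region
    ORDER BY c.customer_id
""").fetchall()

for row in rows:
    print(row)
```

The LEFT JOIN keeps customers with no orders in the result, which matters for retention and acquisition models where inactivity is itself a signal.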

Following that approach to data mining tool evaluation, six data mining tools were compared and evaluated.

** Relative costs for a one-to-five user standalone version. Actual costs of enterprise use are defined by the number of stations, number of features/add-ons purchased, and other factors.
*** Cost declines rapidly in client/server mode with unlimited clients.

Table 1: Data Mining Tools Included in the Evaluation

The evaluation consists of two parts: 1) a table that rates each tool, with asterisks, on each feature and function necessary to support CRM data mining operations and 2) a description and discussion of each tool. The importance I assign to the features and functions may differ from that assigned by other reviewers.

Comparative Analysis of Features and Functions of Six Data Mining Tools

For specific applications, readers are urged to select those features and functions that are important for analysis of specific data sets, and add up the weighted scores for that subset. For example, if your legacy data is spread out over many platforms and storage systems with dissimilar formats and codings, you might select the tools that include data extraction, data integration and data transformation preprocessing capabilities. You could place emphasis (or none at all) on the richness in the variety of algorithms available. On the other hand, if your data is largely in one database on one system, your concern about data integration features may be relatively low. The bottom line to keep in mind is that there is no single reason for conducting data mining investigations, so likewise there is no one perfect fit of software tool to diverse quantitative scenarios. I have tried to present the features, functions, strengths and weaknesses as objectively as I could. But this discussion and the following table (Table 2) suffer from an internal bias attributable to the fact that I have more experience with SPSS-Clementine than any other tool. With that caveat, this table might serve as a guide for evaluation and comparison rather than as an end in itself.

Table 2: Evaluation Guide
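The weighted-subset scoring suggested above can be sketched in a few lines of Python. The tool names, asterisk counts and weights below are hypothetical placeholders, not values from Table 2; readers would substitute the ratings from the table and weights that reflect their own data environment.

```python
# Hypothetical ratings (asterisk counts per feature); NOT taken from Table 2.
ratings = {
    "Tool A": {"data extraction": 3, "data integration": 2, "algorithm variety": 1},
    "Tool B": {"data extraction": 1, "data integration": 1, "algorithm variety": 3},
}

# Hypothetical weights: a weight of 0.0 drops a feature from the subset.
weights = {"data extraction": 2.0, "data integration": 2.0, "algorithm variety": 0.0}


def weighted_score(tool_ratings, feature_weights):
    """Sum rating * weight over the features the evaluator selected."""
    return sum(
        tool_ratings.get(feature, 0) * weight
        for feature, weight in feature_weights.items()
    )


scores = {tool: weighted_score(r, weights) for tool, r in ratings.items()}

# Rank tools from highest to lowest weighted score.
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)

for tool, score in ranked:
    print(f"{tool}: {score}")
```

With these placeholder weights, which favor preprocessing and ignore algorithm variety, the tool with stronger extraction and integration ratings ranks first even though the other offers more algorithms.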

SAS

Base SAS and SAS/STAT have become standard analytical tools in many business and scientific applications. SAS began offering statistical software in the late 1970s. The software was built around the SAS programming language and various "canned" procedure programs (Procs) that could be called by the language. This system was very flexible and permitted the statistical analysis of virtually any application. Early statistical analysis packages (e.g., the UCLA Biomedical Data Package - BMDP) provided many canned procedures but very little flexibility beyond direct application of the procedures. I used the BMDP Multiple Linear Regression procedure to analyze my Ph.D. dissertation data in 1971. After fighting through my statistical analysis training in the era of the machine calculator (1960s), I thought I was in hog-heaven with the computer and BMDP. My major problem, though, was the necessity to learn Fortran-II in order to use BMDP. SAS changed all that by incorporating the flexible procedural language with the canned procedures to permit an integrated analytical environment.

In the 1980s, SAS continued to gain steam and acceptance in the scientific arena (where I worked at that time). In the 1990s, SAS successfully penetrated the business and governmental markets. The result today is a very large number of people who have been trained in the use of SAS for analytical operations in many sectors of science, business, industry and government. SAS data sets (around which all SAS operations revolve) became a standard data repository for many uses. This standardization permitted the spread of SAS into many parts of a company. Therein lies the most valuable aspect of SAS today: a wide selection of analytical offerings, including forecasting, simulation, operations research, experimental design, quality control and hundreds of statistical algorithms, is delivered in one platform.
