MAR 1, 2006 1:00am ET

Data Mining Tools: Which One is Best for CRM? Part 3 Continued

This is a continuation of the first part of this article.

Insightful Miner

The next tool in order of overall usefulness (see Table 1), Insightful Miner, follows naturally after Clementine for another reason: it has the best selection of ETL functions of any data mining tool on the market. These functions include:

  • Merging, appending, sorting and filtering (similar to Clementine and Statistica Data Miner)
  • Slicing and dicing of input data for data exploration. These functions are common in database management and business intelligence (BI) tools, but very rare in data mining tools.
  • Joining: creates a new data set by combining columns of two other data sets.
  • Stacking and unstacking: stacking creates a new column by combining two or more columns; unstacking does the reverse.
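
To make the joining, stacking and unstacking operations concrete, here is a minimal plain-Python sketch on lists of dicts. The sample data and the function names (join, stack, unstack) are hypothetical illustrations of the operations, not Insightful Miner's node names or API.

```python
# Hypothetical sample data: each data set is a list of rows (dicts keyed by column name).
customers = [
    {"id": 1, "name": "Ann", "q1": 120.0, "q2": 95.0},
    {"id": 2, "name": "Bob", "q1": 80.0, "q2": 110.0},
]
regions = [{"id": 1, "region": "East"}, {"id": 2, "region": "West"}]


def join(left, right, key):
    """Inner join: combine the columns of two data sets on a shared key."""
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


def stack(rows, id_cols, value_cols, name_col="variable", value_col="value"):
    """Stack several columns into one value column (one output row per stacked column)."""
    out = []
    for row in rows:
        for col in value_cols:
            rec = {c: row[c] for c in id_cols}
            rec[name_col] = col
            rec[value_col] = row[col]
            out.append(rec)
    return out


def unstack(rows, id_cols, name_col="variable", value_col="value"):
    """Reverse of stack: spread the single value column back into separate columns."""
    grouped = {}
    for row in rows:
        key = tuple(row[c] for c in id_cols)
        rec = grouped.setdefault(key, {c: row[c] for c in id_cols})
        rec[row[name_col]] = row[value_col]
    return list(grouped.values())


joined = join(customers, regions, "id")
stacked = stack(customers, ["id"], ["q1", "q2"])
# Filtering and sorting are one-liners on the same representation.
big_first = sorted((r for r in stacked if r["value"] > 90), key=lambda r: -r["value"])
restored = unstack(stacked, ["id"])
```

Here stack turns the two quarterly columns into four (id, variable, value) rows, and unstack restores the original wide layout.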

The only other data mining toolset with greater ETL capability was Torrent Orchestrate, which Ascential Software purchased in 2001. IBM recently acquired Ascential.

The rich ETL capability of Insightful Miner is paired with a graphical programming interface similar to those of Clementine, Statistica Data Miner and SAS Enterprise Miner. In addition, many useful algorithms are integrated as analysis nodes: neural networks, classification and regression trees, logistic regression and naïve Bayes models.

Insightful Miner provides a number of very valuable capabilities for data mining. First, it is built around the S statistical language, providing rich statistical analysis and graphics capabilities through S-Plus, the menu-based implementation of the S language. The statistical abilities of Insightful Miner rival those of Statistica Data Miner in power and completeness.

In addition, Insightful Miner is built on a pipeline architecture that scales easily to the analysis of large data sets. Its data analysis algorithms operate in streaming mode, using incremental forms of statistical analysis (e.g., based on provisional means and standard deviations). For many data miners, this may not matter. But if you must analyze large data sets from massive data warehouses, this feature can be very important: you don't have to extract data to external data sets, because you can stream data directly from the source data structures through the analysis algorithms and build solutions incrementally. The only other tools that can do this are Statistica Data Miner (the first tool to provide this capability) and one algorithm in Fair Isaac's Model Builder tool (not reviewed here).
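
The "provisional means" idea can be illustrated with the classic one-pass update (often attributed to Welford), which keeps only a running count, mean and sum of squared deviations, so rows can be streamed through without being stored. This is a generic sketch of the technique, not Insightful Miner's implementation; row_stream is a hypothetical stand-in for a database cursor.

```python
import math


class ProvisionalStats:
    """One-pass (streaming) mean and standard deviation via provisional means."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the current mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean          # deviation from the old provisional mean
        self.mean += delta / self.n    # revise the provisional mean
        self.m2 += delta * (x - self.mean)

    def variance(self):
        """Sample variance (n - 1 denominator); 0.0 until two values are seen."""
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0

    def std(self):
        return math.sqrt(self.variance())


def row_stream():
    """Hypothetical stand-in for a database cursor yielding one value per row."""
    yield from [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]


stats = ProvisionalStats()
for value in row_stream():
    stats.update(value)   # no rows are retained in memory
```

Memory use stays constant no matter how many rows pass through, which is what makes this style of computation suitable for streaming directly from a warehouse.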

Another extremely useful part of Insightful Miner is its set of model evaluation tools. The tool outputs both a coincidence (confusion) matrix and the percent classification accuracy for each of the two output states. It is often very useful to know how accurate the model is for the positives and the negatives separately, rather than reporting just the global accuracy. Like the model comparison node in Statistica Data Miner, the Lift Chart node in Insightful Miner accepts multiple inputs to produce an overlaid lift chart. However, the tool does not provide a final classification based on voting among the algorithms. Finally, an overlaid ROC (receiver operating characteristic) chart can be output from multiple inputs. The area under the ROC curve represents the classification power of an algorithm, because the curve reports how the algorithm performs at different cut-points along the range of predicted probabilities used to create the binary classifications.
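
The evaluation ideas in this paragraph can be sketched in plain Python: per-class accuracy from a confusion matrix, an ROC curve built by sweeping cut-points over predicted probabilities, the area under it, and a single point on a lift curve. The labels, scores and 0.5 cut-point are hypothetical, and the functions illustrate the general technique rather than the output of Insightful Miner's nodes.

```python
def confusion_counts(actual, predicted):
    """Return (tp, fn, fp, tn) for binary labels coded 1 (positive) and 0 (negative)."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    return tp, fn, fp, tn


def class_accuracies(tp, fn, fp, tn):
    """Accuracy on positives, accuracy on negatives, and overall accuracy."""
    return tp / (tp + fn), tn / (tn + fp), (tp + tn) / (tp + fn + fp + tn)


def roc_points(actual, scores):
    """Sweep every distinct score as a cut-point; return (FPR, TPR) pairs."""
    pos = sum(actual)
    neg = len(actual) - pos
    points = [(0.0, 0.0)]
    for cut in sorted(set(scores), reverse=True):
        tp = sum(1 for a, s in zip(actual, scores) if s >= cut and a == 1)
        fp = sum(1 for a, s in zip(actual, scores) if s >= cut and a == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


def lift_at(actual, scores, fraction):
    """Response rate in the top-scored fraction divided by the overall response rate."""
    ranked = [a for _, a in sorted(zip(scores, actual), key=lambda t: t[0], reverse=True)]
    k = max(1, round(fraction * len(ranked)))
    return (sum(ranked[:k]) / k) / (sum(actual) / len(actual))


# Hypothetical model output: true labels and predicted probabilities.
actual = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
predicted = [1 if s >= 0.5 else 0 for s in scores]

pos_acc, neg_acc, overall = class_accuracies(*confusion_counts(actual, predicted))
area = auc(roc_points(actual, scores))
top_half_lift = lift_at(actual, scores, 0.5)
```

With these numbers the model catches every positive but only two of three negatives at the 0.5 cut-point, which the global accuracy alone would hide. Overlaying charts for several models amounts to computing these curves for each model's scores on the same data.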

These features combine in Insightful Miner to provide an analytical platform that is configurable and extendable throughout the business enterprise. It is scalable and can grow as your data analysis needs grow. The very flexible S-Plus framework provides a powerful and extensible programming environment. Perhaps best of all, Insightful Miner offers perpetual licensing without annual rental agreements.

The Future for Insightful Miner?

The scalable architecture of Insightful Miner should be leveraged to create toolkits for analyzing massive data sets. As data sets grow, traditional data mining tools become less and less efficient for analysis. Two approaches to analytical scalability can be followed: parallelism and streaming-mode operation. The large parallel hardware systems from IBM and NCR are very expensive. If you don't need parallelism to efficiently store and retrieve massive data sets for other operational purposes, streaming-mode operation is the way to go. It is also possible to build streaming-mode versions of machine learning programs (e.g., neural nets and decision trees). These capabilities could become the "killer apps" of data mining on massive data sets.
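
As an illustration of a streaming-mode machine learning program, the sketch below trains a logistic regression one record at a time with stochastic gradient descent, so memory does not grow with the number of rows. Online SGD logistic regression is a swapped-in, standard example chosen for brevity; it is not a method the article attributes to any vendor, and record_stream is a hypothetical synthetic stand-in for rows streamed from a warehouse.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class OnlineLogisticRegression:
    """Logistic regression updated one record at a time (stochastic gradient descent)."""

    def __init__(self, n_features, learning_rate=0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = learning_rate

    def predict_proba(self, x):
        return sigmoid(self.b + sum(wi * xi for wi, xi in zip(self.w, x)))

    def update(self, x, y):
        # Gradient of the log loss with respect to the linear score is (p - y).
        err = self.predict_proba(x) - y
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err


def record_stream(n, seed=0):
    """Hypothetical synthetic stream: label is 1 when x0 + x1 > 0."""
    rng = random.Random(seed)
    for _ in range(n):
        x = (rng.uniform(-1, 1), rng.uniform(-1, 1))
        yield x, 1 if x[0] + x[1] > 0 else 0


model = OnlineLogisticRegression(n_features=2)
for x, y in record_stream(5000):   # single pass; memory does not grow with n
    model.update(x, y)
```

Each record is seen once and then discarded; the model state is just the weight vector and bias, which is the property that lets such an algorithm sit in a streaming pipeline.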

KXEN

In the past, KXEN stood alone in many respects. It was the only implementation of statistical learning theory, it was highly automated, it minimized the amount of data preparation needed before modeling, and in many cases its algorithms had the highest predictive accuracy among competitors. This is still largely true today, but the gap is narrowing.

KXEN is composed of several modules:

  • K2C - Consistent Coder
  • K2R - Robust Regression
  • K2S - Smart Segmenter
  • KSVM - Support Vector Machine
  • KTS - Time Series
  • KEL - Event Logger
  • KMX - Model Export
  • KAR - Association Rules
  • KXEN Assistant - Menu-based Interface

All of these modules are available through the menu interface, and they can be packaged separately or in bundles. For example, SmartFocus (based in Bristol, UK) offers a suite of smart marketing support packages, including SmartModeler (composed of KXEN K2C and K2R). In fact, KXEN's primary focus is providing other companies with embedded data mining capabilities. This business model can't fail to win in the future.

Data mining must become function-based rather than tool-based. The analytical functions needed to mine nonlinear data sets must be integrated into the very structure of other software tools, much as arithmetic operations are. Business users of standard vertical-industry tools must be able to take data mining functions for granted. Today, there is as much art as science in data mining, primarily because of the structure of the analytical tools. Yes, every data set has problems that must be solved and wrinkles that must be smoothed before running the algorithm. But many of these issues can be handled automatically, at least in theory. The trick is to invent automated tools that either perform the same operations humans perform or make them unnecessary. KXEN does both, to a great extent.
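
As one hypothetical example of an automated data-preparation step of the kind described above, the sketch below turns a nominal variable into a number by replacing each category with its target rate, shrunk toward the overall rate when a category has few records, so no per-category hand tuning is needed. This generic technique (often called target or mean encoding) is swapped in for illustration; it is not a description of how KXEN's K2C works, and the column names and smoothing value are assumptions.

```python
from collections import defaultdict


def fit_target_encoding(categories, targets, smoothing=10.0):
    """Map each category to its target rate, shrunk toward the global rate for small counts."""
    global_rate = sum(targets) / len(targets)
    counts = defaultdict(int)
    positives = defaultdict(float)
    for c, y in zip(categories, targets):
        counts[c] += 1
        positives[c] += y
    encoding = {
        c: (positives[c] + smoothing * global_rate) / (counts[c] + smoothing)
        for c in counts
    }
    return encoding, global_rate


def transform(categories, encoding, default):
    """Replace categories with numbers; unseen categories get the default (global) rate."""
    return [encoding.get(c, default) for c in categories]


# Hypothetical training column and binary target.
region = ["A", "A", "B", "B", "B", "C"]
bought = [1, 1, 0, 0, 1, 0]
encoding, base = fit_target_encoding(region, bought, smoothing=2.0)
encoded = transform(["A", "C", "D"], encoding, base)   # "D" was never seen in training
```

The smoothing term is the design choice that replaces a human judgment call: rare categories are pulled toward the overall rate instead of being trusted at face value, and unseen categories fall back to it entirely.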
