
MineSet Reloaded

  • Herb Edelstein, Janet Millenson
  • January 01 2004, 1:00am EST

MineSet was one of the better data mining packages of recent years. Developed by Silicon Graphics, Inc. (SGI), it achieved modest success with more than 150 customer installations. Unfortunately, SGI's main line of business is high-performance servers, not data mining software. As their hardware sales started to decline, they deemphasized MineSet and eventually closed it down.

Happily, some ex-SGI people have struck a deal with SGI to revive MineSet. The new company, Purple Insight, is headed by managing director Gareth Lane, formerly direct sales manager for SGI in the United Kingdom, and is working closely with SGI on the revival.

Purple Insight is establishing support relationships with existing customers as part of maintaining MineSet and has said that they will enhance it as well.

Because approximately 50 percent of MineSet's installed base is in the U.S., they are in the process of becoming a U.S. company in the Silicon Valley area. They intend to focus on the market segments where SGI has had some success in the past, including fraud detection, money laundering, network intrusion detection, pharmaceuticals research and network traffic analysis.

Lane thinks they can succeed where SGI failed because their business is data mining software and associated services; they are not a hardware company that uses data mining software as a vehicle for selling computers. He argues that this sole focus on data mining will better allow them to meet market demands. Furthermore, as the amount of stored data has grown dramatically in the last few years, MineSet's parallel architecture (which allows it to scale) will be a big asset. Lane also believes that Purple Insight's team of private investors will provide the capital and stability needed to reach profitability by late 2004 or early 2005.

I also spoke to Ronny Kohavi, currently director of personalization and business intelligence for Amazon, who was head of engineering for MineSet under SGI. Prior to joining Amazon, he was head of analytics for Blue Martini. Kohavi remains a fan of MineSet. At Blue Martini, he was involved in an evaluation of data mining tools that resulted in their decision to license and incorporate MineSet technology in their products. Not surprisingly, he would very much like to see MineSet succeed in the market, but he is concerned about whether Purple Insight will have the technical resources necessary to maintain and enhance it.

Purple Insight is currently shipping MineSet 3.1, the last released version, and is expecting to ship version 4.0 during the first part of 2004. In addition to supporting the IRIX operating system for SGI computers and Microsoft Windows, they are adding Linux support. They are also improving the API (application programming interface) to simplify embedding MineSet in applications.

MineSet originally grew in part from SGI's expertise in visualization. Consequently, it has a wide range of interesting and useful (and, in some cases, stunning) graphics to help people understand data. The original mining algorithms were based on the MLC++ library developed at Stanford University. To this, they added a preprocessing and data transformation engine.

Users can employ either a command line interface or a GUI (graphical user interface) called the Tool Manager. Tool Manager divides the window into four areas for organizing a data mining project. The data source is specified at the top, and the status of the project is displayed at the bottom. The bulk of the window is divided into a data selection and transformation pane on the left and a data visualization and mining pane on the right.

The transformation engine provides a wide range of automatic transformations as well as a scripting language for coding more complex ones. A useful history button shows the sequence of transformations that have been applied to the original data.

However, MineSet attracted the most attention with its dazzling visualizations that took advantage of the SGI graphics library. The collection of visualizations helps make MineSet an excellent exploratory tool. The Tree Visualizer, in the style of the SGI 3-D file system navigator seen in the original Jurassic Park movie, allows you to fly over a decision tree and inspect what is happening at each node. Splat Visualizer is a scatter-plot tool designed to deal with very large data sets. The Evidence Visualizer shows how each column influences the response variable. Other exploratory visualizers include those for associations, decision tables and clustering.

Integrated with these visualizers is a strong set of analytics, including clustering, decision tables, classification trees, regression trees, Naive Bayes and automatic attribute selection. This integration makes the visualizers more useful for data exploration than simple graphics tools. I've always thought that SGI underemphasized the value of its analytic algorithms and, in particular, their contribution to the visualizations.

MineSet is a worthy contender in the data mining tools market, and it's good to see it back.
