Darwin Excels in Complex Data Mining Projects for Nautilus Systems

  • October 01 1999, 1:00am EDT

BACKGROUND: Nautilus Systems, Inc. is a business and computer consulting firm focusing on data mining and data warehousing. Nautilus Systems builds 1:1 marketing and customer relationship management solutions for industries including healthcare, financial services, telecommunications and government. In addition to competitive research and analysis skills, Nautilus Systems has developed unique methodologies to enable automated analysis of the large volumes of data generated within and external to organizations.

PLATFORMS: Windows NT/95/98 client, UNIX server environment. Sun Solaris and HP-UX servers are supported in both single-processor and symmetric multiprocessor (SMP) configurations. We used an NT client and an UltraSPARC II server with two 250 MHz processors.

PROBLEM SOLVED: Nautilus Systems was contracted by the U.S. Air Force to determine whether the right components were deployed in the F-15 wartime-readiness spares kits to cover the required 30 days of support. The goal was to maximize operational readiness and minimize offline servicing. Nautilus Systems implemented a data warehouse of maintenance and operational data and then applied Darwin data mining software (formerly a Thinking Machines Corporation product, now owned by Oracle). Darwin found a previously hidden correlation between the replacement of a landing strut and the failure of an "O" ring on a hydraulic bypass valve 10 to 100 hours later. Changing the service manual to include the "O" ring service with the strut service would increase the operational availability of that aircraft. Darwin rapidly enabled us to help this vendor ensure its product's viability, minimize loss and keep its consumers flying.
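
To make the kind of pattern described above concrete, here is a minimal Python sketch that measures how often one maintenance event is followed by another on the same aircraft within a 10-to-100-hour window. It is not Darwin's method and does not use the Air Force data; the function name, event labels and sample log are all hypothetical.

```python
from collections import defaultdict

def follow_up_rate(events, trigger, outcome, min_hours=10.0, max_hours=100.0):
    """Fraction of `trigger` events that are followed by an `outcome` event
    on the same aircraft between min_hours and max_hours later."""
    # Group the log by aircraft so events are only compared within one airframe.
    by_aircraft = defaultdict(list)
    for aircraft_id, hours, event_type in events:
        by_aircraft[aircraft_id].append((hours, event_type))

    triggers = 0
    followed = 0
    for log in by_aircraft.values():
        for hours, kind in log:
            if kind != trigger:
                continue
            triggers += 1
            # Look for at least one outcome inside the time window.
            if any(k == outcome and min_hours <= h - hours <= max_hours
                   for h, k in log):
                followed += 1
    return followed / triggers if triggers else 0.0

# Hypothetical log rows: (aircraft id, cumulative hours, event label).
EVENTS = [
    ("A1", 1000.0, "strut_replaced"),
    ("A1", 1045.0, "o_ring_failed"),
    ("A2", 500.0, "strut_replaced"),
    ("A2", 900.0, "o_ring_failed"),   # 400 hours later: outside the window
    ("A3", 200.0, "strut_replaced"),
    ("A3", 260.0, "o_ring_failed"),
    ("A4", 700.0, "o_ring_failed"),   # no preceding strut replacement
]

rate = follow_up_rate(EVENTS, "strut_replaced", "o_ring_failed")
print(f"{rate:.2f}")  # prints 0.67
```

A rate well above the baseline failure rate for the part would flag the pair of events for review by a maintenance expert; the sketch only counts co-occurrence and says nothing about cause.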

PRODUCT FUNCTIONALITY: Darwin accessed both ASCII and RDBMS data using MERANT DataDirect (formerly INTERSOLV) ODBC drivers. Darwin Release 3.5 supports three data mining algorithms (neural networks, classification and regression trees, and k-nearest neighbors), based on the sound premise that no single algorithm is best and that a variety of algorithms is necessary to build accurate models. Darwin uses Microsoft Excel for graphing data mining results and Microsoft Internet Explorer for online help. Users familiar with the Windows NT/95/98 user interface will have no difficulty navigating Darwin. Additionally, Darwin has six wizards that assist in building models: Text Import, Database Import, Missing Values, Key Fields, Model Seeker (automatically builds multiple models and selects the best) and Model Compare. Darwin 3.6 will add k-means clustering and database writeback.
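
The idea behind a wizard such as Model Seeker (build several candidate models, score each, keep the best) can be sketched in a few lines of Python. This is an illustration only, not Darwin's implementation: it assumes the candidates are k-nearest-neighbor classifiers that differ only in k and that "best" means highest accuracy on a holdout set. All names and data are hypothetical.

```python
from collections import Counter

def knn_predict(train, point, k):
    """Classify `point` by majority vote among its k nearest training rows."""
    nearest = sorted(
        train,
        key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], point)),
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def accuracy(train, holdout, k):
    """Share of holdout rows whose label the k-NN model predicts correctly."""
    hits = sum(knn_predict(train, x, k) == y for x, y in holdout)
    return hits / len(holdout)

def seek_best_model(train, holdout, candidate_ks):
    """Score every candidate k on the holdout set; return (best_k, scores)."""
    scores = {k: accuracy(train, holdout, k) for k in candidate_ks}
    best_k = max(scores, key=scores.get)
    return best_k, scores

# Hypothetical training rows: ((feature_1, feature_2), label).
TRAIN = [
    ((1.0, 1.0), "ok"), ((1.2, 0.8), "ok"), ((0.9, 1.1), "ok"),
    ((3.0, 3.0), "fail"), ((3.2, 2.9), "fail"), ((2.8, 3.1), "fail"),
    ((1.1, 1.0), "fail"),  # deliberately noisy label
]
HOLDOUT = [((1.08, 1.0), "ok"), ((3.1, 3.0), "fail"), ((0.95, 0.9), "ok")]

best_k, scores = seek_best_model(TRAIN, HOLDOUT, candidate_ks=(1, 3, 7))
print(best_k, scores)
```

On this toy data k=1 is misled by the noisy row and k=7 votes with the whole training set, so k=3 scores highest. A real comparison would span different algorithm families and use far more holdout data.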

STRENGTHS: Darwin's client/server design combines an intuitive Windows interface with a parallel, scalable server and a multi-algorithmic approach. The ability to rapidly access large databases yields better models for more accurate results. Unlike many other data mining products, Darwin uses all available data, without preconceived notions of which portions will be most relevant, finding patterns that are often otherwise missed. Darwin's wizards as well as workflow and scripting features simplify and automate the data mining process. Tuning options still provide a high level of control to expert users. Darwin's deployable models enable easy integration with applications at customer touchpoints.

WEAKNESSES: Darwin's built-in visualization is currently limited to Microsoft Excel, although it is easily augmented with other data visualization tools. Interfacing directly to third-party visualization tools would make this an outstanding data mining product.

SELECTION CRITERIA: We chose Darwin for its spectrum of algorithms, quality of models, ease of use (especially modeling wizards), high speed and ability to handle very large data sets. Darwin is very easy to use but is best leveraged by subject-matter experts with large volumes of clean data and with well-defined problems to address. Darwin excels against large, complex problems that seek discovery of unknown relationships or consumer behavior. Darwin's exportable models move these discovered patterns to external actionable applications, a capability especially important for customer relationship management.

DELIVERABLES: Wizards, interactive tree display, lift charts, sensitivity analysis, ROI and margin graphs, error tables and decision tree rules are easy to use. Exportable models (as C, C++ and Java code) are easily deployed for campaign management or call center integration.
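
For readers unfamiliar with lift charts, the following Python sketch shows the calculation that underlies one: rank records by model score, split them into equal buckets and divide each bucket's response rate by the overall response rate. It is a generic illustration, not Darwin's output format; the function name and sample scores are hypothetical, and it assumes at least one responder in the data.

```python
def lift_by_bucket(scored, buckets=5):
    """Return one lift value per bucket, highest-scoring bucket first.

    `scored` is a list of (model_score, responded) pairs, where responded
    is 1 or 0. Any records left over after equal-size bucketing are ignored.
    """
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    overall = sum(r for _, r in ranked) / len(ranked)  # baseline response rate
    size = len(ranked) // buckets
    lifts = []
    for i in range(buckets):
        chunk = ranked[i * size:(i + 1) * size]
        rate = sum(r for _, r in chunk) / len(chunk)
        lifts.append(rate / overall)
    return lifts

# Hypothetical scored customers: (model score, 1 if the customer responded).
SCORED = [
    (0.95, 1), (0.90, 1), (0.80, 1), (0.75, 0), (0.60, 1),
    (0.55, 0), (0.40, 0), (0.30, 0), (0.20, 0), (0.05, 0),
]

print([round(x, 2) for x in lift_by_bucket(SCORED)])
```

Here the top-scoring fifth of the list responds at about 2.5 times the overall rate, which is the kind of figure a campaign manager reads off the left edge of a lift chart.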

VENDOR SUPPORT: The vendor provides phone support and on-site training and professional services. Support personnel were knowledgeable and very responsive to us.

DOCUMENTATION: User manuals are available; however, Darwin's online help answers most questions.

