Predictive Analytics - Algorithm Nirvana

  • March 01 2006, 1:00am EST

Predictive analytics consists of two major components: advanced analytics and decision optimization. Decision optimization will be shelved for future discussion while we focus on advanced analytics, with its comprehensive portfolio of sophisticated statistical techniques and data mining algorithms. My January 2006 column presented a framework that groups business challenges into five major analysis categories: classification, clustering, association, estimation and description. The next phase aligns model types with the appropriate analysis category and then drills down to the specific algorithms.

Alignment of Model Types

Figure 1 provides a comprehensive view of the relevant model types and algorithms. This framework ensures that business analysis needs drive the solution methodology rather than the choice of algorithms dictating the business analysis options. This avoids the problem of "algorithm nirvana," where myriad analysis options can lead to an analysis-paralysis syndrome. The profiles below describe the assumptions and constructs of each model type. To maintain consistency throughout the discussion, the term case will be used to describe a basic entity of information, such as a specific customer, object or activity, together with its associated attributes such as demographics, purchase behavior and market drivers.

Figure 1: Model Types and Algorithms for Predictive Analytics

Descriptions of Model Types

Classification assigns cases to previously defined groups by identifying the attributes that characterize that specific group. The distinct advantage of classification is that the groups are predefined. Model types:

Decision trees use flow-chart tree structures consisting of internal nodes (attribute tests) and branches (outcome of tests) to develop classification rules for splitting cases into homogeneous groups. The explicit if-then rules are easy to generate and to interpret (e.g., CART, CHAID and C4.5).
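As a rough illustration of a single attribute test, the sketch below searches for the threshold on one numeric attribute that yields the most homogeneous groups, scored by Gini impurity (the criterion associated with CART). The function names, the "income" attribute and the sample cases are hypothetical and chosen only for illustration; a full tree would apply the same search recursively.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means perfectly homogeneous)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(cases, attribute, target):
    """Return (threshold, weighted impurity) for the best 'attribute <= threshold' test."""
    values = sorted({case[attribute] for case in cases})
    best = (None, float("inf"))
    # Candidate thresholds are the midpoints between adjacent distinct values.
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2
        left = [c[target] for c in cases if c[attribute] <= threshold]
        right = [c[target] for c in cases if c[attribute] > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(cases)
        if score < best[1]:
            best = (threshold, score)
    return best

# Hypothetical cases: did a customer respond to an offer?
cases = [
    {"income": 20, "responded": "no"},
    {"income": 35, "responded": "no"},
    {"income": 50, "responded": "yes"},
    {"income": 80, "responded": "yes"},
]
threshold, impurity = best_split(cases, "income", "responded")
print(f"if income <= {threshold} then 'no' else 'yes' (impurity {impurity})")
```

The resulting rule reads directly as an if-then statement, which is the interpretability advantage noted above.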

Memory-based reasoning identifies similar cases from experience and applies the information from those cases to the new cases to select group classification. Distance and combination functions drive the classification criteria (e.g., k-nearest neighbor).
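A minimal k-nearest neighbor sketch, assuming Euclidean distance as the distance function and a simple majority vote as the combination function. The case data and the name `knn_classify` are hypothetical.

```python
import math
from collections import Counter

def knn_classify(known_cases, new_case, k=3):
    """Classify new_case by majority vote among its k nearest known cases."""
    # Distance function: Euclidean distance between attribute tuples.
    by_distance = sorted(known_cases, key=lambda item: math.dist(item[0], new_case))
    # Combination function: the most common label among the k closest cases.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical cases: (age, purchases per year) -> customer value group.
known = [
    ((25, 1), "low"), ((30, 2), "low"), ((28, 1), "low"),
    ((55, 9), "high"), ((60, 8), "high"), ((58, 10), "high"),
]
prediction = knn_classify(known, (57, 9), k=3)
print(prediction)
```

In practice the attributes would usually be rescaled first so that no single attribute dominates the distance.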

Bayesian classification uses a combination of conditional and unconditional probabilities to create posterior probabilities that predict group membership. Bayes' theorem directs how to update or revise beliefs in light of new evidence (e.g., naive Bayes).
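The update that Bayes' theorem prescribes can be sketched for a two-group problem as follows. The probabilities are made-up illustrative numbers, and the function name `posterior` is hypothetical.

```python
def posterior(prior, likelihood, likelihood_given_not):
    """P(group | evidence) from P(group), P(evidence | group), P(evidence | not group)."""
    # Total probability of seeing the evidence across both groups.
    evidence = likelihood * prior + likelihood_given_not * (1 - prior)
    return likelihood * prior / evidence

# Hypothetical numbers: 10% of customers are buyers; 60% of buyers opened
# the email, while only 20% of non-buyers did.
p_buyer_given_open = posterior(prior=0.10, likelihood=0.60, likelihood_given_not=0.20)
print(round(p_buyer_given_open, 2))
```

Naive Bayes applies the same update across several attributes, under the simplifying assumption that the attributes are independent within each group.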

Clustering segments cases into groups that are very different from each other, but whose group members are very similar. Clustering is an example of undirected learning that discovers appropriate groups and forms descriptions for each group where no preclassification exists. Model types:

Partitioning-based analysis uses an iterative approach that attempts to improve the clustering by moving cases from one group to another based on centroids, medoids and weighted measures criteria (e.g., k-means, k-medoids, expectation maximization).
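The iterative reassignment can be sketched with a bare-bones k-means loop. This version assumes Euclidean distance, a fixed number of iterations and caller-supplied starting centroids; the data points are hypothetical.

```python
import math

def k_means(points, centroids, iterations=10):
    """Alternate between assigning cases to the nearest centroid and recomputing centroids."""
    groups = []
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for point in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(point, centroids[i]))
            groups[nearest].append(point)
        # New centroid = coordinate-wise mean of the group; an empty group keeps its old centroid.
        centroids = [
            tuple(sum(coords) / len(group) for coords in zip(*group)) if group else old
            for group, old in zip(groups, centroids)
        ]
    return centroids, groups

points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, groups = k_means(points, centroids=[(1, 1), (8, 8)])
print(centroids)
```

The result depends on the starting centroids, which is why the procedure is often rerun from several starting points.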

Hierarchical-based analysis uses either an agglomerative approach, which starts with each case forming a separate group and then successively merges the groups into larger ones, or a divisive approach, which starts with all cases in the same group and then splits them into smaller groups (e.g., BIRCH, CURE).
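The agglomerative approach can be sketched on one-dimensional data. Note that this is plain single-linkage merging, not BIRCH or CURE; the stopping rule (a target number of groups) and the sample values are assumptions made for illustration.

```python
def agglomerate(values, target_groups):
    """Merge the two closest groups repeatedly until target_groups remain."""
    groups = [[v] for v in values]          # every case starts as its own group
    while len(groups) > max(target_groups, 1):
        best = None
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                # Single linkage: distance between the closest pair of members.
                gap = min(abs(a - b) for a in groups[i] for b in groups[j])
                if best is None or gap < best[0]:
                    best = (gap, i, j)
        _, i, j = best
        groups[i] = groups[i] + groups[j]
        del groups[j]
    return groups

groups = agglomerate([1, 2, 9, 10, 25], target_groups=3)
print(groups)
```

A divisive method would run in the opposite direction, starting from one group and splitting it.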

Sequence discovery uses the sequence in which cases are selected, ordered or executed to find clusters of cases that contain similar paths in that sequence (e.g., Markov chain analysis).
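As a sketch of the Markov chain idea, the snippet below estimates first-order transition probabilities from observed paths. The page names and the function name `transition_probabilities` are hypothetical; grouping cases by path similarity would be a further step built on these estimates.

```python
from collections import Counter, defaultdict

def transition_probabilities(paths):
    """Estimate P(next step | current step) from a list of observed sequences."""
    counts = defaultdict(Counter)
    for path in paths:
        for current, following in zip(path, path[1:]):
            counts[current][following] += 1
    return {
        state: {nxt: n / sum(nexts.values()) for nxt, n in nexts.items()}
        for state, nexts in counts.items()
    }

# Hypothetical website visits, one path per case.
paths = [
    ["home", "search", "product", "cart"],
    ["home", "product", "cart"],
    ["home", "search", "exit"],
]
probs = transition_probabilities(paths)
print(probs["home"])
```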

Association finds patterns and trends across a large number of case transactions that can be used to understand and exploit customer behavior. The different model types can then be used to analyze case transactions at one point in time or transactions viewed over time. Model types:

Market basket analysis uses the likelihood of different products being purchased together in a case transaction to develop rules and confidence levels for predicting future purchase decisions (e.g., association rules).
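The support and confidence behind an association rule can be computed directly from the transactions. The baskets below are hypothetical, as is the function name `rule_metrics`.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    n_antecedent = sum(1 for t in transactions if antecedent <= set(t))
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= set(t))
    support = n_both / len(transactions)                 # share of all baskets with both
    confidence = n_both / n_antecedent if n_antecedent else 0.0  # share of antecedent baskets
    return support, confidence

baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
support, confidence = rule_metrics(baskets, {"bread"}, {"milk"})
print(f"bread -> milk: support {support:.2f}, confidence {confidence:.2f}")
```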

Time series analysis uses time series elements such as trend, cycles, seasonality and lagged variables to develop models and predict future values for individual cases (e.g., exponential smoothing, Box-Jenkins).
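Simple exponential smoothing, the most basic of these techniques, can be sketched in a few lines. It handles level only, not trend or seasonality; the smoothing constant `alpha` and the sales figures are illustrative assumptions.

```python
def exponential_smoothing(series, alpha):
    """Return the smoothed level after each observation (0 < alpha <= 1)."""
    smoothed = [series[0]]                      # initialize with the first observation
    for observed in series[1:]:
        smoothed.append(alpha * observed + (1 - alpha) * smoothed[-1])
    return smoothed

sales = [100, 110, 105, 120]
levels = exponential_smoothing(sales, alpha=0.5)
forecast = levels[-1]                           # one-step-ahead forecast
print(levels, forecast)
```

A larger `alpha` weights recent observations more heavily; a smaller one produces a smoother, slower-reacting forecast.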

Link analysis builds networks of interconnected cases through their relationships to expose patterns and trends. Visual mapping is an important tool for highlighting relationships (e.g., directed graphs).
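A directed graph of cases can be sketched as an adjacency mapping, with a simple count of incoming links as one way to surface highly connected cases. The referral data and function names are hypothetical.

```python
from collections import defaultdict

def build_graph(links):
    """Build a directed graph {node: set of nodes it points to} from (source, target) pairs."""
    graph = defaultdict(set)
    for source, target in links:
        graph[source].add(target)
        graph.setdefault(target, set())     # make sure pure targets appear as nodes too
    return graph

def in_degree(graph):
    """Count how many links point at each node."""
    counts = {node: 0 for node in graph}
    for targets in graph.values():
        for target in targets:
            counts[target] += 1
    return counts

# Hypothetical referrals: (referrer, referred account).
links = [("A", "B"), ("C", "B"), ("D", "B"), ("B", "E")]
graph = build_graph(links)
degrees = in_degree(graph)
hub = max(degrees, key=degrees.get)
print(hub, degrees[hub])
```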

Estimation forecasts future case outcomes by developing relationships between an outcome variable and one or more causal variables. Model types:

Regression analysis is used to predict the values of a response (dependent) variable from one or more predictor (independent) variables. The form of the equations depends on the nature of the relationship - linear, nonparametric, polynomial, etc. (e.g., linear regression, logistic regression).
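For the simplest case, one predictor and a linear relationship, the least-squares fit has a closed form that can be sketched as follows. The advertising and sales figures are invented for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x with one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

ad_spend = [1, 2, 3, 4]       # predictor (independent variable)
sales = [3, 5, 7, 9]          # response (dependent variable)
slope, intercept = fit_line(ad_spend, sales)
predicted = intercept + slope * 5
print(slope, intercept, predicted)
```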

Neural networks use a combination of weighted input, hidden and output nodes, where input attributes are related to predictable attributes through either one-way or bidirectional flows. The topology of the networks varies, and the results are often difficult to interpret (e.g., feed forward, backward propagation, genetic algorithms).
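The one-way flow through input, hidden and output nodes can be sketched as a single forward pass. The weights here are arbitrary placeholders rather than trained values, the sigmoid activation is an assumption, and no training step (such as backward propagation) is shown.

```python
import math

def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1 / (1 + math.exp(-x))

def forward(inputs, hidden_weights, output_weights):
    """One forward pass: inputs -> hidden layer -> single output node."""
    hidden = [
        sigmoid(sum(w * x for w, x in zip(node_weights, inputs)))
        for node_weights in hidden_weights
    ]
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden)))

# Two input attributes, two hidden nodes, one output node (placeholder weights).
score = forward(
    inputs=[0.5, 1.0],
    hidden_weights=[[0.4, -0.2], [0.3, 0.8]],
    output_weights=[1.5, -1.0],
)
print(score)
```

Even in this tiny network the output is a nested function of every weight, which hints at why larger networks are hard to explain.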

Description analyzes the case data by using a variety of profiling, summarization and visualization techniques. Model types:

Exploratory data analysis provides summaries, profiles and visualization plots of the transactional data (e.g., frequency histograms, box-and-whisker plots, Pareto curves).
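A small sketch of a summary profile and a text-based frequency histogram, using only the standard library. The order values and the bin width are hypothetical.

```python
import statistics
from collections import Counter

order_values = [2, 4, 4, 5, 7, 9, 30]      # hypothetical transaction amounts

# Summary profile of the data.
summary = {
    "count": len(order_values),
    "mean": statistics.mean(order_values),
    "median": statistics.median(order_values),
    "min": min(order_values),
    "max": max(order_values),
}

def histogram(values, bin_width):
    """Count values per bin; each bin is keyed by its lower bound."""
    return Counter((v // bin_width) * bin_width for v in values)

bin_width = 5
bins = histogram(order_values, bin_width)
for start in sorted(bins):
    print(f"{start:>3}-{start + bin_width - 1:<3} {'#' * bins[start]}")
print(summary)
```

The gap between the mean and the median, together with the isolated top bin, flags the outlier that a box-and-whisker plot would show graphically.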

Dimensionality reduction allows two or more correlated attributes (variables) to be expressed by a single factor that is a weighted composite of the original variables (e.g., principal components, factor analysis).
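For two correlated attributes, the weights of the first principal component can be sketched with a closed-form angle computed from the covariance matrix. The data are invented, and the function name `first_component_weights` is hypothetical; with more than two attributes a general eigenvalue routine would be needed instead.

```python
import math

def first_component_weights(xs, ys):
    """Weights (w_x, w_y) of the first principal component of two variables."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / (n - 1)
    var_y = sum((y - mean_y) ** 2 for y in ys) / (n - 1)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    # Direction of greatest variance for a 2x2 covariance matrix.
    angle = 0.5 * math.atan2(2 * cov_xy, var_x - var_y)
    return math.cos(angle), math.sin(angle)

visits = [1, 2, 3, 4]
spend = [2, 4, 6, 8]                        # perfectly correlated with visits
w_x, w_y = first_component_weights(visits, spend)

# The single factor: a weighted composite of the centered original variables.
mean_v, mean_s = sum(visits) / 4, sum(spend) / 4
factor = [w_x * (v - mean_v) + w_y * (s - mean_s) for v, s in zip(visits, spend)]
print(w_x, w_y, factor)
```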

Future columns will illustrate in detailed case studies how these methods can be leveraged to meet the variety of marketing, sales, financial and operational business challenges.
