Predictive Analytics: Data Mining with a Twist

December 1, 2005, 1:00 a.m. EST

With the explosive growth of business data and accelerating market dynamics, more decisions must be made in compressed time frames. Data mining and its sibling, predictive analytics, now offer a potential avenue for meeting this pressing demand for real-time decision making. Although data mining has been a defined solution space since the 1990s, only in the last two years has the data mining process been enhanced to create embedded predictive analytics. Predictive analytics builds on the multistep data mining process and on statistical modeling techniques, adding a layer of automation and self-directed, built-in intelligence. Business users (and not just Ph.D. statisticians) can now analyze large amounts of customer, supplier, employee and product data for patterns and trends.

Depending on whom you talk to, the time of day and the product space being promoted, the definition of predictive analytics can be as narrow as "understanding why a variance occurs in real time and what to do about it moving forward" or as broad as "delivering the right insight to the right people in real time for making decisions." Here, we focus on the relationship between the dependent key performance indicators (KPIs) and their associated independent causal variables. It is all about the ability to automatically discover KPI variances and determine their root causes, so that better predictors of a company's business performance can be developed, measured and managed. In this sense, predictive analytics is the automated process of sifting through large amounts of data with statistical algorithms and neural nets to identify relationships between KPIs and the critical measures that make it possible to predict critical success factors.
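To make the KPI-to-driver relationship concrete, here is a minimal Python sketch that swaps in ordinary least-squares regression for the statistical algorithms and neural nets mentioned above. It assumes a linear KPI model; the driver names (`promo_spend`, `price_index`), the helper functions and all figures are hypothetical. Once the coefficients are fitted, a KPI variance between two periods can be split into per-driver contributions, and the driver with the largest contribution is flagged as the likely root cause.

```python
# Minimal sketch: fit a linear KPI model, then attribute a KPI variance to its drivers.
# Assumptions (not from the article): a linear model, hypothetical drivers, synthetic data.

def fit_ols(X, y):
    """Ordinary least squares: solve (X'X) b = X'y by Gauss-Jordan elimination.

    Returns [intercept, b_1, ..., b_k].
    """
    rows = [[1.0] + [float(v) for v in x] for x in X]  # prepend an intercept column
    k = len(rows[0])
    # Augmented normal-equation matrix [X'X | X'y]
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting for numerical stability
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    # The left block is now diagonal, so each coefficient is a simple ratio
    return [a[i][k] / a[i][i] for i in range(k)]


def attribute_variance(coefs, names, baseline, current):
    """Split the change in the fitted KPI into b_i * (x_i_current - x_i_baseline) per driver."""
    return {n: b * (current[n] - baseline[n]) for n, b in zip(names, coefs[1:])}


names = ["promo_spend", "price_index"]          # hypothetical causal variables
X = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 8), (6, 2)]
y = [100 + 3 * x1 - 2 * x2 for x1, x2 in X]     # synthetic KPI: 100 + 3*promo - 2*price

coefs = fit_ols(X, y)
contrib = attribute_variance(
    coefs, names,
    baseline={"promo_spend": 10, "price_index": 5},
    current={"promo_spend": 12, "price_index": 9},
)
root_cause = max(contrib, key=lambda n: abs(contrib[n]))
print([round(c, 6) for c in coefs], contrib, root_cause)
```

Because the synthetic data is noiseless, the fit should recover coefficients very close to 100, 3 and -2, and the price change should be named as the root cause of the KPI dip. With a linear model the per-driver contributions add up exactly to the change in the fitted KPI, which is what makes this simple decomposition attractive; nonlinear models such as neural nets need more elaborate attribution methods.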

Data Mining Framework

Put simply, predictive analytics automates the data mining process and adds capabilities such as real-time data capture on the front end and automatic alerting on the back end. Let us first discuss the two major data mining constructs, SEMMA (from SAS) and CRISP-DM (from a consortium of other data mining industry leaders), and show how they can be integrated into the predictive analytics framework.

The SEMMA construct (Sample, Explore, Modify, Model, Assess) starts by sampling the population data to create a manageable data set for analysis and then explores the data visually to determine what patterns and trends can be found. Next, the data is modified to ensure completeness and quality. In some cases, the data is bucketized into meaningful classes and enriched with demographic, attitudinal and behavioral data. Modeling applies appropriate statistical techniques such as regression analysis, neural nets, tree-based reasoning and time-series methods to uncover the root causes of the patterns and trends. The final step is assessment, in which the models developed on the initial training data are applied to the holdout sample to determine how effective the forecasting models are.
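The five SEMMA steps can be sketched end to end in a few lines of standard-library Python. This is an illustration under stated assumptions, not SAS's implementation: the population is synthetic, there is a single driver `x`, summary statistics stand in for visual exploration, the bucket edges are arbitrary, and a straight-line fit stands in for the richer techniques (neural nets, trees, time series) named above. All function names are hypothetical.

```python
# Minimal SEMMA-style sketch (Sample, Explore, Modify, Model, Assess) using only the stdlib.
# Assumptions: synthetic population, one driver x, a straight-line model as the "Model" step.
import bisect
import random
import statistics


def make_population(size=1000):
    """Synthetic data: y = 2x + 1 plus a small deterministic wobble; every 50th x is missing."""
    return [
        {"x": None if i % 50 == 0 else float(i), "y": 2.0 * i + 1 + (i % 7) - 3}
        for i in range(size)
    ]


def sample(population, n, seed=7):
    """Sample: draw a manageable random subset of the population."""
    return random.Random(seed).sample(population, n)


def explore(rows):
    """Explore: summary statistics as a stand-in for visual exploration."""
    xs = [r["x"] for r in rows if r["x"] is not None]
    return {"n": len(rows), "missing_x": len(rows) - len(xs),
            "mean_x": statistics.fmean(xs), "stdev_x": statistics.stdev(xs)}


def modify(rows, edges=(250, 500, 750)):
    """Modify: drop incomplete records, then bucketize x into classes 0..len(edges)."""
    return [{**r, "bucket": bisect.bisect_right(edges, r["x"])}
            for r in rows if r["x"] is not None]


def model(train):
    """Model: least-squares straight line y = slope * x + intercept."""
    mx = statistics.fmean(r["x"] for r in train)
    my = statistics.fmean(r["y"] for r in train)
    sxx = sum((r["x"] - mx) ** 2 for r in train)
    sxy = sum((r["x"] - mx) * (r["y"] - my) for r in train)
    slope = sxy / sxx
    return slope, my - slope * mx


def assess(fitted, holdout):
    """Assess: mean absolute error of the trained model on the holdout sample."""
    slope, intercept = fitted
    return statistics.fmean(abs(r["y"] - (slope * r["x"] + intercept)) for r in holdout)


sampled = sample(make_population(), 500)
profile = explore(sampled)
rows = modify(sampled)
cut = int(len(rows) * 0.7)          # 70/30 split; the sample is already in random order
train, holdout = rows[:cut], rows[cut:]
fitted = model(train)
mae = assess(fitted, holdout)
print(profile, fitted, round(mae, 3))
```

The key design point is that the holdout rows never touch the fitting step, so the assessment error reflects how the model performs on data it has not seen.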

The CRISP-DM (Cross-Industry Standard Process for Data Mining) reference model is broader in scope and starts with business understanding, which covers business goals and objectives as well as a project plan. Data collection and exploration are combined into a module called data understanding. The next module, data preparation, incorporates activities such as data formatting, data cleansing and data integration. After the modeling module, the final two modules focus on evaluating the model results and deploying the models into production.
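The three data preparation activities named above (formatting, cleansing and integration) can be illustrated with a small Python sketch. It is only a sketch under assumptions: the two sources (a CRM extract and a segment lookup), their field names and the cleansing rules are hypothetical, and a production pipeline would add validation rules, logging and many more sources.

```python
# Minimal CRISP-DM data-preparation sketch: formatting, cleansing and integration.
# Assumptions: two hypothetical sources with made-up field names and simple rules.

def format_record(raw):
    """Formatting: coerce fields to consistent types and casing."""
    revenue = raw.get("revenue")
    return {
        "customer_id": raw["customer_id"].strip().upper(),
        "region": raw["region"].strip().title(),
        "revenue": float(revenue) if revenue not in (None, "") else None,
    }


def cleanse(records):
    """Cleansing: drop records with missing or negative revenue; de-duplicate on customer_id."""
    kept = {}
    for r in records:
        if r["revenue"] is None or r["revenue"] < 0:
            continue
        kept.setdefault(r["customer_id"], r)  # keep the first occurrence of each customer
    return list(kept.values())


def integrate(customers, segments):
    """Integration: enrich each customer with a segment from a second source system."""
    return [{**c, "segment": segments.get(c["customer_id"], "UNKNOWN")} for c in customers]


crm_extract = [
    {"customer_id": " c001 ", "region": "north", "revenue": "1200.50"},
    {"customer_id": "C001", "region": "North ", "revenue": "1200.50"},  # duplicate
    {"customer_id": "c002", "region": " south", "revenue": ""},          # missing revenue
    {"customer_id": "c003", "region": "EAST", "revenue": "310"},
]
segment_lookup = {"C001": "Enterprise"}

prepared = integrate(cleanse([format_record(r) for r in crm_extract]), segment_lookup)
for row in prepared:
    print(row)
```

Formatting runs first on purpose: the duplicate `C001` record only becomes detectable once IDs are trimmed and upper-cased, so cleansing depends on consistent formats.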

Predictive Analytics Process

The bottom line is that the SEMMA construct and the CRISP-DM reference model both converge on key activities such as data capture, modeling, analysis, evaluation and deployment. These provide the core elements of a predictive analytics framework. The trick with predictive analytics is to create a seamless environment in which everything from data collection through model development and deployment is self-directed and untouched by human hands. This environment encompasses the six steps illustrated in Figure 1. After the initial source data mapping, data collection can proceed, with current data incorporated into the updated KPI forecasting models. The KPI business models can be developed with tuned coefficients, and the results automatically evaluated to quantify the "miss" versus the "actual." Simulations and what-if analysis can gauge the impact of alternative business scenarios. The results are communicated in real time via e-mail alerts and scorecard/dashboard beacons. The success of the process lies in the models' ability to forecast the future rather than merely predict the past. Leading rather than lagging KPI forecasting models are the mantra for success.
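The back end of this process (forecast, quantify the miss versus the actual, alert, then run what-if scenarios) can be sketched in Python as follows. The linear model form, the coefficient values, the driver names and the 10 percent tolerance are all hypothetical, and `print()` merely stands in for an e-mail alert or dashboard beacon.

```python
# Minimal sketch of the evaluate-and-alert loop for one KPI.
# Assumptions: a linear KPI model with hypothetical tuned coefficients and tolerance;
# print() is a stand-in for an e-mail alert or scorecard/dashboard beacon.
from dataclasses import dataclass


@dataclass
class KpiModel:
    intercept: float
    coefficients: dict  # driver name -> tuned coefficient

    def forecast(self, drivers):
        """Linear forecast: intercept + sum(coefficient * driver value)."""
        return self.intercept + sum(b * drivers[n] for n, b in self.coefficients.items())

    def evaluate(self, actual, drivers, tolerance=0.10):
        """Quantify the miss (actual - forecast) and flag it if it exceeds the tolerance."""
        predicted = self.forecast(drivers)
        miss = actual - predicted
        # Guard against a zero forecast, where a relative miss is undefined
        if predicted == 0:
            pct = 0.0 if miss == 0 else float("inf")
        else:
            pct = miss / predicted
        return {"forecast": predicted, "actual": actual, "miss": miss,
                "pct_miss": pct, "alert": abs(pct) > tolerance}

    def what_if(self, drivers, changes):
        """Scenario analysis: KPI impact of changing some drivers, all else held constant."""
        scenario = {**drivers, **changes}
        return self.forecast(scenario) - self.forecast(drivers)


model = KpiModel(intercept=50.0, coefficients={"ad_spend": 4.0, "price_index": -20.0})
this_period = {"ad_spend": 10.0, "price_index": 1.5}

result = model.evaluate(actual=51.0, drivers=this_period)
if result["alert"]:
    print(f"ALERT: KPI actual vs forecast {result['pct_miss']:+.1%}")  # alert stand-in
uplift = model.what_if(this_period, {"ad_spend": 15.0})
print(f"What-if: raising ad_spend to 15 changes the KPI by {uplift:+.1f}")
```

Here the forecast is 60 against an actual of 51, a 15 percent shortfall that trips the alert, and the what-if shows the KPI impact of a higher ad budget. In a fully automated setup the coefficients would be re-estimated as current data arrives rather than hard-coded.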

Figure 1: The KPI Predictive Analytics Process

Although more software vendors are offering embedded predictive analytics with extensive "under the covers" modeling solutions for market basket analysis, fraud detection and affinity analysis, the first-cut models still need to be reviewed by people steeped in the art and science of statistics. Data mining techniques carry too many assumptions and caveats that could lead a business analyst down the slippery slope of misinterpretation. Once the initial models are validated, self-directed, automated updates using embedded predictive analytics can certainly enhance the agility and effectiveness of the decision-making process.
