This is the second correspondence on last week’s Predictive Analytics World (PAW) in San Francisco. About a year and a half ago, I wrote a book review of Super Crunchers by Yale economist Ian Ayres, in which I characterized super crunching as the amalgam of predictive modeling and randomized experiments. Randomization to treatment and control groups allows investigators to minimize the risk of study bias, so that the only meaningful difference between groups out of the gate is that one is called treatment and the other control. Predictive modeling by itself allows analysts to infer relationships and correlation; the addition of experiments sharpens the focus to cause and effect. The combination of predictive modeling and experiments is thus a very potent tool in the business learning arsenal of hypothesize/experiment/learn.
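
To make the distinction concrete, here is a minimal sketch in Python (the customer names, response rates, and lift are all made up) of how random assignment lets the simple difference in response rates between treatment and control stand in for the causal effect of an offer:

```python
import random

random.seed(42)

# Hypothetical customer list; names and response probabilities are invented
customers = [f"cust_{i}" for i in range(10_000)]

# Randomize: a coin flip decides who gets the offer (treatment) and who does not (control)
treatment, control = [], []
for cust in customers:
    (treatment if random.random() < 0.5 else control).append(cust)

def simulate_response(group, base_rate, lift=0.0):
    """Pretend response behavior: a base rate plus any causal lift from the offer."""
    return sum(random.random() < base_rate + lift for _ in group)

# Because assignment was random, both groups share the same base rate out of the gate
treated_responses = simulate_response(treatment, base_rate=0.05, lift=0.02)
control_responses = simulate_response(control, base_rate=0.05)

treated_rate = treated_responses / len(treatment)
control_rate = control_responses / len(control)

# The difference in rates estimates the causal effect of the offer
print(f"treatment rate: {treated_rate:.3f}, control rate: {control_rate:.3f}, "
      f"estimated lift: {treated_rate - control_rate:.3f}")
```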

The power of analytics plus experiments was well understood by PAW participants. Conference chair Eric Siegel noted the importance of experiments in demonstrating the value of predictive modeling, citing the oft-told story from Harrah’s Entertainment that “not using a control group” is grounds for termination. Siegel also detailed the champion/challenger approach to experimentation used by enterprise decision management practitioners.
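
For readers who haven’t seen champion/challenger testing in action, the sketch below shows the basic mechanics; the model scores, traffic share, and outcomes are all invented for illustration. A small random slice of decisions is routed to a challenger strategy, and its live results are compared against the incumbent champion:

```python
import random

random.seed(0)

# Stand-in scoring functions; in practice these would be the production and candidate models
def champion_score(customer_id):
    return 0.50

def challenger_score(customer_id):
    return 0.60

CHALLENGER_SHARE = 0.10  # a small random slice of decisions goes to the challenger

def route(customer_id):
    if random.random() < CHALLENGER_SHARE:
        return "challenger", challenger_score(customer_id)
    return "champion", champion_score(customer_id)

# Track live outcomes per arm so the two strategies can be compared head to head
outcomes = {"champion": [], "challenger": []}
for customer_id in range(5000):
    arm, score = route(customer_id)
    converted = random.random() < score * 0.10   # fake outcome, for illustration only
    outcomes[arm].append(converted)

for arm, results in outcomes.items():
    print(f"{arm}: {len(results)} decisions, conversion {sum(results) / len(results):.3f}")
```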

SAS’s Anne Milley improved her standing with me quite a bit with a short but incisive presentation. Anne’s just now starting to get over an unfortunate remark, in a January NY Times article, about the risk of relying on the open source analytics platform R.

In her talk, she quotes Derek Bok, president of Harvard University from 1971 to 1991: “If you think education is expensive, try ignorance”. Anne proceeds to frame predictive analytics in the broader context of applying scientific principles to business. Her framework for business analytics is:

  1. Observe, Define, Measure
  2. Experiment
  3. Act

She also proposes an Analytics Center of Excellence to promote dialog between producers and consumers of analytics, sagely noting that the social side is every bit as important as the analytical one, and that data quality is king. Sounds like someone who’s been around the modeling block more than a few times.

John McConnell of Analytical People discusses the popular CRISP-DM (CRoss-Industry Standard Process for Data Mining) methodology in his study of customer retention. The steps of the CRISP-DM feedback loop are Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, and Deployment. Randomized experiments or other rigorous designs are part and parcel of the Evaluation step.
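
A skeletal illustration of one pass around the CRISP-DM loop might look like the Python below; every function, number, and rule here is a hypothetical stub, with the Evaluation step as the natural home for randomized experiments or other rigorous designs:

```python
# Skeletal sketch of one CRISP-DM pass; all functions and values are hypothetical stubs
def business_understanding():
    return {"objective": "improve customer retention", "success_threshold": 0.80}

def data_understanding(objective):
    return {"sources": ["customers", "usage", "complaints"]}

def data_preparation(profile):
    # Toy rows standing in for a cleaned modeling table
    return [{"tenure": 2, "churned": 1}, {"tenure": 24, "churned": 0},
            {"tenure": 3, "churned": 1}, {"tenure": 18, "churned": 0}]

def modeling(rows):
    # Trivial stand-in model: short tenure implies churn risk
    return lambda row: 1 if row["tenure"] < 6 else 0

def evaluation(model, rows):
    # The step where randomized experiments or other rigorous designs belong
    return sum(model(r) == r["churned"] for r in rows) / len(rows)

def deployment(model):
    print("deploying model to production")

# One pass around the loop; in practice a weak evaluation sends you back to an earlier step
goals = business_understanding()
profile = data_understanding(goals["objective"])
rows = data_preparation(profile)
model = modeling(rows)
if evaluation(model, rows) >= goals["success_threshold"]:
    deployment(model)
else:
    print("evaluation failed; revisit business or data understanding")
```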

Jun Zhong, VP of Targeting and Analytics, Card Services Customer Marketing at Wells Fargo, uses randomized experiments along with propensity adjustments in his response modeling so he can distinguish reactive purchasers from proactive purchasers and non-purchasers, the better to allocate scarce targeting dollars.
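
The sketch below, with invented segment names and rates, illustrates the incremental-response logic behind that distinction: rank segments by the lift a randomized holdout reveals, not by raw response rate, so targeting dollars chase reactive purchasers rather than customers who would buy anyway:

```python
# Invented segment-level numbers illustrating incremental (uplift) thinking
segments = {
    # segment: (purchase rate when targeted, purchase rate when held out)
    "loyal_heavy_users": (0.30, 0.29),   # proactive buyers: purchase with or without the offer
    "lapsing_customers": (0.12, 0.04),   # reactive buyers: purchase because they were targeted
    "non_purchasers":    (0.01, 0.01),   # rarely purchase either way
}

budget_slots = 1  # pretend the budget covers targeting only one segment

# Rank segments by incremental response, not by raw response rate
ranked = sorted(segments.items(), key=lambda kv: kv[1][0] - kv[1][1], reverse=True)

for name, (treated_rate, holdout_rate) in ranked[:budget_slots]:
    print(f"target {name}: incremental response {treated_rate - holdout_rate:.2f}")
```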
 
Finally, Andreas Weigend, former Chief Scientist of Amazon.com, is a big proponent of the scientific method for learning in business. His talk, The Unrealized Power of Data, articulated a methodology, PHAME, for measuring the power of data. Weigend’s approach, Problem-->Hypothesis-->Action-->Metrics-->Experiments, supplements top-down problem definition, hypothesis formulation, and evaluation metrics with the bottom-up performance measurement of experiments in a learning feedback loop. Tom Davenport would be proud.