I was quite excited when my partner sent me the briefing documents from our newest customer, an eMarketer. I'm generally vigilant for opportunities to do analytics and predictive modeling, and so was heartened by the inclusion of two slide decks pertaining to propensity models and scoring. I guess I have a propensity for statistical analyses!
Simple Research Designs for Business Intelligence
Statistical models are generally developed in the context of research designs that allow results to be established with more or less confidence. The tighter the design, the more assurance the analyst can have of the findings. The gold standard for establishing the validity of an investigation is, of course, the randomized experiment. Randomization to treatment helps assure that observed differences in performance variables between experimental and control groups are due to the intervention and not to other uncontrolled factors (covariates), either observed or unobserved, that might be related to the performance measures and, subsequently, be sources of bias. With randomization, those bias-causing, uncontrolled factors should, on the average, be a wash between intervention and control groups.
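To make the "wash" concrete, here is a minimal simulation sketch. It assumes a single made-up covariate (prior spend) and two hypothetical assignment rules, a coin flip and a self-selection rule in which heavier spenders opt in more often. It then compares the covariate gap between the resulting groups. None of the names or numbers come from a real study.

```python
import random
import statistics

def covariate_gap(assign, n=20000, seed=0):
    """Mean covariate in the treated group minus mean covariate in the control group."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        spend = rng.gauss(100.0, 20.0)  # hypothetical prior-spend covariate
        (treated if assign(spend, rng) else control).append(spend)
    return statistics.mean(treated) - statistics.mean(control)

def randomized(spend, rng):
    # A coin flip: assignment ignores the covariate entirely.
    return rng.random() < 0.5

def self_selected(spend, rng):
    # Assumed opt-in behavior: heavier spenders join the treatment more often.
    return rng.random() < (0.8 if spend > 100.0 else 0.2)

gap_random = covariate_gap(randomized)
gap_selected = covariate_gap(self_selected)
print(f"covariate gap with randomization: {gap_random:.2f}")
print(f"covariate gap with self-selection: {gap_selected:.2f}")
```

Under the coin flip the two groups start out nearly identical on prior spend, so it cannot masquerade as a treatment effect. Under self-selection the groups differ before any intervention takes place.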
At a minimum, business intelligence (BI) practitioners should understand the strengths and weaknesses of the designs they deploy to gather intelligence. Consider the six simple designs often used for BI outlined in Figure 1, where O represents observation or measurement, X is a treatment or intervention, and R stands for randomization. Design 1a, the one-shot case study, which offers no possibilities to learn from comparisons or over-time contrasts, is really not much of a design at all. Yet this design is, unfortunately, quite pervasive in BI, underpinning much of predictive modeling and serving as a significant foundation for findings that impact business decision-making. The one-group pretest-posttest design 1b provides at least a pre-post comparison of the investigation units (customers, stores, etc.). The main problem with 1b is that differences between the pre and post measurements might be due to factors other than the intervention, and this design is hard-pressed to refute alternative explanations.
The pure experimental designs 2a and 2b should be the standards to which BI aspires when gathering intelligence. The power of randomization of units to either intervention or control groups, along with the benefits of pre and post measurements, makes these simple designs well able to withstand threats to the validity of inquiries. And, in the Internet age, it's often pretty straightforward to execute simple randomized experiments that can assure the quality of results.
For those cases in which randomization is impractical or inappropriate, quasi-experimental designs 3a and 3b, supplemented by statistical adjustments for bias, might be acceptable substitutes. Designs 3a and 3b introduce a next level of complexity to pre-experiments by adding a comparison or control group to the analysis. Indeed, quasi-experimental designs look much the same as their pure experimental cousins, except that they use natural groups instead of randomization to intervention/control. Without the benefits of randomization, selection and other biases can distort findings, misleading analysts into attributing differences between intervention and control to the treatment when in fact the groups were different (there were biases) out of the gate.
There are many flavors of propensity models in the BI world today, each associated with one or more of the designs in Figure 1. Historically, marketing has equated propensity to predictive models that assess customer probability or likelihood of executing a critical event, such as purchase of goods and services. They speak of propensity (or inclination) for up-sell and cross-sell. They trumpet lift, which is actually bang for the predictive buck, the hope being that a relatively small and predictable group of prospects makes the lion's share of purchases. With significant lift come cheaper modeling, superior predictive accuracy and noticeable marketing ROI.
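One common way to put a number on lift is to compare the purchase rate among the highest-scored prospects with the purchase rate overall. The sketch below assumes that definition. The function name `top_fraction_lift` and the toy scores and outcomes are purely illustrative.

```python
def top_fraction_lift(scores, outcomes, top_fraction=0.1):
    """Response rate among the top-scored fraction divided by the overall response rate."""
    if len(scores) != len(outcomes) or not scores:
        raise ValueError("scores and outcomes must be non-empty and the same length")
    base_rate = sum(outcomes) / len(outcomes)
    if base_rate == 0:
        raise ValueError("lift is undefined when nobody responds")
    # Rank prospects from highest to lowest model score.
    ranked = sorted(zip(scores, outcomes), key=lambda pair: pair[0], reverse=True)
    k = max(1, int(len(ranked) * top_fraction))
    top_rate = sum(outcome for _, outcome in ranked[:k]) / k
    return top_rate / base_rate

# Twenty hypothetical prospects: scores from 1.00 down to 0.05, four purchasers in all.
scores = [i / 20 for i in range(20, 0, -1)]
outcomes = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

lift = top_fraction_lift(scores, outcomes)
print(f"top-decile lift: {lift:.1f}")
```

In this toy data both prospects in the top decile purchase, against a base rate of four in twenty, so contacting only the top decile would be five times as productive per contact as contacting everyone.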
The marketing propensity model is associated with pre-experimental 1a and 1b, or their methodological cousins that first intervene and then observe multiple times. Most often, the models attempt to gauge the probability of membership in a desirable group, such as purchasers, or perhaps an undesirable group, such as credit card abusers. Logistic regression has long been the preferred modeling technique for marketing propensity analysts. More and more, however, other models and machine learning algorithms that classify membership - such as linear and quadratic discriminant analysis, trees, random forests and boosting algorithms - are being applied to the classification problem. Once accurate predictors are identified and the models refined, prospects can be scored and handled differentially by the business.
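As a rough illustration of the scoring step, the sketch below fits a one-predictor logistic regression by plain gradient ascent and then scores three prospects. The predictor (a standardized "engagement" measure), the simulated purchase data, the 0.5 cutoff and every name in the code are assumptions for the example. A production model would use many predictors and a vetted statistical package.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.5, epochs=300):
    """Fit P(y=1 | x) = sigmoid(b0 + b1*x) by batch gradient ascent on the log-likelihood."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = y - sigmoid(b0 + b1 * x)  # observed outcome minus predicted probability
            g0 += err
            g1 += err * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

# Simulated history: purchase becomes more likely as engagement rises (assumed relationship).
rng = random.Random(42)
engagement = [rng.gauss(0.0, 1.0) for _ in range(2000)]
purchased = [1 if rng.random() < sigmoid(-1.0 + 1.5 * x) else 0 for x in engagement]

b0, b1 = fit_logistic(engagement, purchased)

def score(x):
    """Estimated purchase propensity for a prospect with engagement x."""
    return sigmoid(b0 + b1 * x)

# Score hypothetical prospects and handle them differently above an assumed cutoff.
prospects = {"low": -1.5, "typical": 0.0, "high": 1.5}
scores = {name: score(x) for name, x in prospects.items()}
targeted = [name for name, s in scores.items() if s >= 0.5]
print(scores, targeted)
```

The same pattern applies whichever classifier produces the scores: fit on history, score new prospects, then route them by score.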
The more current work with propensity models offers support for quasi-experimental design 3a. In this case, propensity is concerned not with final group membership, as above, but instead with adjusting for potential biases caused by the absence of randomization to treatment and control groups in evaluations of performance. The hope is that effective statistical adjustment in the analysis can provide much the same benefit after the intervention as randomization does before.
The propensity methodology for design 3a first attempts to predict membership in the treatment or control groups from a series of observed covariates thought to influence the final performance variables of interest. The intention is to summarize the differences between treatment and control - the potential biases - in a single score that can be used to cleanse or adjust comparisons between groups. If the predictions of treatment versus control group membership are well-behaved and have significant overlap, the propensity scores can be used to either match or adjust differences statistically. Such adjustments can, in many cases, provide results as precise as randomized experiments.
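The sketch below walks through those steps on simulated data. One covariate drives both treatment membership and the outcome, and the true treatment effect is set to zero. It fits a logistic model for membership and then compares treated and control units within deciles of the estimated score. Stratifying on the score is only one of several adjustment options and is my choice for the illustration. All names and numbers are hypothetical.

```python
import math
import random
import statistics

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ts, lr=0.5, epochs=300):
    """Fit P(t=1 | x) = sigmoid(b0 + b1*x) by batch gradient ascent on the log-likelihood."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, t in zip(xs, ts):
            err = t - sigmoid(b0 + b1 * x)
            g0 += err
            g1 += err * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

# Simulated nonrandomized study: units with higher x are more likely to be treated,
# and x also raises the outcome. The true treatment effect is zero by construction.
rng = random.Random(7)
n = 4000
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
treated = [1 if rng.random() < sigmoid(xi) else 0 for xi in x]
y = [2.0 * xi + rng.gauss(0.0, 1.0) for xi in x]

def mean_gap(indices):
    """Treated-minus-control mean outcome over the given units, or None if a group is empty."""
    t = [y[i] for i in indices if treated[i] == 1]
    c = [y[i] for i in indices if treated[i] == 0]
    if not t or not c:
        return None
    return statistics.mean(t) - statistics.mean(c)

naive = mean_gap(range(n))

# Step 1: summarize the covariate differences in a single propensity score.
b0, b1 = fit_logistic(x, treated)
propensity = [sigmoid(b0 + b1 * xi) for xi in x]

# Step 2: compare groups within deciles of the score, then average the within-decile gaps.
order = sorted(range(n), key=lambda i: propensity[i])
n_strata = 10
size = n // n_strata
gaps, weights = [], []
for s in range(n_strata):
    members = order[s * size:(s + 1) * size]
    gap = mean_gap(members)
    if gap is not None:  # skip strata without overlap between the groups
        gaps.append(gap)
        weights.append(len(members))
adjusted = sum(g * w for g, w in zip(gaps, weights)) / sum(weights)

print(f"naive gap: {naive:.2f}, propensity-stratified gap: {adjusted:.2f}")
```

The naive contrast reports a sizable "effect" that is entirely selection bias, while the stratified estimate lands much closer to the true value of zero. Some residual bias remains because units within a decile are not perfectly matched.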
An example from the excellent statistical text by Maindonald and Braun provides an informative illustration of propensity modeling for quasi-experimental designs [3]. The authors analyze data from an evaluation study that attempts to discern whether or not a labor training program was effective in raising the wages of participants, contrasting wages of those who participated in the program with a control group of nonparticipants. While the original study included randomization to treatment and control, the authors attempt to replicate findings using a nonrandomized control group to illustrate the propensity modeling technique for design 3a.
Using an accessible data set and the open source R Project for Statistical Computing, Maindonald and Braun detail the progression of statistical thinking from data exploration, to simple regression modeling without propensity adjustments, to the use of propensity scores. Their conclusion: An analysis that naively contrasts wages for treatment versus control shows a significantly positive program effect, while a comparison that is moderated by the use of propensity scores shows little impact after adjustment for differences between the nonrandomized groups. The use of propensity scores in addition to treatment categories in the model has statistically dampened the impact of the intervention.
Advances in propensity score modeling also include the handling of quasi-experimental design 3b, pretest-posttest control group. Called the difference-in-differences method, this extension uses propensity scores developed as with 3a, but contrasts the pre-post differences of the experimental group with the pre-post differences of controls. Variants of this approach are now widely used to evaluate the effectiveness of social and educational programs. With strong support in statistical software packages, the difference-in-differences method is a solid choice for measuring business performance over time, between groups, when randomization is not an option. It should become a staple in the BI analyst's arsenal.
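The core arithmetic of the contrast is simple enough to sketch in a few lines. The example below computes the basic, unweighted difference-in-differences from group means and deliberately leaves out the propensity-score adjustment described above. The store sales figures are invented for illustration.

```python
import statistics

def diff_in_diff(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """Pre-post change in the treated group minus pre-post change in the control group."""
    treat_change = statistics.mean(treat_post) - statistics.mean(treat_pre)
    ctrl_change = statistics.mean(ctrl_post) - statistics.mean(ctrl_pre)
    return treat_change - ctrl_change

# Hypothetical weekly sales for three stores that got a promotion and three that did not.
treat_pre = [100.0, 110.0, 120.0]
treat_post = [130.0, 140.0, 150.0]
ctrl_pre = [80.0, 90.0, 100.0]
ctrl_post = [90.0, 100.0, 110.0]

effect = diff_in_diff(treat_pre, treat_post, ctrl_pre, ctrl_post)
print(f"difference-in-differences estimate: {effect:.1f}")
```

Here the promoted stores gained 30 and the comparison stores gained 10, so the estimate credits the promotion with 20. The method nets out both the starting gap between the groups and any trend they share over time.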
1. Donald T. Campbell and Julian C. Stanley. Experimental and Quasi-Experimental Designs for Research. Rand McNally College Publishing: 1963.
2. Marshall M. Joffe and Paul R. Rosenbaum. Invited Commentary: Propensity Scores. American Journal of Epidemiology, August 15, 1999.
3. John Maindonald and John Braun. Data Analysis and Graphics Using R - An Example-Based Approach. Cambridge University Press: 2007.