MAR 27, 2008 3:57am ET

A Propensity for BI


I was quite excited when my partner sent me the briefing documents from our newest customer, an eMarketer. I’m generally vigilant for opportunities to do analytics and predictive modeling, and so was heartened by the inclusion of two slide decks pertaining to propensity models and scoring. I guess I have a propensity for statistical analyses!


Simple Research Designs for Business Intelligence


Statistical models are generally developed in the context of research designs that allow results to be established with more or less confidence. The tighter the design, the more assurance the analyst can have in the findings. The gold standard for establishing the validity of an investigation is, of course, the randomized experiment. Randomization to treatment helps ensure that observed differences in performance variables between experimental and control groups are due to the intervention and not to other uncontrolled factors (covariates), observed or unobserved, that might be related to the performance measures and, consequently, be sources of bias. With randomization, those bias-causing, uncontrolled factors should, on average, be a wash between intervention and control groups.
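
To see why randomization works, here is a minimal Python sketch, assuming a simulated population rather than real customer data. The covariate `prior_spend`, the effect size and the noise levels are all hypothetical. Units are assigned by coin flip, so the covariate should come out nearly identical across arms, and the simple difference in mean outcomes should land close to the true effect.

```python
import random
import statistics


def simulate_randomized_experiment(n=20000, true_effect=5.0, seed=1):
    """Randomly assign n units to treatment or control, then compare arms.

    prior_spend is a hypothetical covariate that drives the outcome but plays
    no part in assignment, so coin-flip randomization should balance it.
    """
    rng = random.Random(seed)
    covariate = {True: [], False: []}
    outcome = {True: [], False: []}
    for _ in range(n):
        prior_spend = rng.gauss(100.0, 25.0)
        treated = rng.random() < 0.5  # randomization to treatment
        y = 0.5 * prior_spend + (true_effect if treated else 0.0) + rng.gauss(0.0, 10.0)
        covariate[treated].append(prior_spend)
        outcome[treated].append(y)
    cov_gap = statistics.mean(covariate[True]) - statistics.mean(covariate[False])
    effect = statistics.mean(outcome[True]) - statistics.mean(outcome[False])
    return cov_gap, effect


cov_gap, effect = simulate_randomized_experiment()
print(f"covariate gap between arms: {cov_gap:.2f}")
print(f"difference in mean outcomes: {effect:.2f} (true effect 5.0)")
```

Because `prior_spend` never enters the assignment step, any gap between arms is just sampling noise, which is exactly the "wash" randomization buys.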


At a minimum, business intelligence (BI) practitioners should understand the strengths and weaknesses of the designs they deploy to gather intelligence. Consider the six simple designs often used for BI outlined in Figure 1, where O represents an observation or measurement, X a treatment or intervention, and R randomization. Design 1a, the one-shot case study, offers no opportunity to learn from comparisons or over-time contrasts and is really not much of a design at all. Yet this “design” is, unfortunately, quite pervasive in BI: it underpins much of predictive modeling and is a significant foundation for findings that impact business decision-making. The one-group pretest-posttest design, 1b, at least provides a pre-post comparison of the investigation units (customers, stores, etc.). Its main problem is that differences between the pre- and post-measurements might be due to factors other than the intervention, and this design is hard pressed to refute such alternative explanations.
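
The weakness of design 1b can be illustrated with a small simulation, a sketch under made-up assumptions: every unit receives the intervention, and every unit is also exposed to a hypothetical `seasonal_lift` (say, a holiday season) between the two measurements. The pre-post change blends the two influences, and nothing in the design separates them.

```python
import random
import statistics


def one_group_pre_post(n=5000, true_effect=2.0, seasonal_lift=8.0, seed=2):
    """One-group pretest-posttest: measure, intervene on everyone, measure again.

    seasonal_lift is a hypothetical outside influence that hits every unit
    between the two measurements, alongside the intervention itself.
    """
    rng = random.Random(seed)
    pre = [rng.gauss(50.0, 10.0) for _ in range(n)]
    post = [p + seasonal_lift + true_effect + rng.gauss(0.0, 5.0) for p in pre]
    return statistics.mean(post) - statistics.mean(pre)


change = one_group_pre_post()
print(f"pre-post change: {change:.2f} (true intervention effect 2.0)")
```

With these invented settings the change should come out near 10 (effect plus seasonal lift), and the analyst has no way, within the design, to tell how much of it the intervention caused.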


Both pure experimental designs, 2a and 2b, should be the standards to which BI aspires when gathering intelligence. The power of randomly assigning units to either intervention or control groups, along with the benefits of pre- and post-measurements, makes these simple designs well able to withstand threats to the validity of an inquiry. And in the Internet age, it’s often pretty straightforward to execute simple randomized experiments that can assure the quality of results.
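
Adding a randomized control group removes the ambiguity of a pre-post comparison. The sketch below assumes a randomized pretest-posttest control-group layout (Figure 1 itself is not reproduced here, so mapping it to a specific numbered design is our assumption) and includes a hypothetical seasonal lift that hits both arms. Because both arms share it, the difference in average gains isolates the intervention.

```python
import random
import statistics


def randomized_pre_post(n=10000, true_effect=2.0, seasonal_lift=8.0, seed=3):
    """Randomized pretest-posttest control-group design.

    Both arms experience the same hypothetical seasonal_lift; only the
    treatment arm receives the intervention. Comparing the arms' average
    gains cancels the shared seasonal change.
    """
    rng = random.Random(seed)
    gains = {True: [], False: []}
    for _ in range(n):
        pre = rng.gauss(50.0, 10.0)
        treated = rng.random() < 0.5  # randomization to intervention or control
        post = pre + seasonal_lift + (true_effect if treated else 0.0) + rng.gauss(0.0, 5.0)
        gains[treated].append(post - pre)
    return statistics.mean(gains[True]) - statistics.mean(gains[False])


estimate = randomized_pre_post()
print(f"difference in average gains: {estimate:.2f} (true effect 2.0)")
```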


For those cases in which randomization is impractical or inappropriate, quasi-experimental designs 3a and 3b, supplemented by statistical adjustments for bias, might be acceptable substitutes. Designs 3a and 3b take the pre-experiments a level further by adding a comparison or control group to the analysis. Indeed, quasi-experimental designs look much the same as their pure experimental cousins, except that they use natural groups instead of randomizing units to intervention and control. Without the benefits of randomization, selection and other biases can distort findings, misleading analysts into attributing differences between intervention and control groups to the intervention when in fact the groups were different (that is, biased) out of the gate.
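
Selection bias "out of the gate" is easy to reproduce. In this hypothetical sketch, an `engagement` covariate makes customers both more likely to join a natural group (say, a loyalty program) and more likely to perform well, with no randomization involved. The naive comparison of joiners and non-joiners then credits the pre-existing engagement gap to the intervention.

```python
import random
import statistics


def natural_group_comparison(n=20000, true_effect=2.0, seed=4):
    """Compare outcomes of self-selected joiners and non-joiners.

    engagement is a hypothetical covariate that both raises the chance of
    joining and raises the outcome, so the groups differ before any
    intervention takes place.
    """
    rng = random.Random(seed)
    outcome = {True: [], False: []}
    for _ in range(n):
        engagement = rng.gauss(0.0, 1.0)
        joined = rng.random() < (0.8 if engagement > 0 else 0.2)  # self-selection
        y = 50.0 + 6.0 * engagement + (true_effect if joined else 0.0) + rng.gauss(0.0, 5.0)
        outcome[joined].append(y)
    return statistics.mean(outcome[True]) - statistics.mean(outcome[False])


naive = natural_group_comparison()
print(f"naive group difference: {naive:.2f} (true effect 2.0)")
```

Under these assumed parameters the naive difference should be several points larger than the true effect of 2, all of it bias from self-selection.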


[Figure 1: Six simple research designs for BI (O = observation, X = intervention, R = randomization)]


Propensity Models


There are many flavors of propensity models in the BI world today, each associated with one or more of the designs in Figure 1. Historically, marketing has equated propensity with predictive models that assess a customer’s probability, or likelihood, of executing a critical event such as the purchase of goods or services. Marketers speak of propensity (or inclination) for up-sell and cross-sell. They trumpet lift, which is really bang for the predictive buck, the hope being that a relatively small and predictable group of prospects makes the lion’s share of purchases. With significant lift come cheaper modeling, superior predictive accuracy and noticeable marketing ROI.
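
Lift has several operational definitions; one common one, assumed here, is the response rate among the top-scoring 10% of prospects divided by the overall response rate. A minimal Python sketch with invented toy data:

```python
def top_decile_lift(scores, responded):
    """Response rate among the top-scoring 10% of prospects divided by the overall rate.

    scores: model scores (higher means more likely to respond).
    responded: 1 if the prospect responded (e.g. purchased), else 0.
    """
    if len(scores) != len(responded) or not scores:
        raise ValueError("scores and responded must be non-empty and equal length")
    overall_rate = sum(responded) / len(responded)
    if overall_rate == 0:
        raise ValueError("lift is undefined when nobody responded")
    ranked = sorted(zip(scores, responded), key=lambda pair: pair[0], reverse=True)
    k = max(1, len(ranked) // 10)  # size of the top decile
    top_rate = sum(r for _, r in ranked[:k]) / k
    return top_rate / overall_rate


# Toy example: 20 prospects scored 20 (best) down to 1; 3 responded.
scores = list(range(20, 0, -1))
responded = [1, 1, 0, 1] + [0] * 16
print(round(top_decile_lift(scores, responded), 2))  # 6.67
```

Here the top decile (two prospects) responds at 100% against an overall rate of 15%, a lift of about 6.7: the "small and predictable group" the marketers are hoping for.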


The marketing propensity model is associated with pre-experimental designs 1a and 1b, or with their methodological cousins that first intervene and then observe multiple times. Most often, these models attempt to gauge the probability of membership in a desirable group, such as purchasers, or perhaps an undesirable one, such as credit card abusers. Logistic regression has long been the preferred modeling technique of marketing propensity analysts. Increasingly, however, other models and machine learning algorithms that classify membership (such as linear and quadratic discriminant analysis, trees, random forests and boosting algorithms) are being applied to the classification problem. Once accurate predictors are identified and the models refined, prospects can be scored and handled differentially by the business.
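
As an illustration of the classic approach, here is a minimal logistic regression propensity model written from scratch in Python so it has no dependencies. It assumes a single standardized predictor (the hypothetical `visits`), fits the coefficients by batch gradient ascent on the log-likelihood, and then scores every prospect. In practice one would use a statistics package with multiple predictors, standard errors and regularization; this sketch only shows the mechanics.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ys, lr=1.0, iterations=300):
    """Fit P(y = 1 | x) = sigmoid(b0 + b1 * x) by batch gradient ascent
    on the average log-likelihood (one predictor, no regularization)."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(iterations):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            residual = y - sigmoid(b0 + b1 * x)
            g0 += residual
            g1 += residual * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


# Hypothetical training data: a standardized predictor (say, recent site
# visits) and a purchase flag generated from a known logistic model.
rng = random.Random(5)
visits = [rng.gauss(0.0, 1.0) for _ in range(2000)]
purchased = [1 if rng.random() < sigmoid(-1.0 + 1.5 * x) else 0 for x in visits]

b0, b1 = fit_logistic(visits, purchased)
scores = [sigmoid(b0 + b1 * x) for x in visits]  # one propensity score per prospect
print(f"intercept {b0:.2f}, slope {b1:.2f}")
```

The simulated data come from a model with intercept -1.0 and slope 1.5, so the fitted coefficients should land near those values; the resulting `scores` are what the business would rank and act on.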


More recent work with propensity models supports quasi-experimental design 3a. Here, propensity is concerned not with final group membership, as above, but with adjusting for potential biases caused by the absence of randomization to treatment and control groups in evaluations of performance. The hope is that effective statistical adjustment during analysis can provide much the same benefit after the intervention as randomization does before it.
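
For reference, the propensity score is usually written as follows in the causal-inference literature, where T_i is 1 for a treated unit and 0 for a control, X_i its observed covariates, Y_i its performance measure and \hat{e} the fitted score. The notation is ours, not the article's. The second line is the balancing property (conditional on the true score, covariates are independent of treatment), and the third is the inverse-probability-weighted (IPW) estimate of the average treatment effect, one of several ways to adjust differences statistically. All of it assumes that every covariate driving both treatment and performance has been observed, and that the scores stay strictly between 0 and 1.

```latex
\begin{align}
e(x) &= \Pr(T = 1 \mid X = x) \\
X &\perp\!\!\!\perp T \mid e(X) \\
\hat{\tau}_{\mathrm{IPW}} &= \frac{1}{n}\sum_{i=1}^{n} \frac{T_i Y_i}{\hat{e}(X_i)}
  - \frac{1}{n}\sum_{i=1}^{n} \frac{(1 - T_i)\, Y_i}{1 - \hat{e}(X_i)}
\end{align}
```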


The propensity methodology for design 3a first attempts to predict membership in the treatment or control group from a series of observed covariates thought to influence the final performance variables of interest. The intention is to summarize the differences between treatment and control (the potential biases) in a single score that can be used to “cleanse,” or adjust, comparisons between groups. If the predictions of treatment versus control group membership are well behaved and the two groups’ scores overlap substantially, the propensity scores can be used either to match units or to adjust differences statistically. Provided the covariates that drive both group membership and performance have actually been measured, such adjustments can, in many cases, provide results as precise as those of randomized experiments.
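
The steps just described can be sketched end to end in Python: estimate each unit's score with logistic regression, check that the groups' score ranges overlap, then match each treated unit to the control with the nearest score. This is a minimal sketch under stated assumptions: a single hypothetical covariate (`engagement`), simulated data, 1-nearest-neighbour matching with replacement, and no caliper or trimming of poorly overlapping units. It is one matching scheme among many, not the method used by the customer in the article.

```python
import bisect
import math
import random
import statistics


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ts, lr=1.0, iterations=300):
    """Step 1: model P(treated | covariate) with one-predictor logistic regression."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(iterations):
        g0 = g1 = 0.0
        for x, t in zip(xs, ts):
            residual = t - sigmoid(b0 + b1 * x)
            g0 += residual
            g1 += residual * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


def matched_effect(scores, treated, outcomes):
    """Step 2: 1-nearest-neighbour matching on the score, with replacement.

    Each treated unit is paired with the control whose propensity score is
    closest; the mean paired difference estimates the effect on the treated.
    """
    controls = sorted((s, y) for s, t, y in zip(scores, treated, outcomes) if not t)
    control_scores = [s for s, _ in controls]
    diffs = []
    for s, t, y in zip(scores, treated, outcomes):
        if not t:
            continue
        i = bisect.bisect_left(control_scores, s)
        # The closest control sits just below or just above the insertion point.
        nearby = [j for j in (i - 1, i) if 0 <= j < len(controls)]
        j = min(nearby, key=lambda k: abs(control_scores[k] - s))
        diffs.append(y - controls[j][1])
    return statistics.mean(diffs)


# Hypothetical quasi-experiment: engagement drives both joining and outcome.
rng = random.Random(6)
engagement, joined, outcome = [], [], []
for _ in range(4000):
    e = rng.gauss(0.0, 1.0)
    t = rng.random() < sigmoid(1.5 * e)  # self-selection, not randomization
    engagement.append(e)
    joined.append(t)
    outcome.append(50.0 + 6.0 * e + (2.0 if t else 0.0) + rng.gauss(0.0, 5.0))

naive = (statistics.mean(y for y, t in zip(outcome, joined) if t)
         - statistics.mean(y for y, t in zip(outcome, joined) if not t))
b0, b1 = fit_logistic(engagement, [1 if t else 0 for t in joined])
scores = [sigmoid(b0 + b1 * e) for e in engagement]

# Overlap check: the two groups' score ranges should largely coincide.
t_scores = [s for s, t in zip(scores, joined) if t]
c_scores = [s for s, t in zip(scores, joined) if not t]
print(f"treated scores {min(t_scores):.2f}-{max(t_scores):.2f}, "
      f"control scores {min(c_scores):.2f}-{max(c_scores):.2f}")

matched = matched_effect(scores, joined, outcome)
print(f"naive difference {naive:.2f}, matched estimate {matched:.2f} (true effect 2.0)")
```

Because the only source of bias in this simulation is the observed `engagement` covariate, matching on the score should bring the estimate back close to the true effect of 2 while the naive difference stays several points too high. With an unobserved confounder, no amount of matching on observed covariates would fix it.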

