I was surfing the Social Science Statistics Blog
I wrote about in my 5/18/2008 column
when I came across an intriguing entry from April 4, 2009, titled: Can Nonrandomized Experiments Yield Accurate Answers?
As I read first the blog summary, then the article itself, I realized the issues raised by the authors are pertinent to business and BI.
BI is often tasked with measuring the performance of a new business intervention such as the opening of a new store, the launch of a product, execution of a marketing campaign, or a change in business strategy. Two of the more common measurement designs for testing such interventions are the simple pretest-posttest and the pretest-posttest with control group. The schematics below depict each design, O representing measurement, X signifying the intervention. The pretest-posttest design assesses the group that received the intervention both before and after. The pretest-posttest with control adds a second group that is measured but does not receive the treatment:
pretest-posttest: O X O
pretest-posttest with control: O X O (treatment group); O O (control group)
The trouble with these simple nonrandomized designs is that it's easy to construct alternative explanations, apart from the intervention itself, for differences in pre- and post-intervention measurements. Pity the poor dupe hired with a two-year contract as CEO of a cyclical company in January 2008. Even with heroic effort, her company's performance will pale in a pre-post comparison to 2006-2007, a victim of our deep recession. Alas, through no fault of her own, she could be unemployed at the end of a second recessionary year. The new coach of the Detroit Lions, on the other hand, cannot have a worse record than his 2008 predecessor, who suffered through an ignominious 0-16 season. Regression to the mean will assure the Lions at least a few victories in 2009, even if the team still stinks. In both cases, though, something other than the "intervention" could be producing the change in performance, despite consequences for those responsible.
The second design, pretest-posttest with control, is a step up from simple pretest-posttest. Differences in pre- and post-test measures due to factors other than the intervention, such as history or maturation, should be similar between treatment and control. Analysts often construct the pre-post differences for intervention and control, and then compute the difference of these measures: the so-called "difference in differences". While often very valuable, this design can suffer from "selection bias" if the treatment and control groups are not comparable, perhaps differing systematically out of the gate on factors other than the intervention which could influence measurement. A pilot product test constrained to New England, for example, might succeed (or fail) more on the nature of New Englanders than the product itself.
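The difference-in-differences arithmetic itself is simple enough to sketch in a few lines of Python. The function and the sales figures below are hypothetical illustrations, not data from the article:

```python
# Difference-in-differences on made-up store-sales figures.

def diff_in_diff(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """Estimate the intervention effect as the treatment group's pre-post
    change minus the control group's pre-post change."""
    treat_change = treat_post - treat_pre  # change where the intervention ran
    ctrl_change = ctrl_post - ctrl_pre     # background change: history, seasonality, etc.
    return treat_change - ctrl_change

# Hypothetical: treated stores rose 120 -> 150; control stores rose 100 -> 110.
# The 30-unit treated gain, net of the 10-unit background trend, is 20.
effect = diff_in_diff(120, 150, 100, 110)
print(effect)  # 20
```

The subtraction of the control group's change is what nets out history and maturation effects; selection bias, as noted above, is what it cannot fix.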
The true experiment, with random assignment to intervention and control groups, is an obvious improvement over simple pretest-posttest with nonrandomized control. In theory, random assignment assures (within probability limits) that other "confounding" factors that could contaminate results are "equal" between the groups. In a properly-randomized experiment, the difference in differences calculation can be telling. Amazon's randomized experiments with user experience on their web site, and Capital One's experiments with different credit card offerings, varying fees, interest rates and rewards, are cases in point.
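The claim that randomization "equalizes" confounders within probability limits is easy to demonstrate with a small simulation. The scenario below, with customer age as the potential confounder, is a hypothetical sketch, not taken from the article:

```python
import random

# Sketch: with a large enough sample, random assignment tends to balance
# a confounder (here, a made-up customer age) across treatment and control.
random.seed(42)

ages = [random.gauss(40, 10) for _ in range(10_000)]

treatment, control = [], []
for age in ages:
    # coin-flip assignment, as in a true experiment
    (treatment if random.random() < 0.5 else control).append(age)

mean_t = sum(treatment) / len(treatment)
mean_c = sum(control) / len(control)
# The group means should differ by only a small sampling error,
# so any post-intervention difference is attributable to the treatment.
print(abs(mean_t - mean_c))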
There are many BI situations, however, where randomization is impractical or inadvisable. The BI analyst is then left with the challenge of responding to alternative explanations to the intervention that could explain changes in measurement. Often, in these instances, the analyst will measure potential confounding variables, and adjust findings statistically using techniques such as matching, analysis of covariance, and propensity scores.
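Of the adjustment techniques mentioned above, matching is the easiest to sketch. The example below does simple nearest-neighbor matching on a single covariate; the covariate, units, and outcomes are hypothetical illustrations (real analyses typically match on propensity scores built from many covariates):

```python
# Minimal nearest-neighbor matching sketch on one covariate.

def matched_effect(treated, controls):
    """Estimate the intervention effect by pairing each treated unit with
    the control unit closest on the covariate, then averaging the
    treated-minus-matched-control outcome differences.

    treated, controls: lists of (covariate, outcome) tuples.
    """
    diffs = []
    for cov, outcome in treated:
        match = min(controls, key=lambda c: abs(c[0] - cov))  # nearest control
        diffs.append(outcome - match[1])
    return sum(diffs) / len(diffs)

# Hypothetical: covariate = store size; outcome = post-campaign sales lift.
treated = [(10, 8.0), (20, 12.0), (30, 15.0)]
controls = [(11, 5.0), (19, 9.0), (29, 13.0)]
print(matched_effect(treated, controls))  # about 2.67
```

By comparing each treated unit only to a control that resembles it on the measured covariate, the comparison is less contaminated by pre-existing differences between the groups; confounders that go unmeasured, of course, remain unadjusted.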
The Can Nonrandomized Experiments Yield Accurate Answers? article details an investigation contrasting findings from a randomized experiment with those from a self-selected, nonrandomized study. The results, heartening for BI, are that statistical adjustments of nonrandomized experiments can approximate findings from randomized inquiries. For situations in which randomization is infeasible, therefore, BI analysts would be wise to consider the adjustment techniques detailed in the article to "purify" their findings, approaching the confidence that a randomized experiment would provide.