Last month, the Forum introduced the concept of internal validity for evaluating the success of strategy. We concluded with three simple and generally inferior designs for BI: one-shot measurement, pre-test/post-test and nonrandomized group comparison (cross-section). The problem with each of these designs is that we cannot conclude decisively that the program or intervention made a difference; in other words, the design lacks internal validity. Alas, weak designs are all too common in business intelligence, even though they needn't be. What follows is a group of more sophisticated designs for BI that can help rule out competing explanations for the impact of strategic interventions. We believe these designs can raise the validity of BI findings while remaining practical for businesses to deploy.
Practical Designs for BI
Building on last month's article, Figure 1 identifies a number of more advanced designs for BI.
Designs 4a and 4b are our first true experiments. In their notation, (R) denotes random assignment to treatment or control groups, (X) the strategic activity of interest and (O) the measurements. These designs use randomization to make the experimental and control groups equivalent, within the bounds of chance, on every factor other than the strategic intervention. As a result, differences in measurements between the experimental and control groups can more safely be attributed to the strategic intervention than to outside factors. The implications are substantial. Whenever feasible, randomized experiments belong on the very short list of BI designs for evaluating and improving strategy.
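The following minimal Python sketch illustrates the logic of Designs 4a and 4b. It assumes a simple two-arm split and a single post-intervention measurement. The helper names `randomize` and `estimate_effect`, the 1,000 customers and the simulated lift of 5 points are all hypothetical. The only point is that random assignment makes the difference in group means a fair estimate of the intervention's effect.

```python
import random
import statistics


def randomize(units, seed=None):
    """Randomly split units into treatment and control groups of (nearly) equal size."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def estimate_effect(outcomes, treatment, control):
    """Difference in mean outcome (O) between the treated (X) and control groups."""
    return (statistics.mean(outcomes[u] for u in treatment)
            - statistics.mean(outcomes[u] for u in control))


# Hypothetical example: 1,000 customers and a true lift of 5 points from the intervention.
customers = range(1000)
treatment, control = randomize(customers, seed=7)
treated = set(treatment)
noise = random.Random(42)
# Simulated post-period measurement: noisy baseline plus the lift for treated customers.
outcomes = {c: noise.gauss(100, 10) + (5 if c in treated else 0) for c in customers}
effect = estimate_effect(outcomes, treatment, control)
print(f"estimated effect: {effect:.2f}")
```

The printed estimate will land near 5 but not exactly on it. Larger groups shrink that chance variation.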
Marketing campaigns, in which offers can be randomized across prospects, are generally the gold standard for BI and performance measurement. The Forum recently read an article in the Wall Street Journal on the success of a mail order retailer. The retailer sends inexpensive partial catalogs in mail campaigns to push prospects and customers to its Web sites to buy. It decides which catalog "chapters" to mail to which prospect demographics based on responses to campaign experiments, and randomization ensures that the groups being compared are equivalent. The responses to the catalog chapters are, in turn, related to subsequent purchase patterns. With sample sizes large enough to minimize the vagaries of chance, this approach has proven a most cost-effective means of optimizing mail order sales for this retailer.
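The article does not say how the retailer sized its test cells. One standard way to judge whether a sample is "large enough" is the normal-approximation formula for comparing two response rates. The sketch below uses that formula as an assumed, illustrative approach. The function name and the 2.0% vs. 2.5% response rates are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(p_control, p_treatment, alpha=0.05, power=0.80):
    """Units needed in EACH group to detect p_treatment vs p_control with a two-sided
    test at significance level alpha and the given power (normal approximation)."""
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p_control - p_treatment) ** 2)


# Hypothetical: detect a lift in response rate from 2.0% to 2.5%.
n = sample_size_per_arm(0.020, 0.025)
print(f"about {n:,} prospects per catalog version")
```

Small differences in low response rates call for cell sizes in the thousands. This is one reason direct marketers with large mailing lists are well placed to run such experiments.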
In contrast to true experimental designs 4a and 4b, which control the "when" and "to whom" of exposure to strategic activities, designs 5 through 7 control the when and to whom of measurement. These quasi-experimental designs are still quite valuable for BI and, indeed, often mirror the conditions of measurement in the business world.
Design 5 is the interrupted time series, in which a sequence of over-time measurements (O) sandwiches a strategic intervention (X). This design closely approximates the typical passive BI environment. Measurements are taken periodically both before and after a strategy is deployed, but exposure to the intervention is not controlled. In the illustration from Part 1, the strategic activity would be the introduction of a new customer service process and its attendant computer systems. The measurements could include customer satisfaction survey results, the incidence of customer complaints and recorded customer churn.
Though lacking the rigor of a true experiment defined by random assignment, this design is nonetheless an improvement over pre-test/post-test. As with pre-test/post-test, the major threat to the validity of interrupted time series is history, wherein factors other than the deployment of the new customer service process are, in fact, responsible for the over-time difference in measurements. The additional pre and post measurements that come with Design 5 might, however, provide insight that can help eliminate history as a competing explanation. If, for example, measurements vary little before deployment of the new process, then a spike to a higher (or lower) level afterward would be an argument for a positive (or negative) impact of the strategy. On the other hand, if measurements are increasing at a constant rate both pre and post strategy with no interruption, evidence would support something other than the strategic intervention as a cause.
Design 6, multiple time series or panel, extends Design 5 by adding a "control" time series with its own sequence of over-time measurements. Indeed, Design 6 can be seen as expanding on both pre-test/post-test and nonrandomized group comparison at once, by adding multiple over-time measurements to each. Like Design 5, this design is easy to deploy in typical BI settings. For example, a multiple time series design might test the new customer service process more rigorously than the nonrandomized comparison of distinct regions outlined in Part 1 (New England vs. the Southeast, say). Design 6 gains validity over nonrandomized group comparison in two ways. It adds multiple measurement points before and after the intervention, and it adds a comparison group with a similar sequence of observations. It thus combines the validity-enhancing features of both time series (panel) and comparison group (cross-section) designs. It is a step up in rigor that largely controls for the validity-threatening effects of history and regression to the mean. For that reason, the deployment-friendly multiple time series (panel) design, with ample pre and post measurements, should be a staple in the methodology arsenal of every BI analyst.
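The article does not prescribe an estimator for the multiple time series design. Difference-in-differences is one common choice and is used here purely as an illustration. The sketch assumes that the treated and control series would have moved in parallel without the intervention. Under that assumption, a shock shared by both regions (history) cancels when the control group's pre/post change is subtracted. Names and data are hypothetical.

```python
import statistics


def diff_in_diff(treated, control, intervention_index):
    """(post - pre) change in the treated series minus the same change in the
    control series. Assumes both would have moved in parallel without X."""
    def pre_post_change(series):
        return (statistics.mean(series[intervention_index:])
                - statistics.mean(series[:intervention_index]))
    return pre_post_change(treated) - pre_post_change(control)


# Hypothetical monthly satisfaction scores. Month 6 brings a market-wide shock (+3, "history")
# to both regions, while only New England also gets the new service process (+4).
months = range(12)
southeast = [70 + 0.5 * t + (3 if t >= 6 else 0) for t in months]
new_england = [65 + 0.5 * t + (3 if t >= 6 else 0) + (4 if t >= 6 else 0) for t in months]
print(diff_in_diff(new_england, southeast, 6))  # recovers the +4; the shared +3 cancels
```

A pre/post comparison of New England alone would credit the new process with the trend and the shared shock as well. The control series strips both out.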
When first exposed to it many years ago, I had difficulty remembering the workings of regression discontinuity, our Design 7. After a while, I settled on the bromide "the squeaky wheel gets the grease," and subsequently had no problems. Other designs assign units to groups either randomly or by "membership." Regression discontinuity instead assigns units to the experimental and control groups through a clear eligibility rule, typically a cutoff on a measure of interest. For Design 7, (O) represents measurement and (A) denotes group eligibility. Regression discontinuity mirrors the thinking that businesses often use to design new strategic interventions, so it can be a very unobtrusive design for measuring performance. Financial services companies, for example, market investment programs to high net worth (> $1M) customers. Airlines initiate loyalty programs for their most frequent travelers. Telecoms make special "offers" to customers who have triggered an "at risk of churn" event. Medicaid and other government benefits are made available based on income, and acceptance at top universities is determined by seemingly magical qualifications.
Participation in such programs is, of course, limited to those who meet the eligibility requirements. Control groups are drawn from those just on the other side of the eligibility line. Examples include customers with net worth < $1M for the financial services companies, customers without a triggering event for the telecoms and applicants rejected by the top universities. The experimental and control groups are generally compared with statistical regression analysis. Suppose the relationship between eligibility score and outcome measure is similar for the experimental and control groups. The impact of the intervention can then be approximated by the difference between the actual experimental measurements and those predicted from the relationship of eligibility score to outcome. A large difference indicates a program or strategic intervention effect.
A hybrid of traditional top-down and experimental bottom-up strategy development is now favored by many business thought leaders and is making inroads in organizations of all sizes. In addition, businesses are increasingly using experimental methods to test and evolve their strategies. With these developments, BI becomes more engaged with the business, providing analytic support for the strategy build/test/revise cycle. This mandate brings a challenge: raising the rigor of BI methodology and design so that businesses can be confident that their strategic linkages are valid.
The designs noted above each offer distinct advantages over simple pre-test/post-test or cross-section comparisons. Randomization to strategy exposure is undoubtedly a gold standard design and is probably more accessible than many think, especially in the internet age. Just ask Google, Yahoo!, Amazon and eBay. Even where randomization is impractical or politically unacceptable, designs like multiple time series (panel) and regression discontinuity are much more rigorous than simple methods, and they can often be implemented naturally in the business cycle. The Forum feels that BI analysts should come to obsess over the quality of their designs just as they obsess over the quality of their data.
OpenBI is currently working with a startup company that seeks to introduce novel products to niches of the health care insurance market. With a solid initial strategic focus, but humility that encourages evidence and learning, this company has adopted an experimental mandate for all operations using its BI platform, insisting that experiments be conducted whenever feasible to both evaluate and evolve its core strategy. Hypothesize, test and evolve is the mantra. The company also feels a pervasive experimental culture will prevent it from becoming complacent in a very competitive environment. OpenBI's role is to help the company operationalize and evaluate its strategy as well as to persistently challenge the organization to improve its processes through experimentation and BI.
The Forum concludes Part 2's discussion with the rotation design shown in Figure 2. In this design, each group (G) receives each strategy (X) in turn, one per time period (T), in a randomized order, and measurements are taken after each period. In this illustration there are four groups, four strategies and four time periods, with statistical attention focused on the strategy factor. Rotation designs are used most frequently in science and engineering, but one serves here to close our discussion on a light note. If T represents seasons, G football teams and X coaches, we have the perfect design for "his'n and your'n"!
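A four-by-four rotation of this kind has the structure of a Latin square. Each group gets each strategy once, and each period sees each strategy once. The sketch below builds one by shuffling the rows, columns and labels of a cyclic square. This shortcut is an assumed, convenient randomization and does not draw uniformly from all possible Latin squares. The function name and labels are hypothetical.

```python
import random


def rotation_design(groups, strategies, seed=None):
    """Randomized Latin square: returns {group: [strategy in period 1, 2, ...]} such that
    every group receives every strategy once and every period uses every strategy once."""
    n = len(groups)
    if len(strategies) != n:
        raise ValueError("a square rotation needs as many strategies as groups")
    rng = random.Random(seed)
    # Cyclic Latin square: cell (r, c) holds strategy index (r + c) mod n.
    base = [[(r + c) % n for c in range(n)] for r in range(n)]
    # Shuffling rows, columns and labels keeps the Latin property while randomizing.
    rows = rng.sample(range(n), n)
    cols = rng.sample(range(n), n)
    labels = rng.sample(list(strategies), n)
    return {groups[i]: [labels[base[rows[i]][cols[t]]] for t in range(n)]
            for i in range(n)}


design = rotation_design(["G1", "G2", "G3", "G4"], ["X1", "X2", "X3", "X4"], seed=2024)
for group, schedule in design.items():
    print(group, "  ".join(f"T{t + 1}:{x}" for t, x in enumerate(schedule)))
```

Because every strategy appears once per group and once per period, group and period effects are balanced across strategies. This balance is what lets the analysis focus on the strategy factor.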
- Donald T. Campbell. "Factors Relevant to the Validity of Experiments in Social Settings." Psychological Bulletin. July, 1957.
- Donald T. Campbell and Julian C. Stanley. Experimental and Quasi-Experimental Designs for Research. Rand McNally. 1963.
- Richard A. Berk and Peter H. Rossi. Thinking About Program Evaluation 2. Sage Publications. 1999.