Practical Designs for BI
Building on last month's article, this column turns to the more advanced designs for BI identified in Figure 1.
Designs 4a and 4b, where (R) denotes random assignment to treatment/control groups, (X) the strategic activity of interest and (O) the measurements, are our first true experiments. These designs use randomization to make experimental and control groups equivalent on factors other than the strategic intervention. With randomization comes the benefit that differences in measurements between experimental and control groups can more safely be attributed to the strategic intervention than to outside factors. The implications of this are substantial and should put randomized experiments on the very short list of designs to evaluate/improve strategy for BI whenever feasible.
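To make the logic of randomization concrete, here is a minimal Python sketch. It assumes a simple two-group design with an even split, and the names (randomize, difference_in_means, the prospect labels) are hypothetical illustrations rather than anything prescribed by Figure 1.

```python
import random
from statistics import mean


def randomize(units, seed=0):
    """Randomly split units into treatment and control groups of near-equal size."""
    rng = random.Random(seed)      # fixed seed so the assignment can be reproduced
    shuffled = list(units)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def difference_in_means(treated_outcomes, control_outcomes):
    """Estimate the effect of the intervention as the gap between group averages."""
    return mean(treated_outcomes) - mean(control_outcomes)


# Hypothetical prospects; (R) is the shuffle, (X) is applied to `treatment` only.
prospects = [f"prospect_{i}" for i in range(10)]
treatment, control = randomize(prospects, seed=42)

# Hypothetical (O) measurements, 1 = responded to the offer, 0 = did not.
effect = difference_in_means([1, 1, 0, 1], [0, 1, 0, 0])
print(effect)
```

Because chance alone decides who lands in each group, a difference in average response is more plausibly due to the intervention, provided the groups are large enough for chance imbalances to wash out.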
Marketing campaigns, with their capability of randomizing offers to prospects, are generally the gold standard foundation for BI and performance measurement. The Forum recently read an article in the Wall Street Journal noting the success of a mail order retailer that uses inexpensive partial catalogs sent in mail campaigns to push prospects and customers to its Web sites for sales. The retailer optimizes the mailed catalog "chapters" by prospect demographics based on responses to campaign experiments that use randomization to assure the equivalence of groupings. The responses to the catalog chapters are, in turn, related to subsequent purchase patterns. With large enough sample sizes to minimize the vagaries of chance, this has proven a most cost-effective means of optimizing mail order sales for this retailer.
In contrast to true experimental designs 4a and 4b, which control the "when" and "to whom" of exposure to strategic activities, designs 5 through 7 control the when and to whom of measurement. These quasi-experimental designs are still quite valuable for BI and, indeed, often mirror the conditions of measurement in the business world.
Design 5 is the interrupted time series, in which a sequence of over-time measurements (O) sandwiches a strategic intervention (X). This design closely approximates the typical passive BI environment, where measurements are made periodically both before and after the strategy is deployed but with no control over exposure to the intervention. Using the illustration from Part 1, the strategic activity would be the introduction of a new customer service process and attendant computer systems, while the measurements could include the results of customer satisfaction surveys, the incidence of customer complaints and recorded customer churn.
Though lacking the rigor of a true experiment defined by random assignment, this design is nonetheless an improvement over pre-test/post-test. As with pre-test/post-test, the major threat to the validity of interrupted time series is history, wherein factors other than the deployment of the new customer service process are, in fact, responsible for the over-time difference in measurements. The additional pre and post measurements that come with Design 5 might, however, provide insight that can help eliminate history as a competing explanation. If, for example, measurements vary little before deployment of the new process, then a spike to a higher (or lower) level afterward would be an argument for a positive (or negative) impact of the strategy. On the other hand, if measurements are increasing at a constant rate both pre and post strategy with no interruption, evidence would support something other than the strategic intervention as a cause.
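The two scenarios above can be told apart with a rough calculation: fit a straight line to the pre-intervention measurements, project it forward, and see how far the post-intervention measurements sit from that projection. The Python sketch below is one simple way to do this, assuming a linear pre-period trend. The function names and the satisfaction scores are hypothetical, and a fuller analysis would also account for seasonality and autocorrelation.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def level_shift(series, intervention_index):
    """Average gap between post-period values and the projected pre-period trend."""
    pre = series[:intervention_index]
    post = series[intervention_index:]
    slope, intercept = fit_line(list(range(len(pre))), pre)
    projected = [intercept + slope * t
                 for t in range(intervention_index, len(series))]
    return mean([obs - proj for obs, proj in zip(post, projected)])


# Hypothetical satisfaction scores; the intervention (X) falls before index 4.
flat_then_jump = [70, 70, 70, 70, 80, 80, 80, 80]   # stable, then a jump
steady_growth = [70, 72, 74, 76, 78, 80, 82, 84]    # same trend throughout

print(level_shift(flat_then_jump, 4))  # large shift: consistent with an effect
print(level_shift(steady_growth, 4))   # no shift: the trend alone explains it
```

Both series end higher than they began, yet only the first departs from its own pre-intervention trend, which is the distinction the extra measurements in Design 5 make visible.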
Design 6, multiple time series or panel, takes Design 5 to the next level of detail with the introduction of a "control" time series with many over-time measurements. Indeed, Design 6 can be seen as simultaneously expanding on both pre-test/post-test and nonrandomized group comparison, by adding multiple over-time measurements to each. Like Design 5, this design is easy to deploy in typical BI settings. For example, a multiple time series design might provide a more rigorous test of the new customer service process than the non-randomized group comparison in distinct regions of the country, New England vs. the Southeast, say, as outlined in Part 1. Design 6 gains in validity over non-randomized group comparison by including multiple measuring points pre and post intervention as well as a control comparison group with a similar sequence of observations. This design thus combines the validity-enhancing features of both time series (panel) and comparison group (cross-section) designs. As a step up in design rigor that largely controls for the validity-threatening effects of history and regression to the mean, the deployment-friendly multiple time series (panel) design with ample pre and post measurements should be a staple in the methodology arsenal of every BI analyst.
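One common way to summarize a multiple time series comparison is a difference-in-differences calculation: the pre-to-post change in the region that received the intervention, minus the change over the same periods in the control region. This estimator is not named in the discussion above; it is offered here as a minimal Python sketch, assuming the two regions would have moved in parallel without the intervention. The regional scores are hypothetical.

```python
from statistics import mean


def difference_in_differences(treated_pre, treated_post,
                              control_pre, control_post):
    """Change in the treated series minus the change in the control series."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Hypothetical quarterly satisfaction scores for the two regions.
new_england_pre = [70, 71, 69, 70]    # before the new service process
new_england_post = [78, 79, 77, 78]   # after the new service process
southeast_pre = [68, 69, 67, 68]      # control region, same quarters
southeast_post = [71, 72, 70, 71]

effect = difference_in_differences(new_england_pre, new_england_post,
                                   southeast_pre, southeast_post)
print(effect)  # treated region rose 8, control rose 3, leaving 5
```

The control region's change stands in for history: whatever moved both regions is subtracted out, and what remains is the portion of the change more plausibly tied to the new process.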
When first exposed to it many years ago, I had difficulty remembering the workings of regression discontinuity, our Design 7. After a while, I settled on the bromide, "the squeaky wheel gets the grease," and subsequently had no problems. Unlike other designs that assign units to groups either randomly or based on "membership," regression discontinuity assigns units to experimental and control groups through a very clear set of eligibility rules pertaining to a measure of interest. For Design 7, (O) represents measurement and (A) denotes group eligibility.
Because it mirrors the thinking that businesses often use to design new strategic interventions, regression discontinuity can be a very unobtrusive design for measuring performance. Financial services companies, for example, market investment programs to high net worth (> $1M) customers; airlines initiate loyalty programs for hyper travelers; telecoms make special "offers" to those customers who've triggered an "at risk of churn" event; Medicaid and other government benefits are made available based on income; and acceptance at top universities is determined by seemingly magical qualifications. Participation in such programs is, of course, limited to those who meet the eligibility requirements.
Control groups are established in proximity to the experimental group: those with net worth < $1M for the financial services companies, those without a triggering event for the telecom companies, or those rejected by the top universities. The comparison of experimental and control groups is generally validated with statistical regression analysis. If the relationship between eligibility score and outcome measure is similar for experimental and control groups, the impact of the intervention can be approximated by the difference between the actual experimental measurements and those predicted from the relationship of eligibility score to outcome. A large difference is an indication of a program or strategic intervention effect.
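The comparison described for Design 7 can be sketched in a few lines of Python: fit the score-to-outcome relationship on the control group, predict what the experimental group would have looked like without the program, and average the gap between actual and predicted outcomes. This is a simplified illustration that assumes a linear relationship and treats scores at or above the cutoff as eligible. The function names and the data are hypothetical, and a fuller analysis would typically concentrate on units close to the cutoff.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def discontinuity_effect(scores, outcomes, cutoff):
    """Mean gap between eligible units' outcomes and the control-group prediction."""
    pairs = list(zip(scores, outcomes))
    control = [(s, y) for s, y in pairs if s < cutoff]     # not eligible
    treated = [(s, y) for s, y in pairs if s >= cutoff]    # eligible (A)
    slope, intercept = fit_line([s for s, _ in control],
                                [y for _, y in control])
    gaps = [y - (intercept + slope * s) for s, y in treated]
    return mean(gaps)


# Hypothetical data: net worth in units of $100K, cutoff at 10 (that is, $1M).
net_worth = [6, 7, 8, 9, 10, 11, 12, 13]
outcome = [12, 14, 16, 18, 23, 25, 27, 29]   # e.g. an account-growth measure

print(discontinuity_effect(net_worth, outcome, cutoff=10))
```

In this made-up data the outcome rises by 2 for every unit of net worth on both sides of the cutoff, so the remaining jump among eligible customers is what the design attributes to the program.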