I've assembled my BI-Searchers toolkit. It consists of an open source technology stack that includes the MySQL database for storage; either Pentaho's Kettle or one of the agile languages with database connectivity, Python or Ruby, for ETL/programming; one of the Pentaho or Jaspersoft dialects of the Mondrian OLAP server for slice and dice/drilling; the other Mondrian for interactive statistical visualization of large data sets; and the R Project for Statistical Computing for advanced graphics and predictive analytics. Tying together the data delivered from all this wonderful technology are designs that provide context for analytic interpretation.
Of course, heading the list of most desirable designs are experiments that assign participants randomly to intervention and control groups. The platinum standard for design, randomized experiments provide the benefit of equalizing potentially confounding factors outside intervention and control so that, within statistical limits, analysts can be reasonably sure differences in between-group outcomes are due to the intervention itself and not to uncontrolled variables. Experiments are especially suitable for Web commerce, where companies learn from visitors and customers with randomized interactions, and marketing campaigns that randomly assign prospects/customers to different flavors of offers.
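To make random assignment concrete, here is a minimal Python sketch. The function name `assign_groups`, the customer IDs and the fixed seed are all my own illustrative assumptions, not part of any particular campaign tool.

```python
import random


def assign_groups(customer_ids, seed=None):
    """Randomly split customers into intervention and control groups.

    Hypothetical helper: shuffles the IDs and cuts the list in half, so
    every customer has the same chance of landing in either group.
    """
    rng = random.Random(seed)      # a seeded generator keeps the split reproducible
    shuffled = list(customer_ids)  # copy so the caller's data is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"intervention": shuffled[:half], "control": shuffled[half:]}


# Illustrative data: 1,000 made-up customer IDs.
groups = assign_groups(range(1, 1001), seed=42)
print(len(groups["intervention"]), len(groups["control"]))
```

Because chance alone decides who gets the offer, factors like customer tenure or geography should balance out across the two groups, within statistical limits.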
Alas, randomized experiments aren't always suitable for BI. In some cases, there's just a single group to measure; in others, it's impractical to make random assignments to intervention and control. Designs that look like real experiments aside from randomization to treatment and control are called quasi-experiments and include many of the traditional designs for BI. The challenge for analysts with quasi-experiments is to convincingly demonstrate internal validity, which addresses whether the intervention under investigation actually caused changes in measurements over time. Without the help of randomization, proving internal validity is a challenge but certainly not impossible.
BI analysts must pay special attention to a number of threats to the validity of their investigations. Among the most pernicious are: 1) History/Maturation, wherein differences in measurement before and after treatment are due not to the intervention itself, but instead to history and growth that are unrelated to treatment; 2) Selection, wherein for multiple group designs, the groups are different out of the gate, and it's that difference rather than the intervention that shows up in the measurements; 3) Regression to the Mean, wherein treatment and control are allocated on pre-measurement extremes, so that subsequent measurements will generally look less extreme even in the absence of an intervention impact; and 4) Attrition, wherein subjects leave the investigation before it concludes, potentially biasing the findings.
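Regression to the mean is the threat that most often surprises people, so a small simulation may help. The sketch below is built entirely on made-up assumptions: 2,000 hypothetical stores, each with a stable sales level plus random week-to-week noise, and no intervention of any kind.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed so the simulation is repeatable
N_STORES = 2000

# Assumed model: a stable "true" sales level per store, plus independent noise
# on each measurement. No intervention happens between pre and post.
true_level = [rng.gauss(100, 10) for _ in range(N_STORES)]
pre = [level + rng.gauss(0, 10) for level in true_level]
post = [level + rng.gauss(0, 10) for level in true_level]

# Pick the 200 stores that looked worst on the pre-measurement, as a manager
# might when choosing where to "fix" things.
worst = sorted(range(N_STORES), key=lambda i: pre[i])[:200]

pre_mean = statistics.mean(pre[i] for i in worst)
post_mean = statistics.mean(post[i] for i in worst)
print(f"worst stores, pre:  {pre_mean:.1f}")
print(f"worst stores, post: {post_mean:.1f}")
```

The stores selected for their extreme pre-measurement drift back toward the overall average on the second measurement. An analyst who had launched a program at those stores in between could easily mistake that drift for an intervention effect.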
Interrupted Time Series (ITS) is a class of designs long used in the social sciences and education that's especially pertinent for BI. Indeed, most BI-savvy companies are currently using some variant of ITS even if they don't recognize the name. The ITS design consists of one or more observations or measurements (O), followed by one or more interventions (X), followed in turn by additional observations (O). The simplest ITS is the common pre-post design, characterized as #1 in the table below. ITS designs can grow in complexity with more pre and post observations, more treatments, and more comparison groups. Cognizant of threats to validity and armed with the components of the ITS, analysts can build flexible but powerful BI designs. Particularly well-formulated ITS designs can, in fact, approach the power of randomized experiments in combating alternative explanations of analysis findings, making a strong case that the intervention(s) did (or did not) cause the changes in measurements.
You own a retail chain consisting of 24 stores, 16 in Massachusetts and 8 in Mississippi. The recession and increases in wholesale prices are taxing your margins, so much so that you find it necessary to raise prices on core items, but are concerned about customer reaction. A strong believer in testing and intelligence, you decide to experiment with price elasticity before committing to company-wide change. You ask your clever BI staff to come up with a series of designs to test the impact of price changes, measuring pricing, volume, revenue and profit. Firm believers in ITS, they come back with the following for management discussion:
Though a nice starting point, you reject #1 as too simplistic. With just single pre and post observations, it's too difficult to interpret change and to separate price responsiveness from history or maturation. And if the testing commences in a down market, there's always the possibility that business improves because of a recovering economy, or that performance simply regresses to the mean, independent of pricing changes. #2 is a better design, the multiple pre and post measures providing some protection against alternative explanations. #3 is better yet, especially if noted changes disappear when price changes are rescinded.
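To see why multiple pre and post measures help, consider a rough sketch of one common way to read such a series: fit a trend to the pre-intervention observations, project it forward, and compare the projection with what actually happened. The weekly volume figures and the helper names `fit_line` and `estimated_effect` are made up for illustration; a real analysis would also need to deal with seasonality and autocorrelation.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line through (xs, ys); returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimated_effect(pre, post):
    """Average gap between actual post values and the projected pre-period trend."""
    pre_weeks = list(range(len(pre)))
    slope, intercept = fit_line(pre_weeks, pre)
    post_weeks = range(len(pre), len(pre) + len(post))
    gaps = [actual - (intercept + slope * week)
            for week, actual in zip(post_weeks, post)]
    return sum(gaps) / len(gaps)


# Made-up weekly unit volume: steady growth of 2 units a week before the
# price increase, then a drop followed by the same growth rate.
pre_volume = [100, 102, 104, 106, 108, 110]
post_volume = [102, 104, 106, 108]

naive_change = sum(post_volume) / len(post_volume) - sum(pre_volume) / len(pre_volume)
trend_adjusted = estimated_effect(pre_volume, post_volume)
print(f"naive pre/post change: {naive_change:+.1f}")
print(f"trend-adjusted change: {trend_adjusted:+.1f}")
```

In this contrived series the naive comparison of averages shows no change at all, because underlying growth masks the drop. Only the multiple pre observations reveal the trend against which the post observations should be judged.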
#4 introduces multiple groups and is generally considered a desirable quasi-experiment. The challenge with this design is one of selection. The groups might represent stores in Massachusetts and Mississippi that serve fundamentally different populations. #5 promotes different flavors of intervention, perhaps multiple pricing variations. #6 is a design that might best align with business needs, testing and learning in sequence. Finally, #7 is a personal favorite, combining multiple interventions with several groups, including a control. In our example, X1 and X2 might be two competing pricing models, with the first group a Massachusetts store, the second a Mississippi store, and the control a store from each state. The multiple groups with control, in addition to multiple, sequenced interventions, should be able to handle many of the alternative explanations related to history, maturation, selection and regression to the mean.
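One simple way to put a control group to work is a difference-in-differences comparison: subtract the control group's change from the treated group's change, so that whatever history or maturation affected both groups cancels out. That technique is my own illustrative choice here rather than something the designs prescribe, and the revenue figures below are invented.

```python
def mean(values):
    return sum(values) / len(values)


def diff_in_diff(treated_pre, treated_post, control_pre, control_post):
    """Change in the treated group minus change in the control group."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Made-up weekly revenue (in $000s) for a store that received the new pricing
# and a control store that did not.
treated_pre, treated_post = [200, 204, 202], [190, 192, 194]
control_pre, control_post = [150, 152, 154], [156, 158, 160]

effect = diff_in_diff(treated_pre, treated_post, control_pre, control_post)
print(f"estimated pricing effect: {effect:+.1f}")
```

The estimate is only as good as the assumption that the two groups would have moved in parallel without the price change, which is exactly where the selection threat comes back in when the stores serve very different populations.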
I've presented just a sampling of the possibilities. Clever BI analysts can explore even more sophisticated designs that map to the climate of their enterprise. Two excellent sources on designs in general and Interrupted Time Series in particular are the comprehensive text by Shadish, Cook and Campbell, and the clearly-written introductory article by Glass.