March's OpenBI Forum column is the first of a two-part series on validity, design and business intelligence (BI). Part 1 offers a light introduction to the concepts of validity and design, using a sports example as an illustration. From there, we transition to a more formal discussion of the connections between strategy, validity, design and BI. Part 2 expands on the content, adding more sophisticated BI designs and examples.

His'n and Your'n

Many years ago, I read a tribute to Paul "Bear" Bryant, legendary football coach of Alabama, which went something like this: "He'll take his'n and beat your'n, then take your'n and beat his'n." I'm often reminded of this pithy testimonial in sports debates with friends where the issue is one of "proving" which coaches are among the best at their craft. A design like "his'n and your'n" would make most such comparisons easy.

Of course, a pure "his'n and your'n" design would be close to impossible to implement, so let's instead consider several modified versions to evaluate coaching legacy. The first involves multiple coaches with essentially the same team; the second, the same coach with multiple teams. If a coach wins two consecutive world championships with the same franchise, does that assure his greatness? Maybe, maybe not. The Dallas Cowboys of the early 1990s won three Super Bowls in four years under two different coaches and are regarded by many as among the best football teams ever. Jimmy Johnson won two straight. Then, after a year's hiatus, Barry Switzer won a championship with substantially the same team, one that included future Hall of Famers Troy Aikman, Emmitt Smith and Michael Irvin along with a consistently Pro Bowl-caliber offensive line. Is either Jimmy Johnson or Barry Switzer to be anointed a premier coach for shepherding a team acknowledged as one of the best in history, especially when each had a nondescript post-Super Bowl career? Cowboys owner Jerry Jones apparently didn't think so, essentially letting Johnson get away after his second Super Bowl victory to demonstrate that it was the organization (i.e., the owner), not the coach, that made the difference. For the Cowboys of that era, he may have been right.

Now switch to the career of just-retired Cowboys coach Bill Parcells. The New York Giants hired Parcells in 1983 following nine nonwinning seasons in 10 years. Parcells lifted the team to two Super Bowl victories in eight years before retiring the first time. Following several years as a TV analyst, Parcells returned to coaching with the then-lethargic New England Patriots, leading them to the playoffs within two years and to a Super Bowl a year later. Parcells left New England shortly thereafter but surfaced as a coach again with the New York Jets, resurrecting a team that had gone 4-28 over the previous two seasons to a 9-7 record in his first year and the AFC Championship game in his second. After a subsequent 8-8 season, Parcells retired again, only to come back for a fourth go-round, this time with the Dallas Cowboys, who had suffered through three straight 5-11 seasons. Parcells led the Cowboys to the playoffs with a 10-6 record his first year, then had one bad and one mediocre season before guiding the Cowboys to the playoffs in 2006 and retiring, this time for good. For the Forum's money, Parcells' successes as a peripatetic coach offer convincing testimony to his Hall of Fame coaching credentials. He has indeed passed our "his'n and your'n" tests.

Strategy as Hypothesis

Sports may not offer the perfect metaphor for business, but there's plenty of wisdom for BI in "his'n and your'n" thinking. A major focus of BI is on establishing the validity of relationships between factors or variables, much like showing that a certain coach is indeed great. That validity is generally demonstrated by eliminating alternative explanations, so that the ones proposed stand as most credible. And an important tool for demonstrating those assertions is the design that analysts use to test relationships, much like the modified "his'n and your'n" chronology that was offered in support of Bill Parcells' coaching prowess. The validity of BI findings is ultimately proportional to the quality of BI designs: superior designs eliminate competing explanations and promote confidence in proposed relationships.

The January OpenBI Forum discussed the notion of strategy as hypothesis, relating strategic activities to intended outcomes in the form of "if activity A, then outcome B" or "the more of activity A, the better outcome B." A major charter for BI is then to evaluate the performance of strategy, testing whether activity A is associated with outcome B as hypothesized. To accomplish this successfully, BI analysts must obsess over demonstrating the validity of the connection between strategy and outcome, choosing optimal "designs" to eliminate alternative explanations and thus "prove" the linkage.

Validity of Experiments

It's now been 50 years since publication of "Factors Relevant to the Validity of Experiments in Social Settings" by Donald Campbell of Northwestern University.1 This 14-page gem of a paper was foundational reading in college psychology and social science methodology courses for generations of students, introducing the important concepts of research study validity and experimental design. The OpenBI Forum feels the discussion of validity and design in this paper is just as important for BI in 2007 as it was for experiments in psychology 50 years ago.

Campbell distinguishes two types of validity that are pertinent for BI. The first, internal validity, asks whether the intervention or strategy made a difference - whether the outcome can be safely inferred from the intervention. Much of the focus in design for internal validity is on controlling extraneous or confounding information so the connection from "if A" to "then B" can be made. Is something other than strategic activity A the cause of B? Are there factors other than the hiring of Bill Parcells that contributed to team turnaround? Are there variables outside the proposed linkage between improved employee attitude scores and heightened customer impressions that could explain that strategy's success? The second, external validity, is concerned with the representativeness or generalizability of findings to other populations and settings, and will be the subject of a future OpenBI Forum column.

Simple Designs for BI

For the remainder of this article, we discuss the three simple designs for BI noted in Figure 1, using them as points of departure for more sophisticated designs to follow.

Figure 1: Three simple designs for BI
(X = deployment of strategy; O = observation/measurement)

  One-shot measurement:              X   O
  Pre-test/post-test:                O1  X  O2
  Non-randomized group comparison:   X   O
                                         O

The simplest design for BI, which serves as a reference for all others, is one-shot measurement, where X represents the deployment of strategy and O the observations/measurements, which are taken after the strategy is in place. As an example, the strategy might be the introduction of a new customer service process and supporting computer system (X). The measurement (O) could then be the results of customer surveys, the incidence of customer complaints or recorded customer defection/churn. The company might be interested in knowing whether the new process had an impact on customer evaluation of service and whether, in turn, that evaluation influenced customer behavior. The information associated with the post-process measurement of this design is certainly of some value to the business as it attempts to gather intelligence on its customers. With only one measurement point, however, the design is of limited utility for BI, since no before/after comparisons can be drawn, thus precluding any test of the impact of the new process.
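To make this concrete, consider a minimal sketch in Python (the satisfaction scores below are simulated, purely for illustration). With a one-shot design, the best BI can do is describe O; there is no baseline against which to test X:

import numpy as np

rng = np.random.default_rng(1)

# One-shot design: X (new service process) is deployed, then O is measured.
# Hypothetical post-deployment satisfaction scores on a 1-10 scale.
post_scores = rng.normal(loc=7.2, scale=1.5, size=500).clip(1, 10)

# The design supports only a descriptive summary of O ...
print(f"post-deployment mean satisfaction: {post_scores.mean():.2f}")

# ... with no pre-deployment baseline, there is no way to tell whether
# that mean represents improvement, decline or the status quo.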

A second simple design for BI is pre-test/post-test, where O1 is a pre-strategy measurement, X is the strategy and O2 is the post-strategy assessment. For our customer service example, the pre-customer service process measurement (O1) is contrasted to the post-process measurement (O2) to provide a formal comparison of the impact of strategic activity (X). With good fortune, the difference between O2 and O1 can be used to evaluate the effectiveness of the strategy, in our example the new customer service process.
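The following sketch shows one way the O2-O1 comparison might be formalized, again on simulated data (the sample size, scores and built-in lift are all hypothetical). Assuming the same customers are surveyed before and after deployment, a paired t-test is a natural choice:

import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Pre-test/post-test design: O1, then X, then O2.
# Hypothetical paired satisfaction scores for 200 customers surveyed
# both before (O1) and after (O2) the new service process.
n = 200
pre = rng.normal(loc=6.5, scale=1.5, size=n)         # O1
post = pre + rng.normal(loc=0.4, scale=1.0, size=n)  # O2, lift built in

t_stat, p_value = stats.ttest_rel(post, pre)         # paired comparison
print(f"mean O2 - O1 difference: {(post - pre).mean():.2f}")
print(f"paired t = {t_stat:.2f}, p = {p_value:.4f}")

# A significant difference is consistent with the strategy working, but
# the design by itself cannot rule out history or regression effects.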

There are, however, several potential competing explanations that might be difficult to dismiss with this design. The first and most important is probably history: factors other than the deployment of the new customer service process may be responsible for the difference between post- and pre-measurements. A hypothetical example for this case might be the concurrent deployment of a Six Sigma process that improved product quality to such an extent that customer perception increased while churn/defections decreased. With this example and design, it is impossible to untangle the impact of the new customer service process from the Six Sigma program that was initiated simultaneously. Another plausible explanation for an O2-O1 difference is regression to the mean. If the decision to implement the new customer service process was made in part on the basis of low pre-test measurements (O1), an uplift in scores might simply be a statistical artifact that sees low values move "up" to the mean and high values slip "down" to the average.
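Regression to the mean is easy to demonstrate by simulation. In the hypothetical sketch below, nothing at all intervenes between O1 and O2, yet the regions selected for their poor O1 scores still appear to "improve":

import numpy as np

rng = np.random.default_rng(42)

# Each region's observed score = stable true level + measurement noise.
n_regions = 1000
true_level = rng.normal(loc=7.0, scale=0.5, size=n_regions)
o1 = true_level + rng.normal(scale=1.0, size=n_regions)  # pre measurement
o2 = true_level + rng.normal(scale=1.0, size=n_regions)  # post, NO intervention

# Suppose the new process is targeted at the worst-scoring decile at O1.
worst = o1 <= np.quantile(o1, 0.10)
print(f"selected regions, mean O1: {o1[worst].mean():.2f}")
print(f"selected regions, mean O2: {o2[worst].mean():.2f}")

# O2 comes out noticeably higher than O1 for the selected group even
# though nothing changed: the extreme O1 values were partly bad luck, and
# a second measurement drifts back toward each region's true level.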

The third and final design considered in this column is the non-randomized group comparison, in which measurements are taken on two groups, one that experienced the strategy intervention and one that did not. This design might be in play, for example, if one region of the company had been introduced to the new customer service plan and was contrasted with a region that did not experience the new process. The major threatening alternative explanation for the non-randomized group comparison is that, because there is no random assignment, the groups might be systematically different out of the gate. This "selection" effect, not the strategy itself, might account for the differences in measurements observed. If the New England region is chosen to pilot the new customer service process and the Southeast is the control group, differences in customer behavior may have more to do with the groups themselves than with exposure to the new process.
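The selection threat can likewise be illustrated with simulated data. In the sketch below, the true effect of the new process is set to zero, yet a naive comparison of the pilot and control regions still yields a statistically significant "difference" (the regions and numbers are, of course, hypothetical):

import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Non-randomized comparison: New England pilots the process (X O);
# the Southeast serves as the untreated comparison group (O).
# The regions differ at baseline (selection), and the true effect of
# the new process is set to ZERO.
new_england = rng.normal(loc=7.4, scale=1.5, size=300)  # higher baseline
southeast = rng.normal(loc=6.9, scale=1.5, size=300)    # lower baseline

t_stat, p_value = stats.ttest_ind(new_england, southeast)
print(f"group difference: {new_england.mean() - southeast.mean():.2f}")
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")

# The comparison looks significant, but it reflects who was selected
# into the pilot, not the process; without random assignment, selection
# and strategy effects are confounded.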

"Validity, Design and BI, Part 2" will continue to look at validity and BI, focusing on more sophisticated designs to interpret the results of strategic activities for business intelligence.

Reference:

1. Donald T. Campbell. "Factors Relevant to the Validity of Experiments in Social Settings." Psychological Bulletin 54, no. 4 (July 1957): 297-312.