Faithful readers of the OpenBI Forum over the last 28 months have probably noticed two common themes to the columns. The first is guidance for business intelligence (BI) from outside business, particularly academic areas such as statistics, operations research, computer science, economics, political science, cognitive science and psychology. The second is an obsession with rigorous methods and designs for BI investigations to help prove the relationships implied by intelligence inquiries. How can a retail company be sure that its latest marketing campaign resulted in increased sales and profits? How can a telecom company carefully test its new strategy that enhanced employee training will lead to better customer service which, in turn, will promote greater customer loyalty and subsequent profits? How can a financial services company determine which offers will entice prospects to become profitable customers? In each case, BI is testing hypotheses of the logical form "if x then y," "the more of x, the less of y" or "x causes y." The tighter the BI design, the more comfort a business can have that the interventions it initiated indeed caused the noted results.
Epidemiology and the Evidence Hierarchy
Epidemiology is the study of health, illness and diseases in populations of humans or animals, seeking answers to questions of how, when and where diseases occur [1, 2, 3]. Though epidemiologists know they can never entirely prove their causal hypotheses, they nevertheless seek to demonstrate associations between risk factors and disease (treatments and outcomes). By using study designs that eliminate competing or confounding explanations, and deploying powerful statistical methods to highlight relative and absolute differences in disease by risk factors, epidemiologists are often able to help solve complicated health problems.
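To make "relative and absolute differences" concrete, here is a minimal Python sketch. The function name and the counts are hypothetical, chosen only for illustration: a retention offer stands in for the "risk factor" and customer churn for the "disease."

```python
def risk_measures(exposed_events, exposed_total, unexposed_events, unexposed_total):
    """Return (risk_exposed, risk_unexposed, risk_difference, risk_ratio)."""
    risk_exposed = exposed_events / exposed_total
    risk_unexposed = unexposed_events / unexposed_total
    risk_difference = risk_exposed - risk_unexposed  # absolute difference
    risk_ratio = risk_exposed / risk_unexposed       # relative difference
    return risk_exposed, risk_unexposed, risk_difference, risk_ratio


# Hypothetical counts: 30 of 200 customers who received the offer churned,
# versus 60 of 200 customers who did not receive it.
r_offer, r_no_offer, diff, ratio = risk_measures(30, 200, 60, 200)
print(f"churn with offer: {r_offer:.2f}, without: {r_no_offer:.2f}")
print(f"absolute difference: {diff:+.2f}, relative risk: {ratio:.2f}")
```

The two measures answer different questions: the ratio says how much more or less likely the outcome is, while the difference says how many outcomes per customer are at stake.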
The study designs used by epidemiologists can be classified in a number of ways. Most useful is to contrast randomized experiments, in which individuals are randomly assigned to treatment groups, with observational studies, in which individuals self-select into the various groups. Randomized experiments, of course, reasonably assure that factors other than the treatment are, on average, equal between groups. Experiments thus help to minimize potential sources of bias or systematic error. Epidemiologists also contrast retrospective studies, where important events (diseases) have already transpired, with prospective studies, where events may happen in the future. Retrospective studies are generally much less expensive than prospective ones, but are not as rigorous, and may be hard pressed to eliminate plausible alternative explanations of findings. Finally, epidemiologists distinguish descriptive studies, purporting only to describe the distributions of variables and possibly to make inferences about populations, from analytical studies, which look to make causal connections between risk factors (treatments) and outcomes.
With emphases on methods, design and statistical analyses, the study of epidemiology has much to offer BI. What follows are brief descriptions of the major types of epidemiology studies, listed in order of methodological rigor from simple observational case reports to randomized controlled trials, with BI illustrations as appropriate. Designs further down the list are generally preferred. Evidence-based disciplines tout this progression as the hierarchy of evidence, encouraging practitioners to place more emphasis on the results of tighter designs. The business world would be well advised to follow this lead.
The case report is little more than documented anecdotal intelligence, with no real methodological foundation. A sales manager describing characteristics of a successful sales prospect or a marketer detailing results of an initiative without the rigor of a real design and measurement are examples of case reports. Case reports are better than no information at all but certainly not a foundation for insightful BI.
The case series is a step beyond individual case reports, the beginning of a systematic attempt to glean intelligence. In contrast to a single report, the series has the advantage of multiple cases over time, providing a foundation for learning and some preliminary protection from several common biases. This design, however, is quite primitive, unable to withstand serious challenges to investigation validity.
In the small business world, the case series file would be maintained by key decision-makers. Marketing, sales and logistics managers might manage their departments' performance information in a spreadsheet or Microsoft Access database, for example. Case series aficionados are willing, if immature, BI analysts.
Cross-sectional investigations depict relationships between treatments or risk factors and outcomes or diseases at a point in time. Cross-sectional analyses are essentially surveys or snapshots, which can provide samples that better represent the population than other designs. At the same time, outcome or disease measures for this study are of limited usefulness. In the parlance of epidemiology, cross-sectional studies can provide information on current diseases or outcomes (prevalence) but cannot identify when the diseases or outcomes first occur (incidence).
Cross-sectional surveys are generally quick, inexpensive and useful for generating hypotheses. Voter exit polls are examples of cross-sectional analyses, as are company surveys of existing customers. Such surveys might provide substantial insight about the current customer base, with statistical models detailing associations among survey variables. Merely snapshots in time, however, they cannot shed light on the customer lifecycle, nor can they contrast successful with unsuccessful marketing efforts.
Ecological epidemiology studies are observational investigations based on secondary, aggregated data used to identify disease prevalence and risk factors for different population groups. The challenge for such analyses is making the jump from group to individual-level inference. In fact, it's often a mistake to draw conclusions about individuals from analytics on groups. This ecological fallacy describes precisely the problems analysts often have with this design.
For business, the use of census data to assist in strategy development is an illustration of the potential of ecological analyses. As long as analysts acknowledge the limitations of group-level analysis, ecological data can be an important source of inspiration for BI.
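The ecological fallacy is easy to demonstrate with a toy data set. The sketch below uses invented numbers (three hypothetical regions, with x as household income and y as spending, in arbitrary units) constructed so that the group-level and individual-level relationships point in opposite directions.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical (income, spending) pairs for individuals in three regions.
regions = {
    "A": [(1, 5), (2, 4), (3, 3)],
    "B": [(4, 8), (5, 7), (6, 6)],
    "C": [(7, 11), (8, 10), (9, 9)],
}

# Group-level analysis: correlate the regional averages.
group_x = [mean(x for x, _ in rows) for rows in regions.values()]
group_y = [mean(y for _, y in rows) for rows in regions.values()]
group_corr = pearson(group_x, group_y)

# Individual-level analysis: correlate within each region.
within_corr = {
    name: pearson([x for x, _ in rows], [y for _, y in rows])
    for name, rows in regions.items()
}

print(f"correlation of regional averages: {group_corr:+.2f}")
for name, corr in within_corr.items():
    print(f"correlation within region {name}: {corr:+.2f}")
```

In this contrived case the regional averages rise together, yet within every region higher income goes with lower spending. An analyst holding only the aggregated figures would draw exactly the wrong conclusion about individuals.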
Case control investigations are retrospective analyses wherein units that experienced an outcome such as disease or success (the cases) are identified after the fact and contrasted with units that did not (the controls). The proportions previously exposed to a risk, intervention or other grouping are then compared between cases and controls, with inferences made accordingly.
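One common summary of such a comparison in epidemiology is the odds ratio, computed from a two-by-two table of exposure by outcome. The sketch below is illustrative only: the function name and the counts are hypothetical, with "great" companies playing the role of cases and a comparison group playing the role of controls.

```python
def odds_ratio(cases_exposed, cases_unexposed, controls_exposed, controls_unexposed):
    """Odds of exposure among cases divided by odds of exposure among controls."""
    odds_cases = cases_exposed / cases_unexposed
    odds_controls = controls_exposed / controls_unexposed
    return odds_cases / odds_controls


# Hypothetical table: of 40 "great" companies, 30 had a given management trait;
# of 40 comparison companies, 20 had it.
ratio = odds_ratio(cases_exposed=30, cases_unexposed=10,
                   controls_exposed=20, controls_unexposed=20)
print(f"odds ratio: {ratio:.1f}")
```

An odds ratio above 1 says the trait is more common among the successes than among the comparison group. It says nothing about whether the trait produced the success.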
Case control studies are quite pervasive in the business guru world. The focus of the underlying research is generally a look at characteristics of successful companies over a period of time. Some of this work attempts to contrast great companies with those that are simply good or even failures, providing benchmarks for comparison. The perceived commonalities of what makes the companies great in comparison to the not-so-great are then trumpeted in best-seller books, seminars and consulting engagements.
Case control is not a very powerful design, unable to refute any number of challenges to its validity. Indeed, it's often the case that results gleaned from these "connecting the winning dots" analyses regress to the mean, with both winners and losers approaching outcome mediocrity over time. The lasting success predicted for such great companies is delusional. Perhaps the focus of case control design in BI should be in generating early hypotheses that are subsequently tested with the two more rigorous methods noted below.
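Regression to the mean can be illustrated with a small simulation. The model below is an assumption made purely for illustration, not a claim about real companies: each simulated company's performance in a period is a fixed "skill" plus independent "luck," both drawn from a standard normal distribution.

```python
import random
from statistics import mean


def winners_before_and_after(n_companies=1000, top_fraction=0.1, seed=0):
    """Pick period-1 winners and return their mean performance in periods 1 and 2."""
    rng = random.Random(seed)
    skill = [rng.gauss(0, 1) for _ in range(n_companies)]
    period1 = [s + rng.gauss(0, 1) for s in skill]  # skill plus luck
    period2 = [s + rng.gauss(0, 1) for s in skill]  # same skill, fresh luck
    n_top = int(n_companies * top_fraction)
    ranked = sorted(range(n_companies), key=lambda i: period1[i], reverse=True)
    winners = ranked[:n_top]
    return mean(period1[i] for i in winners), mean(period2[i] for i in winners)


before, after = winners_before_and_after()
print(f"winners' mean performance, period 1: {before:.2f}")
print(f"same companies, period 2:            {after:.2f}")
```

Because the winners were selected partly on good luck, and luck does not persist, their average falls back toward the overall mean in the next period even though nothing about the companies changed.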
The cohort is the most serious observational design, combining the strengths of over-time (longitudinal) measurement with one or more control groups to contrast with treatment, thus eliminating many sources of bias. Because cohort analysis is prospective, the design is superior to case control, but still suffers from observational investigation weaknesses, such as selection bias.
The cohort design, representing the peak of observational analyses in the evidence hierarchy, should be the minimal standard to which BI aspires. For typical performance management of strategic interventions, companies should contrast treatment groups with controls, taking many measurements over time. An example might be a restaurant chain piloting a new menu in select cities, contrasting performance over time with a group of carefully selected controls.
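One simple way to contrast a treatment group with controls over time is a difference-in-differences calculation. This technique is not named in the discussion above; it is offered here only as one possible sketch, and the weekly sales figures for the pilot and control cities are invented.

```python
from statistics import mean


def difference_in_differences(treat_before, treat_after, control_before, control_after):
    """Change in the treatment group minus change in the control group."""
    treat_change = mean(treat_after) - mean(treat_before)
    control_change = mean(control_after) - mean(control_before)
    return treat_change - control_change


# Hypothetical average weekly sales (in thousands) before and after a menu pilot.
pilot_before, pilot_after = [100, 102, 98], [112, 115, 109]
control_before, control_after = [90, 92, 88], [94, 96, 92]

effect = difference_in_differences(pilot_before, pilot_after,
                                   control_before, control_after)
print(f"estimated menu effect: {effect:+} thousand per week")
```

Subtracting the control cities' change removes whatever trend affected every city over the period. It does not remove selection bias: if the pilot cities were chosen because they were already on a different trajectory, the estimate will still mislead.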
Randomized Controlled Trial
The randomized controlled trial should be the platinum standard design to which BI aspires in the assessment of performance. With this prospective, analytical, experimental study, individuals are randomly assigned to one of the treatment or control groups. Potential confounding factors are thus balanced, on average, across groups, minimizing bias and facilitating statistical comparison of outcomes. In single-blind trials, participants are unaware of their treatment/control status. In double-blind trials, both participants and experimenters are unaware of group assignment. Blinding can help minimize observer and reporting biases.
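The mechanics of a simple two-group trial can be sketched in a few lines of Python. Everything specific here is an assumption for illustration: the function names, the 2,000 hypothetical prospects, the conversion counts, and the choice of a two-proportion z-test as the statistical comparison.

```python
import random
from math import sqrt
from statistics import NormalDist


def randomize(unit_ids, seed=None):
    """Randomly split units into equal-sized treatment and control groups."""
    rng = random.Random(seed)
    shuffled = list(unit_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def two_proportion_z(successes_a, n_a, successes_b, n_b):
    """Return (z statistic, two-sided p-value) for a difference in proportions."""
    rate_a = successes_a / n_a
    rate_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_a - rate_b) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Assign 2,000 hypothetical prospects, then compare invented conversion counts:
# 120 conversions under the new offer versus 90 under the existing one.
treatment, control = randomize(range(2000), seed=42)
z, p_value = two_proportion_z(120, len(treatment), 90, len(control))
print(f"z = {z:.2f}, two-sided p-value = {p_value:.3f}")
```

The key design choice is that assignment depends only on the shuffle, never on anything about the prospect. That is what justifies attributing a difference in conversion rates to the offer rather than to who chose to receive it.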
Many Internet companies use randomized experiments to help optimize their strategies. Visitors to Google, Yahoo and Amazon participate in more randomized trials than they know. Harrah's uses the experimental method to discover profitable new segments of its customer base. And Capital One routinely experiments with prospects on fees, interest rates and reward factors to optimize its customer mix, often with surprising results. Because randomized experiments sit at the summit of the evidence hierarchy, companies that routinely use them to develop and test their strategies can enjoy confidence in their findings.
1. David G. Kleinbaum, Kevin M. Sullivan and Nancy D. Barker. A Pocket Guide to Epidemiology. Springer, 2007.
2. Mona Okasha. Epidemiological Research. Student BMJ Web page, 2001. http://student.bmj.com/issues/01/08/education/277.php
3. John Gay. Clinical Epidemiology and Evidence-Based Medicine Glossary: Clinical Study Design and Methods Terminology. Washington State University College of Veterinary Medicine, 1999. http://www.vetmed.wsu.edu/courses-jmgay/GlossClinStudy.htm