Faithful readers of the OpenBI Forum over the last 28 months have probably noticed two recurring themes in the columns. The first is guidance for business intelligence (BI) from outside business, particularly from academic areas such as statistics, operations research, computer science, economics, political science, cognitive science and psychology. The second is an obsession with rigorous methods and designs for BI investigations to help prove the relationships implied by intelligence inquiries. How can a retail company be sure that its latest marketing campaign resulted in increased sales and profits? How can a telecom company carefully test its new strategy, which holds that enhanced employee training will lead to better customer service which, in turn, will promote greater customer loyalty and subsequent profits? How can a financial services company determine which offers will entice prospects to become profitable customers? In each case, BI is testing hypotheses of the logical form "if x then y," "the more of x, the less of y" or "x causes y." The tighter the BI design, the more comfort a business can have that the interventions it initiated indeed caused the noted results.
Epidemiology and the Evidence Hierarchy
Epidemiology is the study of health, illness and diseases in populations of humans or animals, seeking answers to questions of how, when and where diseases occur.1,2,3 Though epidemiologists know they can never entirely prove their causal hypotheses, they nevertheless seek to demonstrate associations between risk factors and disease (or, more generally, between treatments and outcomes). By using study designs that eliminate competing or confounding explanations, and deploying powerful statistical methods to highlight relative and absolute differences in disease by risk factors, epidemiologists are often able to help solve complicated health problems.
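To make "relative and absolute differences" concrete, here is a minimal sketch in Python. The function name `risk_comparison` and all of the counts are hypothetical, invented purely for illustration. The two measures it computes, risk difference and relative risk, are standard epidemiological summaries and are not specific to this column.

```python
def risk_comparison(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Return (risk_difference, relative_risk) for an exposed and an unexposed group.

    Hypothetical helper for illustration only.
    """
    if exposed_total <= 0 or unexposed_total <= 0:
        raise ValueError("group sizes must be positive")
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    if risk_unexposed == 0:
        raise ValueError("relative risk is undefined when the unexposed risk is zero")
    risk_difference = risk_exposed - risk_unexposed  # absolute difference
    relative_risk = risk_exposed / risk_unexposed    # relative difference
    return risk_difference, relative_risk


# Made-up counts: 30 of 200 exposed individuals fall ill, versus 20 of 200 unexposed.
difference, ratio = risk_comparison(30, 200, 20, 200)
print(f"risk difference: {difference:.2f}")
print(f"relative risk:   {ratio:.2f}")
```

With these invented numbers the exposed group's risk is 15 percent against 10 percent for the unexposed group: an absolute difference of 5 percentage points and a relative risk of 1.5. In a BI setting the same arithmetic could be applied to, say, purchase rates among customers who did and did not receive an offer.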
The study designs used by epidemiologists can be classified in a number of ways. Most useful is to contrast randomized experiments, in which individuals are randomly assigned to treatment groups, with observational studies, in which individuals self-select into the various groups. Randomized experiments, of course, reasonably assure that factors other than the treatment are, on average, equal between groups. Experiments thus help to minimize potential sources of bias or systematic error. Epidemiologists also contrast retrospective studies, where important events (diseases) have already transpired, with prospective studies, where events may happen in the future. Retrospective studies are generally much less expensive than prospective ones, but are not as rigorous, and may be hard pressed to eliminate plausible alternative explanations of findings. Finally, epidemiologists distinguish descriptive studies, purporting only to describe the distributions of variables and possibly to make inferences about populations, from analytical studies, which look to make causal connections between risk factors (treatments) and outcomes.
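Random assignment itself is mechanically simple. The following is a minimal sketch, assuming a list of hypothetical customer IDs and an illustrative function name `randomly_assign`. It shows only the assignment step, not a full experimental protocol.

```python
import random


def randomly_assign(units, seed=None):
    """Split units into (treatment, control) groups by random assignment.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)  # passing a seed makes the split reproducible
    shuffled = list(units)     # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    midpoint = len(shuffled) // 2
    return shuffled[:midpoint], shuffled[midpoint:]


# Invented customer IDs for a marketing campaign test.
customers = [f"customer_{i}" for i in range(10)]
treatment, control = randomly_assign(customers, seed=42)
print("treatment:", treatment)
print("control:  ", control)
```

Because chance alone decides who lands in which group, characteristics such as prior spending or tenure should balance out on average. That balance is what an observational study, where customers choose their own group, cannot promise.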
With emphases on methods, design and statistical analyses, the study of epidemiology has much to offer BI. What follows are brief descriptions of the major types of epidemiology studies, listed in order of methodological rigor from simple observational case reports to randomized controlled trials, with BI illustrations as appropriate. Designs further down the list are generally preferred. Evidence-based disciplines tout this progression as the hierarchy of evidence, encouraging practitioners to place more emphasis on the results of tighter designs. The business world would be well advised to follow this lead.
The case report is little more than documented anecdotal intelligence, with no real methodological foundation. A sales manager describing characteristics of a successful sales prospect or a marketer detailing results of an initiative without the rigor of a real design and measurement are examples of case reports. Case reports are better than no information at all but certainly not a foundation for insightful BI.
The case series is a step beyond individual case reports, the beginning of a systematic attempt to glean intelligence. In contrast to a single report, the series has the advantage of multiple cases over time, providing a foundation for learning and some preliminary protection from several common biases. This design, however, is quite primitive, unable to withstand serious challenges to investigation validity.
In the small business world, the case series file would be maintained by key decision-makers. Marketing, sales and logistics managers might manage their departments' performance information in a spreadsheet or Microsoft Access database, for example. Case series aficionados are willing, if immature, BI analysts.
Cross-sectional investigations depict relationships between treatments or risk factors and outcomes or diseases at a point in time. Cross-sectional analyses are essentially surveys or snapshots, which can provide samples that better represent the population than other designs. At the same time, the outcome or disease measures from this type of study are of limited usefulness. In the parlance of epidemiology, cross-sectional studies can provide information on current diseases or outcomes (prevalence) but cannot identify when the diseases or outcomes first occur (incidence).
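A minimal sketch of what a cross-sectional snapshot can and cannot deliver, using invented survey records and a hypothetical `prevalence` function:

```python
def prevalence(snapshot):
    """Fraction of surveyed units that show the outcome at the time of the survey.

    Hypothetical helper for illustration only.
    """
    if not snapshot:
        raise ValueError("snapshot must contain at least one record")
    with_outcome = sum(1 for record in snapshot if record["has_outcome"])
    return with_outcome / len(snapshot)


# Invented one-day survey: does each customer currently have a lapsed account?
survey = [
    {"customer": "A", "has_outcome": True},
    {"customer": "B", "has_outcome": False},
    {"customer": "C", "has_outcome": False},
    {"customer": "D", "has_outcome": False},
]
print(f"prevalence: {prevalence(survey):.2f}")
```

The records carry no date on which each account lapsed, so the snapshot supports a prevalence figure but gives no basis for counting new lapses per period, which is what an incidence measure would require.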