I'd written about Frank in a previous Information Management article, after meeting him in person and taking his Regression Strategies short course at useR!2007 in Ames, Iowa. Even before that conference, I felt I knew him pretty well. I use his Hmisc and Design R packages all the time, and I regularly learn from his informative wiki. Frank is also one of perhaps a dozen esteemed R forum participants I religiously follow regardless of topic. Frank and I are about the same age, but he's the teacher and I'm the student.
In addition to his status as an R elder, Frank has side jobs as professor of biostatistics and department chair at Vanderbilt University, having previously served on the faculties of both Duke and Virginia after earning his doctorate at North Carolina. His research revolves around the use of various adaptations of multivariable predictive modeling and around efforts to weave rigorous biostatistical thinking into the fabric of biomedical research. Preventing bad research has also been a common thread throughout his career. A perusal of Frank's curriculum vitae affirms his status as a leading academic biostatistician.
Frank's biostatistical wisdom is quite suitable for BI. Evidence-based management (EBM) derives from evidence-based medicine and its obsession with evaluating the merits of study designs using the evidence hierarchy. He's a stickler for designs that reduce the bias of analyses, so that researchers can confidently conclude not just that A is correlated with B but that A caused B. Frank's proddings and analytic conservatism push students and researchers to do right statistically. Indeed, I think all modelers in business would be well served by a periodic retreat on predictive modeling strategies with Frank Harrell.
Frank is also a leader in statistical computing. His Hmisc and Design package contributions to the R project provide analysts and statisticians with a wealth of statistical goodies for programming analytics with R. Frank's roots in computation run very deep, starting with contributions as a young student in the late 1960s to the then pre-release SAS software platform. His evolution in statistical computing seems a metaphor for the open source movement that's gaining momentum now.
My correspondence with Frank gave me the opportunity to ask him for an interview for the OpenBI Forum. He graciously accepted, turning around deft responses to my sometimes ponderous questions in very short order. What follows is the text of our question-and-answer session. I trust that readers will learn as much from Frank's responses as I did.
1. Much of your work focuses on statistical analysis in health care/medical research and the field of epidemiology, which has given us the evidence hierarchy of designs for evaluating research. How important is a solid design for proving the efficacy of interventions?
It is extremely important, because any non-experimental approach to assessing the efficacy of interventions has to involve getting much more right in terms of specifying models. The freedom of not worrying about unmeasured variables in randomized clinical trials can never be forgotten.
2. David Sackett has defined evidence-based medicine (EBM) as "the conscientious, explicit and judicious use of current best evidence in making decisions about the care of individual patients ... integrating individual clinical expertise with the best available clinical evidence from systematic research." How would you characterize the current state of EBM?
Much of my work has been indirectly involved with EBM. The current state of EBM is not something we can take a lot of pride in. First, the number of medical, surgical, herbal, and alternative treatments for which true evidence is even sought is frighteningly low. Second, some of EBM is not itself evidence-based. Much EBM involves the use of crude, non-patient-specific data in meta-analysis, or it involves unwarranted extrapolations. Some national figures, such as estimates of the number of unnecessary deaths in hospitals, were obtained from studies that were not designed as well as they should have been. In the future we will continue to see EBM progress, but until incentives and regulations are changed, many therapies will not be adequately studied. Let me also add that in many cases a single, excellently designed database can support a multivariable analysis that provides better answers than a meta-analysis of 20 studies, each contributing only crude marginal summaries.
3. In his well-received book Super Crunchers, Yale economist Ian Ayres notes the predictive superiority of analytics over experts in many disciplines, observing that "Unlike self-involved experts, statistical regressions don't have egos or feelings." What are your experience and thoughts regarding experts versus analytics in health care?