
Biostatistics, Open Source and BI – an Interview with Frank Harrell

OpenBI Forum

Information Management Online, February 25, 2009

Steve Miller

I hadn’t corresponded with Frank Harrell in about six months, but I had to ping him after his pithy forum response to the New York Times article on R. Begrudging the meteoric rise in open source R’s popularity, a VP from proprietary statistical software market leader SAS noted: “I think it addresses a niche market for high-end data analysts that want free, readily available code … We have customers who build engines for aircraft. I am happy they are not using freeware when I get on a jet.” To which Frank deadpanned: “It’s interesting that SAS Institute feels that non-peer-reviewed software with hidden implementations of analytic methods that cannot be reproduced by others should be trusted when building aircraft engines.” Touché.

I’d written about Frank in a previous Information Management article, after meeting him in person and taking his Regression Strategies short course at useR!2007 in Ames, Iowa. Even before that conference, I felt I knew him pretty well. I use his Hmisc and Design R packages all the time, and I regularly learn from his informative wiki. Frank is also one of perhaps a dozen esteemed R forum participants I religiously follow, regardless of topic. Frank and I are about the same age, but he’s the teacher and I’m the student.


In addition to his status as an R elder, Frank has “side” jobs as professor of biostatistics and department chair at Vanderbilt University, having previously served on the faculties of both Duke and Virginia after earning his doctorate at North Carolina. His research revolves around various adaptations of multivariable predictive modeling and efforts to weave rigorous biostatistical thinking into the fabric of biomedical research. Preventing bad research has also been a common thread in his career. A perusal of Frank’s vita affirms his status as a leading academic biostatistician.

Frank’s biostatistical wisdom is quite suitable for BI. Evidence-based management derives from evidence-based medicine and its obsession with evaluating the merits of study designs using the evidence hierarchy. He’s a stickler for designs that reduce bias, so that researchers can confidently conclude not just that A is correlated with B but that A caused B. Frank’s proddings and analytic conservatism push students and researchers to do right statistically. Indeed, I think all modelers in business would be well served by a periodic retreat on predictive modeling strategies with Frank Harrell.
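To make the correlation-versus-causation point concrete, here is a small, hypothetical Python simulation. It is not from Frank or the interview. It assumes an unmeasured confounder `u` (think of it as underlying health) that raises the outcome and, in the observational scenario, also makes treatment more likely. The true treatment effect is fixed at 1.0 by assumption. The function name `simulate` and all effect sizes are illustrative only.

```python
import random
import statistics


def simulate(n, randomized, seed=0):
    """Return the naive treated-minus-control difference in mean outcome.

    Hypothetical setup: the true treatment effect is 1.0, and an unmeasured
    confounder u adds 2.0 * u to every outcome.
    """
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        u = rng.gauss(0, 1)  # unmeasured confounder, never seen by the analyst
        if randomized:
            a = rng.random() < 0.5  # coin-flip assignment ignores u entirely
        else:
            a = u + rng.gauss(0, 1) > 0  # self-selection: higher u, likelier treated
        b = 1.0 * a + 2.0 * u + rng.gauss(0, 1)  # outcome
        (treated if a else control).append(b)
    return statistics.mean(treated) - statistics.mean(control)


if __name__ == "__main__":
    print("randomized estimate:   ", round(simulate(100_000, randomized=True), 2))
    print("observational estimate:", round(simulate(100_000, randomized=False), 2))
```

Under these assumptions the randomized comparison lands near the true effect of 1.0. The self-selected comparison comes out well above 3, because it mixes the treatment effect with the confounder’s effect. No amount of data fixes that unless `u` is measured and modeled correctly.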

Frank is also a leader in statistical computing. His Hmisc and Design package contributions to the R project provide analysts and statisticians with a wealth of statistical goodies for programming analytics with R. Frank’s roots in computation run very deep, starting with contributions as a young student in the late 1960s to the then pre-release SAS software platform. His evolution in statistical computing seems a metaphor for the open source movement that’s gaining momentum now.

My correspondence with Frank provided the opportunity to ask him to do an interview for the OpenBI Forum. He graciously accepted, turning around deft responses to my sometimes ponderous questions in very short order. What follows is the text of our question-and-answer session. I trust that readers will learn as much from Frank’s responses as I did.

 

1. Much of your work focuses on statistical analysis in health care/medical research and the field of epidemiology, which has given us the “evidence hierarchy” of designs for evaluating research. How important is a solid design for “proving” the efficacy of interventions?

It is extremely important, because any non-experimental approach to assessing the efficacy of interventions has to involve getting much more “right” in terms of specifying models. The freedom of not worrying about unmeasured variables in randomized clinical trials can never be forgotten.

2. David Sackett has defined evidence-based medicine (EBM) as “the conscientious, explicit and judicious use of current best evidence in making decisions about the care of individual patients ... integrating individual clinical expertise with the best available clinical evidence from systematic research.” How would you characterize the current state of EBM?

Much of my work has been indirectly involved with EBM. The current state of EBM is not something we can take a lot of pride in. First, the number of medical, surgical, herbal, and alternative treatments for which true evidence is even sought is frighteningly low. Second, some of EBM is not itself evidence-based. Much EBM involves the use of crude, non-patient-specific data in meta-analysis, or it involves unwarranted extrapolations. Some national figures, such as estimates of the number of unnecessary deaths in hospitals, were obtained by studies that were not designed as well as they should have been. In the future we will continue to see EBM progress, but until incentives and regulations are changed, many therapies will not be adequately studied. Let me also add that in many cases a single, excellently designed database can lead to multivariable analysis that provides better answers than a meta-analysis of 20 studies each contributing only crude marginal summaries.

3. In his well-received book Super Crunchers, Yale economist Ian Ayres notes the predictive superiority of analytics over experts in many disciplines, observing that “Unlike self-involved experts, statistical regressions don’t have egos or feelings.” What are your experience and thoughts regarding experts versus analytics in health care?

