No Quick Death for Statistical Practices
I’m still reeling from the provocative but important new book, “Big Data: A Revolution That Will Transform How We Live, Work and Think,” co-authored by Oxford professor Viktor Mayer-Schonberger and Economist editor Kenneth Cukier, that I twice blogged on a few weeks back.
As I noted then: “I must admit my traditional statistical grounding has taken a hit with Big Data. The notions that the core scientific method techniques of sampling, measurement error, and the experimental method’s cause and effect, may well lose importance as central components of the analytics’ tool chest hasn’t quite registered with me yet – and maybe never will.” Could it be that statistical practice as we know it is on its deathbed, an artifact of the technological and computational limitations of an earlier age, destined to be supplanted by new N=all, messy-data and correlation-only methodologies?
My anxiety was at least partially assuaged when I received an advance copy of Tom Davenport and Jinho Kim’s moderating “Keeping Up with the Quants: Your Guide to Understanding + Using Analytics.” I’m a big fan of Davenport, author of award-winning books “Competing on Analytics” and “Analytics at Work,” as well as countless articles on analytics in the Harvard Business Review, the MIT Sloan Management Review and Information Management. Few in our business have the analytics Rolodex of Davenport.
I like Quants’ definition of analytics as “the extensive use of data, statistical and quantitative analysis, explanatory and predictive models, and fact-based management to drive decisions and add value.” If you’re looking for a how-to-do-it tech guide, though, Quants isn’t it. Instead, it’s designed for business analyst/managers who strive to be “better consumers of data … and more conversant in analytics,” as well as to work effectively with quants.
The authors articulate an analytics methodology composed of three stages and six steps. The “framing the problem” stage includes problem recognition and review of previous findings. The “solving the problem” stage includes variable selection and modeling, data collection and data analysis. Finally, the “communicating and acting on results” stage presents findings and suggests actions. The chapters of the book detail the stages and steps in turn, citing countless academic and business illustrations of both analytics successes and failures.
Quants is a breezy, two-sitting read. Its analytics paradigm reads a lot like those from research methods books used in undergraduate social science courses – very much hypothesis- and explanation-driven, built on the experimental search for cause and effect, and focused on traditional inferential statistics.
How do I reconcile this approach with the make-no-apologies, N=all of Mayer-Schonberger/Cukier? Perhaps some wisdom from eminent Stanford statistician Brad Efron is helpful. In an interview four years ago, Efron noted “statistics has enjoyed modest, positively-sloped growth since 1900. There is now much more statistical work being done in the scientific disciplines, what with biometrics/biostatistics, econometrics, psychometrics, etc. – and business as well. Statistics is now even entrenched in hard sciences like physics. There are also the computer science/artificial intelligence contributions of machine learning and other data mining techniques. If data analysis were political, biometrics/econometrics/psychometrics would be ‘right wing’ conservatives, traditional statistics would be ‘centrist’ and machine learning would be ‘left-leaning.’ The conservative-liberal scale reflects how orthodox the disciplines are with respect to inference, ranging from very to not at all.”
In Efron parlance, Quants would probably represent conservative, orthodox methodology while Big Data would be liberal – even socialist. The pure conservatives espouse a top-down, hypothesis-driven “planning” approach tied to the experimental method and fueled by open/interpretable inferential statistics. Explanation is as important as prediction. The extreme liberals, in contrast, are bottom-up, data-driven “searchers” deploying big data and closed/black-box algorithms. Explanation be damned; the best predictions win.
Me? I’d like to think I’m more of a centrist, recognizing the need to adopt techniques from all parties. In fact, I believe it’s pretty important for analytics professionals, quants, data scientists – whatever the flavor – to be conversant with the entire political analytics spectrum, as each brings unique contributions to the discipline.
What do readers think?