I finally got around to cleaning up my home office the other day. The biggest challenge was putting away all the loose books in such a way that I can quickly retrieve them when needed.
In the clutter I found two copies of “The Elements of Statistical Learning” by Trevor Hastie, Robert Tibshirani and Jerome Friedman – one I purchased two years ago and the other I received at a recent Statistical Learning and Data Mining (SLDM III) seminar taught by the first two authors. ESL is quite popular in the predictive modeling world, often referred to by aficionados as “the book”, “the SL book” or the “big yellow book” in reverence to its status as the SL bible.
Hastie, Tibshirani and Friedman are Professors of Statistics at Stanford University, the top-rated stats department in the country. For over 20 years, the three have been leaders in the field of statistical learning and prediction that sits between traditional statistical modeling and data mining algorithms from computer science. I was introduced to their work when I took the SLDM course three years ago.
H&T have been teaching variants of SLDM to eager modeling students for over 15 years. That I've taken the course twice since 2008 speaks to my assessment of its value. I've already decided to sign up again when SLDM III becomes SLDM IV, and enthusiastically recommend the seminar to BI practitioners with a moderate statistical background looking for exposure to the latest predictive methods.
Trevor Hastie will be the first to acknowledge the good fortune in his career as a statistician. His time as a PhD student at Stanford coincided with the origination of the "bootstrap" methodology by eminent professor Brad Efron, which helped usher computational and simulation techniques into statistical science in the early 80s. And one of his first jobs after completing his dissertation was with the statistics and data analysis research group at AT&T Bell Labs in its heyday.
There he was the beneficiary of collaboration with John Chambers, primary architect of the S statistical computing platform. S is the predecessor of open source R, which now reigns as the data analysis software of choice in the academic and research communities – and more and more in commerce as well. Hastie joined the Stanford Statistics Department in 1994, serving as Chairman from 2006-09.
Hastie's current research revolves around statistical problems in biology and genomics, medicine and industry. His specialties include data mining, prediction, classification and statistical computing. He’s written two books on applied statistical models: "Generalized Additive Models" (with R. Tibshirani, Chapman and Hall, 1991), and "Elements of Statistical Learning" (with R. Tibshirani and J. Friedman, Springer 2001; second edition 2009). Hastie also co-edited a book with John Chambers that introduced a large software library of modeling tools in the S language ("Statistical Models in S", Wadsworth, 1992). That library's now the basis for much of statistical modeling in R. Finally, he's the author and maintainer of several prominent R packages.
It's safe to say Trevor Hastie's playing an important role in shaping the evolution of modern applied statistics from its staid theoretical and mathematical past to an exciting present that emphasizes computation and predictive solutions to real world problems involving big data. That's a good thing for BI.
I asked Trevor in Boston if he'd be willing to do an interview for Information Management and he graciously agreed. Below are his thoughtful answers to my questions.
1) Where does statistical learning fit with traditional statistics and data mining?
For me “statistical learning” is a more evocative phrase than “applied statistical modeling”, which is a term from statistics, but they essentially mean the same, with the former perhaps emphasizing prediction. The phrase data mining comes from computer science and engineering, where we imagine large troves of data (typically gathered in some automatic fashion), and we are searching for structure, and perhaps predictive targets.
Many ingenious techniques have been invented in this domain (e.g. boosting, neural networks), often with an algorithmic flavor. Statistical learning for me tries to understand these new techniques in terms of statistical models, and connect them with other more traditional approaches.
2) The statistical learning/data mining (SLDM) public short course you co-teach with Rob Tibshirani has been very successful. Can you briefly chronicle its history?
Rob and I taught our first course at the Social Security Administration in 1995. It was a custom-made four-day in-house course, and it was pretty tiring. After the course was over, we had to wait a few months to get paid, because the government put a freeze on all payments till the budget was resolved.
We did a few more in-house courses after that, but at some point we decided we preferred the two-day open format and that we could administer the courses ourselves. My wife Lynda has administered these courses from home ever since. We have had four versions of the class; the first was called “Modern Regression and Classification”, and then after 2000 we changed the name to “Statistical Learning and Data Mining”.
We are now on SLDM III. We teach the class twice a year, once at Stanford in Spring, and then on the East Coast or in Europe in the Fall. We really enjoy the wide variety of interests of the attendees, coming from industry, government, academia, medicine and finance. We get about 65-70 people each time.
3) How has the practice of statistics changed during your career? Mathematical to computational? Big data? Many more variables than cases? Enhanced visualization?
I think the move from mathematical to computational pretty much started when I was at graduate school. Brad Efron introduced the bootstrap in the early 80s, and I was at Stanford then. The idea that you could replace gnarly derivations of distributions by simulation was a huge breath of fresh air. I was surrounded by computational visionaries such as Friedman, Stuetzle, Buja and John Tukey. This continued at Bell Labs. Simulation and randomization are fundamental in statistics today.
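For readers who haven't seen the bootstrap in action, here is a minimal Python sketch of the idea of replacing a derivation with simulation. It is only an illustration, not code from Hastie or Efron: the function name bootstrap_se, the sample values and the choice of the median as the statistic are all made up for the example. The standard error of the sample median has no tidy closed form, so the sketch estimates it by resampling the data with replacement many times.

```python
import random
import statistics


def bootstrap_se(data, stat=statistics.median, n_boot=2000, seed=0):
    """Estimate the standard error of `stat` by resampling `data` with replacement.

    Hypothetical helper written for this illustration only.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    n = len(data)
    replicates = []
    for _ in range(n_boot):
        resample = rng.choices(data, k=n)  # n draws with replacement
        replicates.append(stat(resample))
    # The spread of the replicated statistics approximates the standard error.
    return statistics.stdev(replicates)


# Made-up sample, for illustration only.
sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.8, 2.7, 6.1, 3.3, 4.9]
se_median = bootstrap_se(sample)
print(f"bootstrap SE of the median: {se_median:.3f}")
```

Swapping in a different statistic only means passing another function as stat; no new mathematics is required, which is much of the method's appeal.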
4) You've been closely engaged with the evolution of statistical computing, first with S at Bell Labs and now with R at Stanford. Could you give us some insights on the path you've experienced? R is now the platform of choice for academic statistics. Thoughts on R? Do you envision R becoming dominant in the commercial world as well?
At Stanford I programmed in Fortran - Rob and I wrote GAM in Fortran. Then I went to Bell Labs in 1986 and learned to program in S. S was quite slow at first. I remember we wrote glm and gam in S, and I worried that it was too slow. John Chambers reassured me that by the time it was “out there”, computers would be fast enough; he was right.
We had a lot of fun developing the statistical modeling software and formula language in S. This was made commercial as Splus. I was at a Splus conference in Wellington, NZ in 1991, and these two Roberts gave a talk on R, the “free” version of S for Macs. I remember at the time thinking this was ambitious, but unlikely to catch on. How wrong!
I changed to R in about 2003, and have never looked back. R has become extremely useful and very powerful, and is backed by a high-quality team of “R-core” experts (I am not one). I have a number of R packages, most recently glmnet, that I maintain. One of the beauties of R is the package system, which has become really high quality. Before a package is accepted, it has to go through a stringent series of tests.
5) Do you agree with Hal Varian that the “sexy job in the next ten years will be statistician ... The ability to take data – to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it – that’s going to be a hugely important skill in the next decades.”?
Isn’t that great? When I was at school, we were the nerds. Now we are sexy! Wish I could be back at school. Clearly data analysis and modeling have become fundamental in many areas of science, technology and business. As our abilities to gather and store data improve, so does the need to make sense of it.
6) Are your Stanford PhD students often now choosing commerce instead of academia? How about Masters and Bachelors students? Do you envision any changes to Statistics curricula to accommodate the needs of big, messy, internet data?
There has been a move to commerce, which we do not all welcome. We train top-quality statisticians, and an increasing fraction go to Wall Street to make their fortune. I would personally prefer to see them harness their knowledge to move science forward. Stanford has traditionally also supplied top faculty for the many good universities around the country, and I would not like to see that decline.
7) The late Berkeley statistician Leo Breiman, originator of CART and Random Forests, in 2001 chided academic statistics for its obsession with “data models” that have led to “irrelevant theory and questionable conclusions that have kept statisticians from working on a large range of interesting current problems.” Algorithmic modeling, in contrast, “can be used both on large complex data sets and as a more accurate and informative alternative to data modeling on smaller data sets.” Do/did you agree with Breiman's position?
Not really, and I don't think he really believed this either. Leo was frustrated with the timidity of statisticians, and was impressed by the boldness of machine learners at the NIPS and Snowbird meetings. We have spent a lot of our time understanding some of these “algorithmic” approaches in terms of statistical models, and we believe this has borne fruit. In fact Leo did the same. Random forests might seem algorithmic, but Leo himself explained their mechanism in terms of variance reduction of a noisy model with low bias.
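The variance-reduction argument can be illustrated with a toy simulation. To be clear, this is not the random forest algorithm and not Breiman's own code: it simply assumes each "model" is an unbiased but noisy predictor with variance sigma^2, and that any two models share a correlation rho. Under that assumption, the standard result for the average of B such predictors is a variance of rho * sigma^2 + (1 - rho) * sigma^2 / B. The function names and parameter values below are hypothetical.

```python
import random
import statistics


def variance_of_average(n_models, rho, sigma=1.0, n_trials=20000, seed=0):
    """Simulate the variance of the average of `n_models` correlated noisy predictors.

    Assumed toy model: each predictor's error is a mix of a shared noise term
    and its own independent noise, giving pairwise correlation `rho`.
    """
    rng = random.Random(seed)
    averages = []
    for _ in range(n_trials):
        shared = rng.gauss(0.0, 1.0)  # noise common to every predictor
        errors = [
            sigma * (rho ** 0.5 * shared + (1 - rho) ** 0.5 * rng.gauss(0.0, 1.0))
            for _ in range(n_models)
        ]
        averages.append(sum(errors) / n_models)
    return statistics.variance(averages)


def theoretical_variance(n_models, rho, sigma=1.0):
    """Variance of the average of identically distributed, equally correlated predictors."""
    return rho * sigma ** 2 + (1 - rho) * sigma ** 2 / n_models


for n_models, rho in [(1, 0.5), (25, 0.5), (25, 0.0)]:
    simulated = variance_of_average(n_models, rho)
    expected = theoretical_variance(n_models, rho)
    print(f"B={n_models:2d} rho={rho}: simulated {simulated:.3f}, theory {expected:.3f}")
```

The point of the sketch is that averaging helps most when the individual predictors are weakly correlated. With rho = 0.5 the variance cannot fall below half of sigma^2 no matter how many predictors are averaged, while with rho = 0 it shrinks in proportion to 1/B.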
8) The Bayesian paradigm, which seems to reflect how organizations learn, has returned from statistical exile. Your thoughts?
Another consequence of the computational explosion. Almost all Bayesian computation these days is via MCMC and Gibbs sampling. I think it’s great. I am not a Bayesian, and think strict Bayesianism is too limiting. I do however like empirical Bayes methods; many of our models, such as the lasso or smoothing, fit comfortably in this framework. Bayesians do not look at the data enough for my liking; they tend to rely too heavily on their models.
9) “Extrapolate” out 10-20 years. What changes do you envision to the teaching and conduct of statistics? If you were a 22 year old new college grad today, would you again choose a statistics career? An academic statistics career?
Statistics will have to come to grips with massive data, and we will have to teach that.
But the field has been pretty good at adapting to the data of the day. I feel learning the fundamentals is still very important. That’s how we know how to evaluate the tools we create and use. Statistics has become a very hot field. Students at Stanford in Engineering and CS are opting to get an additional MS in Statistics to make themselves more employable. Academic career? I actually liked my path, which was 8 years in industry, and then the academic career.