ESL is encyclopedic in its command of the latest methods, especially those surrounding supervised learning, in which there is a known dependent variable to predict, either numeric (e.g. customer lifetime value) or categorical (e.g. fraud or abuse). Alas, ESL is a book for the mathematically sophisticated, and probably heavy reading for many BI analysts.
Those without a strong math background but with a solid business understanding of multiple regression might benefit first from Richard Berk's Statistical Learning from a Regression Perspective (SLRP). Berk, a statistician and social scientist, provides a gentler, applied foundation for statistical learning that should resonate with a BI audience.
Berk outlines four separate stories for which traditional parametric regression or the more flexible statistical learning techniques are applicable:
- A Causal Story in which the analyst is looking to test a theory to relate the independent X's to the dependent Y in such a way as to conclude the X's cause Y.
- A Conditional Distribution Story in which the analyst deploys the traditional linear regression model with assumptions about the behavior of the errors.
- A Data Summary Story in which the learning is used to reduce the dimensionality of the problem space.
- A Forecasting Story in which the analyst constructs a model to forecast future behavior.
Though I'm not sure Berk would agree, I see the first two stories as more relevant to scientific research and academia, and the latter two as more pertinent to business analytics. I've also found that for summary and forecasting challenges, the newer learning techniques often perform better than traditional regression, especially when the relationship of the X's to Y is complex.
Much to my satisfaction, Berk provides a comprehensive discussion of regression smoothers. Though smoothers look much like linear regression models, they differ in that they adapt to the patterns in the data more readily than traditional regression, where the analyst must specify the functional form of the relationship in advance. As Berk notes: "As long as one is content to merely describe, these methods are consistent with the goals of exploratory data analysis."
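To make the contrast between a smoother and a pre-specified linear form concrete, here is a minimal sketch in Python (my choice of language for illustration, not Berk's). It fits a straight line and a Gaussian kernel smoother to simulated data whose true pattern is a sine curve. The data, the bandwidth of 0.5, and all function names are assumptions made up for this example, and the kernel smoother stands in for the broader family of smoothers Berk covers.

```python
import math
import random


def linear_fit(xs, ys):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def kernel_smooth(xs, ys, x0, bandwidth):
    """Kernel smoother at x0: a Gaussian-weighted local average of y."""
    weights = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)


def mse(preds, ys):
    """Mean squared error between fitted and observed values."""
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)


# Simulated data (an assumption for illustration): a sine pattern plus noise.
rng = random.Random(0)
xs = [i / 20 for i in range(200)]
ys = [math.sin(x) + rng.gauss(0, 0.2) for x in xs]

# The linear model forces a straight line on the data.
intercept, slope = linear_fit(xs, ys)
linear_preds = [intercept + slope * x for x in xs]

# The smoother lets the local data determine the shape of the fit.
smooth_preds = [kernel_smooth(xs, ys, x, bandwidth=0.5) for x in xs]

linear_mse = mse(linear_preds, ys)
smooth_mse = mse(smooth_preds, ys)
print(f"linear in-sample MSE:   {linear_mse:.3f}")
print(f"smoother in-sample MSE: {smooth_mse:.3f}")
```

The errors reported here are in-sample, so they speak to description rather than forecasting, which is in keeping with Berk's caveat. The bandwidth is a tuning choice: a smaller value follows the data more closely, a larger one approaches a flat average.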
Berk also pays homage to Classification and Regression Trees (CART), a foundational learning method that's served the BI world well for over 15 years. The next generation of CART-like methods draws on the wisdom of crowds, combining many models into ever more precise predictions. Bagging deploys bootstrapping methods to resample and average multiple predictions, often with significant forecasting lift. Boosting, by contrast, aggregates a group of weak classifiers, each perhaps little better than random guessing, into a committee with often powerful predictive insights.
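The committee idea behind boosting can be sketched in a few lines. The example below uses AdaBoost with one-split "stumps" as the weak classifiers. AdaBoost is my pick as a well-known boosting algorithm, not necessarily the variant Berk emphasizes, and the ten-point data set and every function name are assumptions for illustration. The data is arranged so that no single threshold can separate the two classes.

```python
import math

# Toy data (an assumption for illustration): no single split separates the classes.
xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]


def stump_predict(threshold, polarity, x):
    """A decision stump: predict `polarity` below the threshold, `-polarity` otherwise."""
    return polarity if x < threshold else -polarity


def best_stump(xs, ys, weights):
    """Return (weighted_error, threshold, polarity) for the lowest-error stump."""
    best = None
    for threshold in [x + 0.5 for x in sorted(xs)[:-1]]:
        for polarity in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if stump_predict(threshold, polarity, x) != y)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best


def adaboost(xs, ys, rounds):
    """Build a weighted committee of stumps, reweighting the data each round."""
    n = len(xs)
    weights = [1.0 / n] * n
    committee = []
    for _ in range(rounds):
        err, threshold, polarity = best_stump(xs, ys, weights)
        err = max(err, 1e-12)  # guard against division by zero on a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)  # this stump's vote in the committee
        committee.append((alpha, threshold, polarity))
        # Raise the weight of misclassified points, lower the rest, then renormalize.
        weights = [w * math.exp(-alpha * y * stump_predict(threshold, polarity, x))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return committee


def committee_predict(committee, x):
    """Weighted majority vote of all stumps in the committee."""
    score = sum(alpha * stump_predict(t, p, x) for alpha, t, p in committee)
    return 1 if score >= 0 else -1


def training_errors(committee):
    """Count misclassified training points."""
    return sum(committee_predict(committee, x) != y for x, y in zip(xs, ys))


single_errors = training_errors(adaboost(xs, ys, rounds=1))
committee_errors = training_errors(adaboost(xs, ys, rounds=3))
print("errors with one stump:   ", single_errors)
print("errors with three stumps:", committee_errors)
```

On this toy data the best single stump misclassifies three of the ten points, while a committee of three stumps classifies all ten correctly. That is training error only; how well a boosted committee forecasts new cases depends on the data and on how many rounds are run.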
SLRP is an excellent text that can serve analysts well as either a prerequisite or co-reading for ESL. Let me suggest a third book to be written for a statistical learning trilogy. This text would take the methods so elegantly formulated in ESL and explained in SLRP to provide comprehensive illustrations with real business and social science data sets. The analysis could be done with R packages hot off SourceForge, written by the method developers themselves. With patterned prediction problems and ample R code, the newest learning techniques would get a major boost in the BI world.