APR 20, 2009 3:07am ET


More Statistical Learning


The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition, by Trevor Hastie, Robert Tibshirani, and Jerome Friedman is now available. The authors, along with mentor Brad Efron and other faculty and students from the top-ranked Statistics department at Stanford University, continue to advance the discipline of statistical learning – a convergence of statistics with machine learning – at a feverish pace, much to the benefit of business intelligence.

ESL is encyclopedic in its command of the latest methods, especially those surrounding supervised learning, in which there is a known dependent variable to predict, either a numeric target (e.g., lifetime customer value) or a class label (e.g., fraud or abuse). Alas, ESL is a book for the mathematically sophisticated; the reading will probably be a bit heavy for many BI analysts.

Those without a strong math background but with a solid business understanding of multiple regression might benefit first from Richard Berk's Statistical Learning from a Regression Perspective. Berk, a statistician and social scientist, provides a gentler, applied foundation for statistical learning that should resonate with a BI audience.

Berk outlines four separate “stories” for which traditional parametric regression or the more flexible statistical learning techniques are applicable:
  • A Causal Story – in which the analyst is looking to test a theory – to relate the independent X's to the dependent Y in such a way as to conclude the X's cause Y.
  • A Conditional Distribution Story – in which the analyst deploys the traditional linear regression model with assumptions about the behavior of the errors.
  • A Data Summary Story – in which the learning is used to reduce the dimensionality of the problem space.
  • A Forecasting Story – in which the analyst constructs a model to forecast future behavior.
Though I'm not sure Berk would agree, I see the first two stories as more relevant to scientific research and academia, the latter two as more pertinent to business analytics. I've also found that for summary and forecasting challenges, the newer learning techniques often outperform traditional regression, especially when the relationship of the X's to Y is complex.
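
To make that point concrete, here is a minimal sketch of my own in R, using simulated data and a loess smoother purely for illustration (it is not an example from Berk's text). When the relationship of X to Y bends, the flexible smoother forecasts held-out data better than a straight line:

  # Illustrative only: simulated nonlinear data, lm() versus a loess smoother
  set.seed(42)
  train <- data.frame(x = runif(400, 0, 10))
  test  <- data.frame(x = runif(100, 0.5, 9.5))  # keep test x inside the training range
  f <- function(x) sin(x) + 0.3 * x              # a deliberately nonlinear signal
  train$y <- f(train$x) + rnorm(400, sd = 0.4)
  test$y  <- f(test$x)  + rnorm(100, sd = 0.4)

  lin <- lm(y ~ x, data = train)                 # traditional linear regression
  smo <- loess(y ~ x, data = train, span = 0.4)  # flexible local-regression smoother

  rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))
  rmse(test$y, predict(lin, test))               # the straight line misses the curvature
  rmse(test$y, predict(smo, test))               # the smoother tracks it, with lower error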

Much to my satisfaction, Berk provides a comprehensive discussion of regression smoothers. Though smoothers look much like linear regression models, they differ in that they adapt to the patterns in the data more readily than traditional regression, where the analyst must specify the functional form of the relationship in advance. As Berk notes: “As long as one is content to merely describe, these methods are consistent with the goals of exploratory data analysis.”
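
As a quick illustration of that descriptive role (again my own example, not one of Berk's), the motorcycle-impact data shipped with R's MASS package shows a smoothing spline adapting to a pattern no analyst would care to specify in advance:

  # Illustrative only: a base R smoothing spline on the classic mcycle data
  library(MASS)                                     # provides the mcycle data set
  fit <- smooth.spline(mcycle$times, mcycle$accel)  # no functional form specified

  plot(mcycle$times, mcycle$accel, col = "grey", pch = 16,
       xlab = "time (ms)", ylab = "head acceleration (g)")
  lines(predict(fit, mcycle$times), lwd = 2)        # the smoother follows the bends
  abline(lm(accel ~ times, data = mcycle), lty = 2) # a straight-line fit does not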

Berk also pays homage to Classification and Regression Trees (CART), a foundational learning method that's served the BI world well for over 15 years. The next generation of CART-like methods builds on the wisdom of crowds to ensemble ever more precise predictions. Bagging deploys bootstrapping methods to resample and average multiple predictions, often with significant forecasting lift. Boosting, by contrast, aggregates a group of weak classifiers – perhaps each little better than random guessing – into a committee with often powerful predictive insights.
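
For analysts who want to kick the tires, here is a hedged sketch using the kyphosis data shipped with R's rpart package. The packages (rpart, randomForest, gbm), the data set, and the tuning settings are my own illustrative choices, not a prescription from either book:

  # Illustrative only: one CART tree, a bagged-tree cousin (random forest), and boosting
  library(rpart)          # CART-style classification and regression trees
  library(randomForest)   # bagging plus random feature selection
  library(gbm)            # gradient boosting of many weak trees

  data(kyphosis, package = "rpart")

  # A single classification tree
  tree <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class")

  # An ensemble of 500 trees grown on bootstrap resamples
  rf <- randomForest(Kyphosis ~ Age + Number + Start, data = kyphosis, ntree = 500)

  # Boosting: a committee of shallow trees, each a weak classifier on its own
  kyphosis$present <- as.integer(kyphosis$Kyphosis == "present")
  boost <- gbm(present ~ Age + Number + Start, data = kyphosis,
               distribution = "bernoulli", n.trees = 500, interaction.depth = 2)

  # Rough comparisons (a serious study would cross-validate)
  table(kyphosis$Kyphosis, predict(tree, type = "class"))
  print(rf)                       # reports the out-of-bag error estimate
  summary(boost, plotit = FALSE)  # relative influence of each predictor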

SLRP is an excellent text that can serve analysts well as either a prerequisite or companion reading for ESL. Let me suggest a third book to be written for a statistical learning trilogy. This text would take the methods so elegantly formulated in ESL and explained in SLRP and provide comprehensive illustrations with real business and social science data sets. The analysis could be done with R packages hot off SourceForge, written by the method developers themselves. With patterned prediction problems and ample R code, the newest learning techniques would get a major “boost” in the BI world.

Steve Miller's blog can also be found at miller.openbi.com.
