for Information Management Blogs
SEP 14, 2009 5:00am ET

Dear Steve, Dear Eric, Dear Steve – Musings on Predictive Analytics World

I received a very nice note last week from Eric Siegel, Ph.D., Conference Chair of Predictive Analytics World, in response to my recent blog that mentioned an upcoming PAW. Eric's points were important, and since that article was already a few weeks old when he wrote me, I felt it'd be better to post his thoughts and my responses in a new blog, rather than go through a Comments cycle for a posting whose time had already passed. Below is the text of an email exchange between Eric and me that I hope IM readers will find of value.

Dear Steve,

Thanks for your valuable endorsement of Predictive Analytics World.  Great points you've made here.

Indeed, we are seeing more advanced predictive modeling methods (including ensemble methods like bagging and boosting as you mentioned) in commercial deployment; in fact, there were a few sessions on that in February, plus an advanced workshop that included coverage -- and there's more coming in October's PAW program.

Regarding more advanced methodology, we've upped the ante with demarcated "expert/practitioner level" sessions (look for the little red triangles in the program).

Five of the 25 sessions at the October 20-21 event in DC will feature repeat speakers from February's conference -- we've brought back the best for the East Coasters. And expect a new line-up for PAW San Francisco 2010 (Feb. 16-17).

I do wish to be cautious putting the focus on more sophisticated core analytics, with business value the objective and some conference attendees new to the field. Newcomers to predictive analytics should know it is not always a better core modeling method that makes the bigger difference when deploying predictive analytics for your business -- it is the holistic process, including strategically selecting what operational decisions to automate/support, which customer action to predict, and how to prepare the learning data and prepare it well. As you increase the complexity of the core method, the returns usually diminish more quickly than investing in increasing the quantity and quality of the data, and, in fact, you lose a sometimes-important benefit: the "transparency" or "understandability" of a simpler model; you can look at it and, by understanding its logic or (in some cases) business rules, you often gain strategic insights, and attain a means to "sell" the model's credibility in order to secure buy-in for its use/deployment.

So there's a balance to be struck, and much to be gained from the general sessions (blue circles), even for the most advanced analytical expert.

Best Regards,
Eric Siegel, Ph.D.
Conference Chair
Predictive Analytics World

Dear Eric,

Thanks for the thoughtful comments.


When I first reviewed the agenda for the October PAW, I noticed your talk, Dean Abbott's talk, John Elder's talk and Usama Fayyad's talk, all of which looked suspiciously like the excellent ones I attended in San Francisco last February. I guess I recognized the repeat speakers immediately. I'm sure the new presentations will be updated for repeat attendees.

I agree wholeheartedly with your emphasis on a holistic predictive modeling process that obsesses on business and data issues. Over 50% of the revenue for my company, OpenBI, comes from the business and data sides, even as we promote analytics. Our motto is "Analytics for show, data integration for dough."

I assume that by “more sophisticated core analytics”, you're referring to the “statistical learning shrinkage methods” mentioned in my blog. If that's the case, I'm not sure I agree that they're more complex than the traditional multiple/logistic regression models I saw at PAW. I had my formal stats training over 30 years ago, and find the new stuff invigorating. If one were to start from a clean statistical slate, I'd say the learning methods might even make more sense, especially if the regression techniques are to be applied with the state-of-the-craft sophistication of Frank Harrell's Regression Modeling Strategies. One area where traditional methods prevail is in formal hypothesis testing, but I'd bet that many marketers would opt for predictive accuracy first anyway. If you haven't seen the enclosed article by the late statistician Leo Breiman, originator of CART and Random Forests, you might find it interesting. It contrasts traditional statistical models with algorithmic models – essentially multiple/logistic regression versus statistical learning. And I think many of these more sophisticated models are actually simpler to deploy, demanding less of analysts than statistical regression done right.

I realize the quandary you have touting open source projects like R when SAS and SPSS are conference sponsors. As an open source company, we see that in all areas of BI. I befriended a SAS developer at a statistical learning seminar in Boston last year who lamented SAS's lack of passion to get newer methods into Enterprise Miner, acknowledging EM looked old. I'm sure you're aware that R is now the lingua franca of academic statistical computing. It'll continue to gain traction as new generations of students become addicted, much like SAS of 30 years ago.

One topic I didn't mention in my blogs is Bayesian predictive models. The Bayesian/frequentist wars that stunted development when I was in school appear to be in the past now. Bayesian learning is certainly close to the business model of analytics, so it'd sure seem like a good idea to have the Bayesian framework represented at PAW. Columbia professor Andrew Gelman would be a great choice. And how about Trevor Hastie of Stanford for a state-of-predictive-analytics keynote?

Enough of my blather. I'm going to try to make it to DC. If I do, I'll look you up.

Best Regards,
Steve Miller
President, OpenBI, LLC

Dear Steve,

Thanks for your thoughts. Funny you should mention this article by Leo Breiman because, honest-to-goodness, it's printed, literally sitting on the very top of my reading pile right next to me.

I should mention that PAW is indeed a steadfast supporter of R, hosting free-to-the-public R "useR meetings" (no conference registration required) in the evening at last February's event, the upcoming Oct 20-21 conference in DC, as well as February 2010's event back in San Francisco -- and also including regular conference sessions illustrating the deployed use of R.

The university professors you mentioned would be great speakers at PAW, although for keynote headliners we generally stick to our mandated focus on commercial deployment rather than technical research and development. Of course, some university faculty are directly involved in deployment, despite the "publish-or-perish" pressure of their everyday lives; if so, bring them on!

Best,
Eric

Comments (1)
Steve, Nice story. I've noticed that in many of your articles you mention R, but I was curious as to your thoughts regarding WEKA and the community surrounding that product. Thanks, Peter
Posted by Peter S | Monday, September 21 2009 at 1:39PM ET