I received a very nice note from Eric Siegel, Ph.D., Conference Chair of Predictive Analytics World, last week in response to my recent blog that mentioned an upcoming PAW. Eric's points were important, and since that article was already a few weeks old when he wrote me, I felt it would be better to post his thoughts and my responses in a new blog rather than go through a Comments cycle for a posting whose time had already passed. Below is the text of an email exchange between Eric and me that IM readers will hopefully find of value.
Thanks for your valuable endorsement of Predictive Analytics World. Great points you've made here.
Indeed, we are seeing more advanced predictive modeling methods (including ensemble methods like bagging and boosting as you mentioned) in commercial deployment; in fact, there were a few sessions on that in February, plus an advanced workshop that included coverage -- and there's more coming in October's PAW program.
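For readers new to the ensemble methods Eric mentions, here is a minimal sketch of bagging (bootstrap aggregating): train many simple models on bootstrap resamples of the data and combine them by majority vote. The toy one-dimensional data and the decision-stump learner are hypothetical illustrations, not anything from PAW; a real deployment would use a library such as scikit-learn or an R package.

```python
import random

random.seed(42)

# Hypothetical toy 1-D training set: the true label is 1 when x > 5.
train = [(x, 1 if x > 5 else 0) for x in range(11)]

def fit_stump(sample):
    """Learn the threshold t that best splits a (bootstrap) sample,
    predicting 1 when x > t."""
    best_t, best_err = 0, float("inf")
    for t in range(11):
        err = sum(1 for x, y in sample if (1 if x > t else 0) != y)
        if err < best_err:
            best_t, best_err = t, err
    return best_t

def bagged_predict(x, stumps):
    """Majority vote over the ensemble of stumps."""
    votes = sum(1 if x > t else 0 for t in stumps)
    return 1 if votes * 2 > len(stumps) else 0

# Train each stump on a bootstrap resample (sampling with replacement).
stumps = [fit_stump([random.choice(train) for _ in train]) for _ in range(25)]

print(bagged_predict(8, stumps))  # 1
print(bagged_predict(2, stumps))  # 0
```

Individual stumps wobble with their resamples; the vote averages that variance away, which is the core intuition behind bagging (boosting instead reweights the data to focus successive models on past mistakes).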
Regarding more advanced methodology, we've upped the ante with demarcated "expert/practitioner level" sessions (look for the little red triangles in the program).
Five of the 25 sessions at the October 20-21 event in DC will feature repeat speakers from February's conference -- we've brought back the best for the east coasters. And expect a new line-up for PAW San Francisco 2010 (Feb. 16-17).
I do wish to be cautious about putting the focus on more sophisticated core analytics, with business value the objective and some conference attendees new to the field. Newcomers to predictive analytics should know it is not always a better core modeling method that makes the bigger difference when deploying predictive analytics for your business -- it is the holistic process, including strategically selecting which operational decisions to automate/support, which customer action to predict, and how to prepare the learning data and prepare it well. As you increase the complexity of the core method, the returns usually diminish more quickly than they do for investments in the quantity and quality of the data. In fact, you lose a sometimes-important benefit of a simpler model: its "transparency" or "understandability." You can look at it and, by understanding its logic or (in some cases) business rules, often gain strategic insights -- and attain a means to "sell" the model's credibility in order to secure buy-in for its use/deployment.
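The transparency point is easy to see in miniature: a simple additive scoring model reads directly as business rules a marketer can vet, while an ensemble's logic cannot. The feature names and weights below are hypothetical, purely for illustration.

```python
# Hypothetical response-scoring model: each weight is a statement a
# business user can inspect and challenge.
weights = {
    "months_since_last_purchase": -0.4,  # recency: recent buyers score higher
    "prior_purchases": 0.8,              # each prior purchase adds 0.8
    "opened_last_email": 1.5,            # engagement signal
}
intercept = -2.0

def score(customer):
    """Linear score; higher means more likely to respond."""
    return intercept + sum(w * customer[f] for f, w in weights.items())

customer = {"months_since_last_purchase": 2,
            "prior_purchases": 3,
            "opened_last_email": 1}
print(round(score(customer), 1))  # -2.0 - 0.8 + 2.4 + 1.5 = 1.1
```

Every term in the score maps to a sentence like "each prior purchase adds 0.8," which is exactly the credibility-selling handle Eric describes; a bagged or boosted ensemble trades that away for accuracy.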
So there's a balance to be struck, and much to be gained from the general sessions (blue circles), even for the most advanced analytical expert.
Best Regards,

Eric Siegel, Ph.D.
Conference Chair, Predictive Analytics World

Dear Eric,
Thanks for the thoughtful comments.
Did you see these earlier IM blogs on last winter's PAW? http://www.information-management.com/news/10015015-1.html, http://www.information-management.com/blogs/10015018-1.html?
When I first reviewed the agenda for the October PAW, I noticed your talk, Dean Abbott's talk, John Elder's talk and Usama Fayyad's talk, all of which looked suspiciously like the excellent ones I attended in San Francisco last February. I guess I recognized the repeat speakers immediately. I'm sure the new presentations will be updated for repeat attendees.
I agree wholeheartedly with your emphasis on a holistic predictive modeling process that obsesses on business and data issues. Over 50% of the revenue for my company, OpenBI, comes from the business and data sides, even as we promote analytics. Our motto: "Analytics for show, data integration for dough."
I assume that by more sophisticated core analytics, you're referring to the statistical learning shrinkage methods mentioned in my blog. If that's the case, I'm not sure I agree that they're more complex than the traditional multiple/logistic regression models I saw at PAW. I had my formal stats training over 30 years ago, and find the new stuff invigorating. If one were to start from a clean statistical slate, I'd say the learning methods might even make more sense, especially if the regression techniques are to be applied with the state-of-the-craft sophistication of Frank Harrell's Regression Modeling Strategies. One area where traditional methods prevail is formal hypothesis testing, but I'd bet that many marketers would opt for predictive accuracy first anyway.

If you haven't seen the enclosed article by the late statistician Leo Breiman, originator of CART and Random Forests, you might find it interesting. It contrasts traditional statistical models with algorithmic models: essentially, multiple/logistic regression versus statistical learning. And I think many of these more sophisticated models are actually simpler to deploy, demanding less of analysts than statistical regression done right.

I realize the quandary you have touting open source projects like R when SAS and SPSS are conference sponsors. As an open source company, we see that in all areas of BI. I befriended a SAS developer at a statistical learning seminar in Boston last year who lamented SAS's lack of passion for getting newer methods into Enterprise Miner, acknowledging that EM looked old. I'm sure you're aware that R is now the lingua franca of academic statistical computing. It'll continue to gain traction as new generations of students become addicted, much as they did with SAS 30 years ago.
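For readers unfamiliar with the shrinkage idea behind methods like ridge regression, here is a hedged sketch with a single predictor and no intercept, on hypothetical toy data: the penalty term pulls the coefficient toward zero relative to ordinary least squares, trading a little bias for lower variance.

```python
# Hypothetical toy data: y is roughly 2x plus noise.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

def ridge_coef(lam):
    """Closed-form ridge solution for one predictor, no intercept:
    beta = sum(x*y) / (sum(x^2) + lambda). lam = 0 gives ordinary
    least squares."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

ols = ridge_coef(0.0)      # unpenalized least-squares slope
shrunk = ridge_coef(10.0)  # heavier penalty -> coefficient shrunk toward 0
print(round(ols, 3), round(shrunk, 3))
```

With many correlated predictors the same shrinkage stabilizes the fit where unpenalized regression overfits -- which is why these methods can demand less hand-tuning from the analyst than "regression done right."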
One topic I didn't mention in my blogs is Bayesian predictive models. The Bayesian/frequentist wars that stunted development when I was in school appear to be in the past now. Bayesian learning is certainly close to the business model of analytics, so it'd sure seem like a good idea to have the Bayesian framework represented at PAW. Columbia professor Andrew Gelman would be a great choice. And how about Trevor Hastie of Stanford for a "state of predictive analytics" keynote?
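As a flavor of why Bayesian learning suits business analytics, here is a minimal sketch of a conjugate Beta-Binomial update for a staple problem, estimating a campaign response rate; the prior carries last year's experience, and the data pull the estimate toward what was just observed. All numbers are hypothetical.

```python
# Hypothetical prior belief: response rate around 10%, modeled as
# Beta(2, 18) -- equivalent to having seen 2 responses in 20 mailings.
prior_a, prior_b = 2.0, 18.0

# Hypothetical observed campaign: 30 responses out of 200 mailed (15%).
responses, mailed = 30, 200

# Conjugate update: posterior is Beta(a + successes, b + failures).
post_a = prior_a + responses
post_b = prior_b + (mailed - responses)

posterior_mean = post_a / (post_a + post_b)
print(round(posterior_mean, 3))  # 0.145 -- between prior mean 0.10 and raw rate 0.15
```

The posterior mean is a principled blend of prior knowledge and new evidence, and it updates again as more data arrive -- a natural fit for the iterative way marketing analytics is actually run.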
Enough of my blather. I'm going to try to make it to DC. If I do, I'll look you up.
Best Regards,

Steve Miller
President, OpenBI, LLC
Thanks for your thoughts. Funny you should mention this article by Leo Breiman because, honest-to-goodness, it's printed out and literally sitting at the very top of the reading pile right next to me.
I should mention that PAW is indeed a steadfast supporter of R, hosting free-to-the-public "useR" meetings (no conference registration required) in the evening at last February's event, at the upcoming Oct. 20-21 conference in DC, and at February 2010's event back in San Francisco -- and also including regular conference sessions illustrating the deployed use of R.
The university professors you mentioned would be great speakers at PAW, although for keynote headliners we generally stick to our mandated focus on commercial deployment rather than technical research and development. Of course, some university faculty are directly involved in deployment, despite the "publish-or-perish" pressure of their everyday lives; if so, bring them on!