Information Management Blogs
MAY 11, 2009 5:42am ET

Bayes and BI

I recently completed a series on Bayesian Statistics and BI, with the good fortune of a capstone interview with venerable statistician Brad Efron. The more I get into Bayesian thinking, the more I realize Efron is correct: To be a Bayesian, an analyst must always think like one. The current ascendance of Bayesian analysis in the statistical world is, I believe, a boon for BI.

In its simplest form, Bayes' Law can be explained as follows: If E is an event or hypothesis of interest and D is data or evidence, we are interested in P(E|D), the probability of hypothesis E given, or conditioned on, evidence D. P(E|D) is calculated as:

P(E|D) = P(E)*P(D|E) / (P(D|E)*P(E) + P(D|~E)*P(~E)), where ~E means not event E.
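As a quick numeric check of Bayes' Law, here is a minimal sketch in Python. The function name and the three input probabilities are made up for illustration; they are not figures from any real business case.

```python
def bayes_posterior(p_e, p_d_given_e, p_d_given_not_e):
    """Return P(E|D) from P(E), P(D|E) and P(D|~E)."""
    p_not_e = 1.0 - p_e                      # P(~E)
    numerator = p_e * p_d_given_e            # P(E) * P(D|E)
    # P(D): total probability of the evidence under E and under ~E
    denominator = numerator + p_d_given_not_e * p_not_e
    return numerator / denominator


# Hypothetical inputs: P(E) = 0.3, P(D|E) = 0.8, P(D|~E) = 0.2
posterior = bayes_posterior(0.3, 0.8, 0.2)
print(round(posterior, 4))  # prints 0.6316
```

With these assumed values the evidence roughly doubles the probability of E, from 0.30 to about 0.63, because D is four times as likely under E as under ~E.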

The holy grail P(E|D) is often called the posterior probability, while P(E) is known as the prior, P(D|E) is the likelihood function, and the ugly right-side denominator is a normalizing factor. So we have posterior probability = prior probability*likelihood function/normalizing factor. What makes this mumbo-jumbo pertinent is that it provides a powerful way of helping BI realize its charter of facilitating sequential and adaptive organizational learning. We can assess the posterior probability of an important business outcome given a shift in company strategy or operations by establishing the known prior probabilities and wrestling through a likelihood function. The calculated posterior probabilities from step one then become the priors for step two, and the cycle repeats, with each new posterior proportional to prior*likelihood, promoting adaptive learning.
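The posterior-becomes-prior cycle can be sketched in a few lines of Python. This is only an illustration: the function names, the starting prior of 0.5, and the three (P(D|E), P(D|~E)) pairs are all hypothetical, and the sketch assumes each new piece of evidence is independent of the earlier ones given E.

```python
def update(prior, p_d_given_e, p_d_given_not_e):
    """One Bayes step: posterior = prior * likelihood / normalizing factor."""
    numerator = prior * p_d_given_e
    normalizer = numerator + (1.0 - prior) * p_d_given_not_e
    return numerator / normalizer


def sequential_update(prior, evidence):
    """Apply Bayes' Law once per piece of evidence.

    The posterior from each step is fed in as the prior for the next.
    Returns the starting prior followed by each successive posterior.
    """
    history = [prior]
    for p_d_given_e, p_d_given_not_e in evidence:
        prior = update(prior, p_d_given_e, p_d_given_not_e)
        history.append(prior)
    return history


# Hypothetical evidence: (P(D|E), P(D|~E)) for three successive observations
history = sequential_update(0.5, [(0.7, 0.4), (0.6, 0.5), (0.9, 0.3)])
for step, belief in enumerate(history):
    print(step, round(belief, 3))
```

Each observation here is more likely under E than under ~E, so the belief in E rises at every step; evidence that favored ~E would pull it back down.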

I've witnessed a practical illustration of this Bayesian thinking over the past several months. Since early January, I've spent half a dozen Saturdays and Sundays watching the league matches for my daughter's 15-year-old volleyball team. At the beginning of 2009, 160 Midwest teams started competition at three locales in the Chicagoland area. The teams were seeded prior to play based on last year's performance, coaches' evaluations, random assignment of new clubs, etc. They then went through several rounds of pool play to further determine rankings for initial league competition that started later in the month. Won/loss record and score differential determined the movement, if any, from initial ratings. Based on the preliminary rankings and the results of the first weekends of play, the teams were divided into 16 progressive brackets of 10 each for inter-bracket competition that will ultimately yield seedings for the national tournaments to be held in June. Teams migrate between ranked brackets during the season based on league performance, their likelihood function. At the end of the season, the initial rankings (the priors) and league/tournament performance (the likelihood) determine the ultimate team rankings after finals, the posteriors. Of course, the whole process starts over in 2010 for 16-year-olds, with the 2009 posterior rankings becoming next year's priors.

Steve Miller's blog can also be found at miller.openbi.com.  

Comments (2)
I would like to see an example of a calculation following the example situation. While the example situation is clear, the use of ~E and ~D in the proposed formula appears to be in error. While ~E appears twice in the formula, I don't see ~D in use at all. This may be a simple typo, but it muddies the concept for me.
Posted by Roy L | Tuesday, May 12 2009 at 12:58PM ET
Thanks for noting the oversight. There were actually 2 problems: a missing right parenthesis in the equation and the unnecessary reference to ~D. Both have been fixed.

Sorry for the confusion.

Posted by steve m | Wednesday, May 13 2009 at 7:36AM ET