I recently completed a series on Bayesian Statistics and BI, with the good fortune of a capstone interview with venerable statistician Brad Efron. The more I get into Bayesian thinking, the more I realize Efron is correct: To be a Bayesian, an analyst must always think like one. The current ascendance of Bayesian analysis in the statistical world is, I believe, a boon for BI.
In its simplest form, Bayes' Law can be explained as follows: If E is an event or hypothesis of interest and D is data or evidence, we are concerned with P(E|D), the probability of hypothesis E given, or conditioned on, evidence D. P(E|D) is calculated as:
P(E|D) = P(E)*P(D|E) / (P(D|E)*P(E) + P(D|~E)*P(~E)), where ~E means not event E. Note that the entire parenthesized sum after the division sign is the denominator.
The holy grail P(E|D) is often called the posterior probability, while P(E) is known as the prior, P(D|E) is the likelihood function, and the ugly right-side denominator is a normalizing factor. So we have posterior probability = prior probability * likelihood function / normalizing factor. What makes this mumbo-jumbo pertinent is that it provides a powerful way of helping BI realize its charter of facilitating sequential and adaptive organizational learning. We can assess the posterior probability of an important business outcome given a shift in company strategy or operations by establishing the known prior probabilities and wrestling through a likelihood function. The calculated posterior probabilities from step one then become the priors for step two, and the cycle of posterior proportional to prior*likelihood repeats, promoting adaptive learning.
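To make the cycle concrete, here is a minimal Python sketch of Bayes' Law applied repeatedly, with each posterior fed back in as the next prior. The function names, the hypothesis ("the new strategy lifts sales") and every probability in it are assumptions made up for illustration; none of them come from real data.

```python
def posterior(prior, p_d_given_e, p_d_given_not_e):
    """Return P(E|D) for a yes/no hypothesis E using Bayes' Law."""
    numerator = prior * p_d_given_e
    # Normalizing factor: P(D|E)*P(E) + P(D|~E)*P(~E)
    normalizer = numerator + p_d_given_not_e * (1.0 - prior)
    return numerator / normalizer


def sequential_update(prior, likelihoods):
    """Apply Bayes' Law once per piece of evidence.

    Each posterior becomes the prior for the next step.
    Returns the list of beliefs, starting with the initial prior.
    """
    history = [prior]
    for p_d_given_e, p_d_given_not_e in likelihoods:
        prior = posterior(prior, p_d_given_e, p_d_given_not_e)
        history.append(prior)
    return history


# Hypothetical inputs: E = "the new strategy lifts sales".
# Each tuple is (P(D|E), P(D|~E)) for one period's evidence.
evidence = [(0.8, 0.4), (0.7, 0.5), (0.9, 0.3)]
beliefs = sequential_update(0.3, evidence)

for step, p in enumerate(beliefs):
    print(f"step {step}: P(E|D) = {p:.3f}")
```

With these made-up inputs, the belief in E starts at 0.3 and rises with each period of favorable evidence. Evidence that was more likely under ~E than under E would pull it back down.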
I've witnessed a practical illustration of this Bayesian thinking over the past several months. Since early January, I've spent half a dozen Saturdays and Sundays watching the league matches for my daughter's 15-year-old volleyball team. 160 Midwest teams started competition at 3 locales in the Chicagoland area at the beginning of 2009. The teams were seeded prior to play based on last year's performance, coaches' evaluations, random assignment of new clubs, etc. They then went through several rounds of pool play to further determine rankings for initial league competition that started later in the month. Won/loss record and score differential determined the movement, if any, from initial ratings. Based on the preliminary rankings and the results of the first weekends of play, the teams were divided into 16 progressive brackets of 10 each for inter-bracket competition that will ultimately yield seedings for the national tournaments to be held in June. Teams migrate between ranked brackets during the season based on league performance, their likelihood function. At the end of the season, the initial rankings (the priors) and league/tournament performance (the likelihood) determine the ultimate team rankings after finals (the posteriors). Of course, the whole process starts over in 2010 for 16-year-olds, with the 2009 posterior rankings becoming next year's priors.
Steve Miller's blog can also be found at miller.openbi.com.