A few years ago, after undergoing a routine physical, a friend of mine received some bad news. A test given to men his age came back positive, an indication of a pretty serious medical condition. Worse, 99 percent of those with the disease test positive, while only 5 percent of those who don't have the condition get the bad news.

At first he was near panic, but he started to calm down after we did some research on the Web. When we took what we'd found and used Bayes' theorem to calculate the probability he actually had the disease given the positive test, he felt a lot better.

Indeed, with our research indicating that the overall incidence of the disease is 1/1000 or 0.1 percent, we computed the conditional probability that my friend had the condition given the positive test to be under 2 percent. He later received the good news from his physician that he was one of the fortunate disease-free false positives.

Bayes' theorem, which came to my friend's rescue, is about conditional probability: the likelihood that a certain event occurs given other pertinent information. It was first published, posthumously, in 1763; its author, Thomas Bayes, was a then recently deceased British Presbyterian minister. Bayes, a Fellow of the Royal Society of London, was undistinguished as a mathematician in his lifetime. Yet his “An Essay Towards Solving a Problem in the Doctrine of Chances” came to be held in the highest regard in the field of probability theory.

The essence of Bayes' theorem is as follows. If E is an event of interest, then P(E) is the probability of E, while P(~E) = 1 - P(E) is the probability of not E. Similarly, if D represents the conditional data or information, P(D) is the probability of that information. All probabilities are, of course, by definition between 0 and 1. For applications of Bayes' theorem, the interest is in P(E|D), the probability of event E given or conditioned on information D. For my friend, P(E|D) represents the probability that he has the disease given a positive test result.

Bayes' theorem states that:

P(E|D) = P(E) * P(D|E) / P(D)

or, substituting P(D) = P(D|E) * P(E) + P(D|~E) * P(~E) (the law of total probability):

P(E|D) = P(E) * P(D|E) / (P(D|E) * P(E) + P(D|~E) * P(~E))

In our example, with P(E) = 0.001, P(D|E) = 0.99 and P(D|~E) = 0.05, this reduces to: 0.001 * 0.99 / (0.99 * 0.001 + 0.05 * 0.999) = 0.00099 / 0.05094 ≈ 0.019
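The arithmetic above is easy to check in a few lines of Python. This is only a minimal sketch of the two-outcome form of the formula; the function name `posterior` and its parameter names are my own labels for illustration, not anything from Bayes' essay or a library.

```python
def posterior(prior, p_data_given_event, p_data_given_not_event):
    """Return P(E|D) for a binary event E (hypothetical helper for illustration)."""
    # Law of total probability: P(D) = P(D|E) * P(E) + P(D|~E) * P(~E)
    p_data = p_data_given_event * prior + p_data_given_not_event * (1.0 - prior)
    # Bayes' theorem: P(E|D) = P(E) * P(D|E) / P(D)
    return prior * p_data_given_event / p_data


# Figures from the example: 0.1% incidence, 99% of the diseased test
# positive, 5% of the disease-free test positive.
p_disease_given_positive = posterior(
    prior=0.001,
    p_data_given_event=0.99,
    p_data_given_not_event=0.05,
)
print(f"{p_disease_given_positive:.4f}")  # 0.0194, i.e. under 2 percent
```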

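In Bayes jargon, P(E|D) is known as the Posterior Probability (Probability of the Event Given the Information), P(E) is called the Prior Probability (Previously Known Probability of the Event), and P(D|E) / (P(D|E) * P(E) + P(D|~E) * P(~E)), which is simply P(D|E) / P(D), is the Likelihood Ratio (Learning Factor). Note that many texts reserve the name "likelihood ratio" for P(D|E) / P(D|~E); here the term means the factor by which the data scales the prior.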

So, the Posterior Probability = Prior Probability*Likelihood Ratio, or, in words:

The relative belief in a hypothesis given some data is the support the data provides for the hypothesis times the prior belief in the hypothesis, divided by the sum of support times prior over all hypotheses.
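The "in words" version extends naturally to more than two hypotheses. Here is a rough sketch, assuming the hypotheses are mutually exclusive and exhaustive; the function `posteriors` and the dictionary layout are hypothetical choices made for this illustration.

```python
def posteriors(priors, likelihoods):
    """Map each hypothesis H to P(H|D).

    priors:      dict of hypothesis -> P(H), assumed to sum to 1
    likelihoods: dict of hypothesis -> P(D|H), the support the data gives H
    """
    # Support times prior, for every hypothesis
    weighted = {h: priors[h] * likelihoods[h] for h in priors}
    # The denominator: the same product summed over all hypotheses
    total = sum(weighted.values())
    return {h: w / total for h, w in weighted.items()}


result = posteriors(
    priors={"disease": 0.001, "no disease": 0.999},
    likelihoods={"disease": 0.99, "no disease": 0.05},
)
for hypothesis, probability in result.items():
    print(f"{hypothesis}: {probability:.4f}")
```

With only the two hypotheses from my friend's case, this gives the same figure as the formula above, and the posteriors sum to 1.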

## Bayesians vs. Frequentists

The paradigm of framing statistical problems in terms of posterior probabilities, prior probabilities, and likelihood ratios is known as the Bayesian method. Bayesian thinking was dominant in the 19th-century statistical world, but the 20th century saw the emergence of the classical or frequentist paradigm as a replacement. Indeed, the frequentist flavor is what’s most often taught in university statistics curricula today.

Bayesian and frequentist approaches differ fundamentally in their characterizations of probability. Frequentists see probability as the objectively measured relative frequency of an outcome over a large number of trials. Bayesians, in contrast, view probability as a more subjective concept tied to an individual’s judgment of the likelihood of an outcome. For frequentists, the uncertainty surrounding probability is in the events; for Bayesians, the uncertainty has to do with interpretation by observers.
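To make the relative-frequency reading concrete, one can simulate a large number of people taking the test and simply count. The sketch below is my own illustration, using the disease figures from the opening example (0.1 percent incidence, 99 percent and 5 percent positive rates); the function name, the sample size of 200,000 and the fixed seed are arbitrary assumptions.

```python
import random


def simulate_positive_tests(n_people, prevalence, sensitivity,
                            false_positive_rate, seed=0):
    """Return the fraction of positive tests that belong to diseased people."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    positives = 0
    diseased_positives = 0
    for _ in range(n_people):
        has_disease = rng.random() < prevalence
        # Chance of a positive result depends on true disease status
        p_positive = sensitivity if has_disease else false_positive_rate
        if rng.random() < p_positive:
            positives += 1
            if has_disease:
                diseased_positives += 1
    return diseased_positives / positives


frequency = simulate_positive_tests(200_000, 0.001, 0.99, 0.05)
print(f"Share of positives who are actually ill: {frequency:.3f}")
```

Over many trials the observed share should settle near the roughly 2 percent that the formula gives, though any single run will wobble around it.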

This difference in the meaning of probability created a significant divide between frequentists and Bayesians. Frequentists reject the reliance on subjective prior probabilities that drive Bayesian analysis, arguing that such probabilities cannot be measured reliably. In fact, they label subjective priors as the Achilles' heel of Bayesian thinking, noting that the results of computations would change markedly with different prior “guesstimates.” Historically, frequentists have also cited the complexity of calculations as an inhibitor to Bayesian adoption.
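The sensitivity to the prior is easy to see with the disease example from the start of this post. The sketch below holds the test characteristics fixed (99 percent and 5 percent) and varies only the assumed incidence; the alternative priors are made-up "guesstimates" chosen for illustration, and `posterior` is a hypothetical helper, not a library function.

```python
def posterior(prior, p_data_given_event, p_data_given_not_event):
    """Return P(E|D) for a binary event E via Bayes' theorem."""
    p_data = p_data_given_event * prior + p_data_given_not_event * (1.0 - prior)
    return prior * p_data_given_event / p_data


# Hypothetical prior "guesstimates" for the incidence of the disease
candidate_priors = [0.0001, 0.001, 0.01, 0.1]

sensitivity_table = {
    prior: posterior(prior, p_data_given_event=0.99, p_data_given_not_event=0.05)
    for prior in candidate_priors
}

for prior, post in sensitivity_table.items():
    print(f"prior {prior:>6.4f} -> posterior {post:.4f}")
```

Across these four guesses the posterior ranges from roughly 0.2 percent to roughly 69 percent, which is exactly the kind of swing the frequentist objection points to.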

Bayesians, on the other hand, argue that their approach is quite natural for real-world problems. They generally deploy more data than frequentists, in turn taking on complex problems with a vigor that can lead to stronger conclusions. Bayesians cite the power of inversion: solving for P(E|D) by estimating priors and the more tractable likelihood P(D|E). They also tout their close kinship with decision-making, arguing that their methods transparently incorporate judgments for optimal decisions. What frequentists cite as a weakness of the Bayesian method is promoted as a strength by Bayesians. From my vantage point, the frequentist approach seems to align more closely with the scientific method, while Bayesian thinking might be better suited for day-to-day decision-making and organizational learning.

“Back in the day” of my statistical studies, frequentists ruled and Bayesians were pretty much personae non gratae. But for a host of reasons, the statistical tide is turning, with the mainstream starting to embrace Bayes.

The second post in this three-part series will look at this transformation, noting why Bayesian thinking is becoming more and more pertinent for business.