My son’s taking a business statistics course in college and recently asked me about this Bayes’s rule stuff that’s complicating his life.
Bayes’s rule has to do with the conditional probability of an event B given the occurrence of another event A. In mathematical terms, this conditional probability, P(B|A), is equal to P(B)*P(A|B)/P(A). Students in introductory statistics courses are often given exercises whose solutions require interpretation/application of this Bayesian algebra. The correct results are often counterintuitive.
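One classic exercise of this kind is the medical-test problem, where the base rate drives a counterintuitive answer. A small sketch, with made-up illustrative numbers (the prevalence, sensitivity, and false-positive rate below are assumptions, not from any real study):

```python
# Counterintuitive Bayes exercise: what's the chance you have the disease
# given a positive test? All numbers are illustrative assumptions.

p_disease = 0.01               # P(B): prior prevalence of the disease
p_pos_given_disease = 0.95     # P(A|B): test sensitivity
p_pos_given_healthy = 0.05     # false-positive rate among the healthy

# P(A): total probability of a positive test, over both groups
p_pos = (p_disease * p_pos_given_disease
         + (1 - p_disease) * p_pos_given_healthy)

# Bayes's rule: P(B|A) = P(B) * P(A|B) / P(A)
p_disease_given_pos = p_disease * p_pos_given_disease / p_pos

print(round(p_disease_given_pos, 3))  # 0.161 -- far lower than most people guess
```

Even with a 95%-sensitive test, a positive result implies only about a 16% chance of disease, because the 1% base rate dominates.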
Bayes’s rule becomes more interesting – and controversial – when applied to learning associated with scientific or research questions. There, events A and B become H (hypotheses) and E (evidence), respectively, so P(H|E) = P(H)*P(E|H)/P(E), or, in Bayesian words, posterior probability = prior probability * likelihood / normalizing constant. In plainer English, this “diachronic” P(H|E) version of Bayes’s rule shows how learning occurs – how posterior probabilities are modified by new information.
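The diachronic form is easiest to see with a toy updating loop. A minimal sketch, assuming an invented scenario with just two hypotheses – a fair coin versus a two-headed trick coin – updated as heads accumulate:

```python
# Diachronic Bayes: posterior ∝ prior * likelihood, renormalized after each
# piece of evidence. The two-coin scenario is a made-up illustration.

priors = {"fair": 0.5, "trick": 0.5}            # P(H) before any evidence
likelihood_heads = {"fair": 0.5, "trick": 1.0}  # P(E|H) when the flip is heads

posterior = dict(priors)
for flip in ["H", "H", "H"]:                    # three heads in a row
    for h in posterior:
        posterior[h] *= likelihood_heads[h]     # prior * likelihood
    total = sum(posterior.values())             # P(E), the normalizing constant
    for h in posterior:
        posterior[h] /= total                   # today's posterior...

print(round(posterior["trick"], 3))             # 0.889 -- and tomorrow's prior
```

Each pass through the loop is one learning step: yesterday’s posterior becomes today’s prior.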
According to The Economist: “The essence of the Bayesian approach is to provide a mathematical rule explaining how you should change your existing beliefs in the light of new evidence. In other words, it allows scientists to combine new data with their existing knowledge or expertise.” Nobel Laureate, Daniel Kahneman, author of “Thinking Fast and Slow,” only wishes humans were effective Bayesian thinkers. His experiments show how we often mishandle “priors” or base rates in our calculation of posteriors.
It hasn’t been a smooth ride for Bayesian analyses in the 250 years since the theorem was first published. For much of the 1900s especially, Bayesian statistics was rejected as too subjective by mainstream “frequentists” – progenitors of the statistical paradigm dominant today. Indeed, Bayes’s rule was essentially persona non grata when I was in grad school 30 years ago.
What’s the big disagreement? “The Bayesian and frequentist approaches differ fundamentally in their characterizations of probability. Frequentists see probability as the objectively measured, relative frequency of an outcome over a large number of trials. Bayesians, in contrast, view probability as a more subjective concept tied to an individual’s judgment of the likelihood of an outcome. For frequentists, the uncertainty surrounding probability is in the events; for Bayesians, the uncertainty has to do with interpretation by observers.”
Today, much to the benefit of business learning, Bayesian statistics is enjoying a revitalization in the statistical, scientific, and research worlds, its continuous learning model well-suited to many analytics problems.
Alas, there’s a significant divide between basic Bayesian concepts and real-world Bayesian methods: understanding the algebra and psychology of Bayes’s rule is one thing; implementing rigorous Bayesian models is quite another. The transition from the concepts to topics such as Markov Chain Monte Carlo (MCMC) methods, hierarchical models, and Bayesian inference Using Gibbs Sampling (BUGS) is anything but straightforward. Bayesian analysis seems to go from simple to complex with no intermediate steps.
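To give a flavor of what’s on the far side of that divide, here is a bare-bones Metropolis sampler – one of the MCMC methods mentioned above. This is a sketch under assumptions, not how BUGS itself works: the target is the posterior of a coin’s heads-probability after an invented 7 heads in 10 flips, with a flat prior.

```python
import math
import random

# Minimal random-walk Metropolis sampler. Real work uses tuned, diagnosed
# tools (BUGS, and successors like Stan or PyMC); this only shows the shape
# of the machinery.

def log_post(p):
    """Log posterior for 7 heads in 10 flips under a flat prior:
    proportional to 7*log(p) + 3*log(1-p)."""
    if not 0 < p < 1:
        return float("-inf")
    return 7 * math.log(p) + 3 * math.log(1 - p)

random.seed(1)
p, samples = 0.5, []
for step in range(20000):
    proposal = p + random.gauss(0, 0.1)   # propose a nearby value
    # Accept with probability min(1, posterior ratio)
    if math.log(random.random()) < log_post(proposal) - log_post(p):
        p = proposal
    samples.append(p)

burned = samples[2000:]                   # discard burn-in
print(round(sum(burned) / len(burned), 2))  # near the exact posterior mean, 8/12
```

Even this toy version surfaces the practical questions – proposal width, burn-in, convergence – that make the jump from Bayes’s algebra to working MCMC models so steep.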
Computer scientist and statistician Allen Downey, author of the brief but excellent O’Reilly book “Think Stats” and presenter of a wonderful complementary two-and-a-half-hour YouTube lecture, Bayesian statistics made (as) simple (as possible), comes to the rescue. Both the book and video provide “an introduction to Bayesian statistics using Python.” Students who work through the material will learn a good deal about both statistics and programming.
I love this approach. “The thesis of this book is that if you know how to program, you can use that skill to help you understand probability and statistics. These topics are often presented from a mathematical perspective, and that approach works well for some people. But some important ideas in this area are hard to work with mathematically and relatively easy to approach computationally.” More the “computationist” than the mathematician, I find this idiom works especially well for me.
The video especially provides the intermediate detail for Bayesian computation that I’ve had trouble finding over the years, working through examples a step at a time to demonstrate how P(H|E) is continually updated given the accumulating evidence. And Downey deploys Python code to showcase the concepts. Attentive students thus see the entire process from beginning algebra to final computation. Good stuff.
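In the spirit of that step-at-a-time style (this is my own sketch, not Downey’s actual API or examples), a distribution can be a plain dict, updated one observation at a time. The invented puzzle: which die – 4-, 6-, 8-, 12-, or 20-sided – produced a sequence of rolls?

```python
# Dict-as-distribution, updated roll by roll. The dice scenario and rolls
# below are illustrative assumptions.

dice = [4, 6, 8, 12, 20]
pmf = {d: 1 / len(dice) for d in dice}        # uniform prior over the dice

def update(pmf, roll):
    """Multiply each prior by the likelihood of the roll, then renormalize."""
    for d in pmf:
        pmf[d] *= 1 / d if roll <= d else 0   # P(roll | d-sided die)
    total = sum(pmf.values())                 # P(E), the normalizing constant
    for d in pmf:
        pmf[d] /= total

for roll in [6, 8, 7, 7, 5, 4]:               # accumulating evidence
    update(pmf, roll)

print(max(pmf, key=pmf.get))                  # 8 -- the 8-sided die wins
```

The first two rolls rule out the 4- and 6-sided dice outright; after that, each roll keeps tilting the posterior toward the smallest die that could have produced it.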
After the Think Stats read, the Bayesian YouTube video, and the O’Reilly “Python for Data Analysis” webcast, I have renewed excitement for Python as a platform for statistics and data analysis. I’ll have more to say on the topic in coming blogs.