
The Bayesian Debate

Published October 2, 2003, 1:00am EDT

Many readers of this column will never have heard of Bayes’ Theorem or Bayesian inference. The latter has sparked years of debate among statistical researchers and is evidently enjoying a resurgence in use. This debate raises several issues that are not only interesting but also touch on highly pragmatic concerns that many people should find useful.

Thomas Bayes was an 18th century minister from England who developed one of the basic principles of probability. It is a simple mathematical formula (that I will forgo writing down) that shows how the probability of a random event occurring is modified when partial information relevant to the event is obtained and considered. If I am in a casino playing roulette, there are 38 equally probable slots into which the ball can fall (1 through 36, 0 and 00). If I have a bet on number 10, my probability of winning is 1/38 (about 2.6 percent). Now suppose that the ball falls and initially I can’t see the number, but I can see that it has landed in a red slot. (Suppose, for simplicity, that the even numbers are red, the odd numbers are black, and 0 and 00 are green; the real layout differs slightly.) Now there are only 18 slots into which the ball could have fallen. Since 10 is a red slot and is still in the running, my probability of winning is now 1/18 (about 5.6 percent). Knowing that the ball landed in a red slot has modified my probability of winning, more than doubling the chance of success.
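
For readers who want to see the arithmetic, here is a minimal sketch in Python of that update, written directly from Bayes’ Theorem. It assumes the simplified color scheme above; the numbers are the same ones worked out in the text.

    from fractions import Fraction

    # Simplified American roulette: 38 slots (1-36, plus 0 and 00).
    # Under the simplified color scheme above, the 18 even numbers are
    # red, the 18 odd numbers are black, and 0 and 00 are green.
    p_ten = Fraction(1, 38)            # prior: P(ball lands on 10)
    p_red_given_ten = Fraction(1)      # 10 is red here, so P(red | 10) = 1
    p_red = Fraction(18, 38)           # P(red) overall

    # Bayes' Theorem: P(10 | red) = P(red | 10) * P(10) / P(red)
    p_ten_given_red = p_red_given_ten * p_ten / p_red

    print(p_ten_given_red)             # 1/18
    print(float(p_ten_given_red))      # about 0.056, i.e. 5.6 percent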

Statistical inference is a way of quantifying knowledge about unknown quantities using observed data. (The non-Bayesian approach is sometimes called the "frequentist" approach, a terminology I will adopt.) Before explaining how the Bayesian and frequentist approaches differ, let me be clear about some aspects of the approaches that are not different.

  • Both approaches accept and use Bayes’ Theorem as a critical component of the analysis.
  • Both approaches use models with unknown parameters to characterize the real world. For example, the probability of purchasing a luxury car might be modeled as:
    Pr(Luxury Car) = a + b*Income

    where "a" and "b" are unknown parameters. (A toy fit of this model is sketched in code after this list.)

  • Both approaches need to collect data observations as a basis for estimating the values of the unknown parameters.

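For a concrete (and purely hypothetical) illustration of the second point above, here is a short Python sketch that fits the luxury car model by ordinary least squares. The data are invented, and least squares is just one common way to estimate such a model, not the only one.

    import numpy as np

    # Hypothetical data: household income (in $1000s) and whether a
    # luxury car was purchased (1 = yes, 0 = no). Invented numbers,
    # purely for illustration.
    income = np.array([30, 45, 60, 75, 90, 110, 130, 150, 180, 220], dtype=float)
    bought = np.array([ 0,  0,  0,  1,  0,   1,   1,   0,   1,   1], dtype=float)

    # Linear probability model: Pr(Luxury Car) = a + b * Income.
    # Ordinary least squares is one simple way to estimate a and b.
    X = np.column_stack([np.ones_like(income), income])
    (a_hat, b_hat), *_ = np.linalg.lstsq(X, bought, rcond=None)

    print(f"a = {a_hat:.3f}, b = {b_hat:.4f}")
    print(f"estimated Pr(Luxury Car) at income 100: {a_hat + 100 * b_hat:.2f}")
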
The basic difference in the approaches is in how they treat the unknown parameters.
The frequentist takes the unknown parameters as fixed values. While the researcher does not know the values of the parameters, some "true" values are assumed to exist. Analysis proceeds by using statistics to determine the probability of observing the actual data under alternative values of the parameters. Inferences are then based on which parameter values make the expected observations best match the actual observations.

For example, suppose I observe a roulette table for 100 spins and observe 52 occurrences of "red." If the table is "fair" (and the probability of red is about 47 percent), then the chance of observing 52 or more "reds" is about 20 percent. Something that happens about 1 time in 5 is not particularly unlikely, so I would not question the fairness of the table.
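
The 20 percent figure can be checked with an exact binomial tail sum. Here is a small Python sketch; nothing in it comes from the column beyond the 52-of-100 numbers.

    from math import comb

    # 100 spins, 52 reds observed. On a fair table P(red) = 18/38,
    # about 47.4 percent.
    n, observed = 100, 52
    p_red = 18 / 38

    # Exact binomial tail: P(52 or more reds in 100 spins).
    p_tail = sum(comb(n, k) * p_red**k * (1 - p_red)**(n - k)
                 for k in range(observed, n + 1))

    print(f"P(>= {observed} reds) = {p_tail:.3f}")   # roughly 0.2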

A Bayesian would agree that there is some true (but unknown) value of the parameters. However, rather than analyzing the probability of observing the data, the Bayesian treats the parameters as random and analyzes the probability distribution of the parameters themselves. The probability distribution of the parameters represents the researcher’s "beliefs" or knowledge about the actual values of the parameters.

For the Bayesian, analysis proceeds by using the data to narrow the probability distribution of the parameters. This is where Bayes’ Theorem plays a central role. Just as we used the theorem earlier to sharpen the probability that the roulette ball landed on number 10, here it is applied to the parameter distribution to narrow down the range of likely parameter values.

Recall, however, that Bayes’ Theorem can only be used to modify an existing (or prior) probability distribution. Similarly, in order to use Bayesian analysis the researcher must specify a "prior" distribution for the parameters that captures the researcher’s "going in" beliefs about the likely values for the parameters.
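
To make this concrete, here is a sketch of one standard textbook version of such an update: a Beta prior on the table’s probability of red, updated with the 52-of-100 data from earlier. The conjugate Beta-Binomial pairing is my choice for illustration, not something from the column.

    # Prior: Beta(1, 1), flat over [0, 1] -- weak "going in" beliefs.
    alpha_prior, beta_prior = 1.0, 1.0

    # Data: 52 reds in 100 spins.
    reds, spins = 52, 100

    # Conjugate update: posterior is Beta(alpha + reds, beta + non-reds).
    alpha_post = alpha_prior + reds
    beta_post = beta_prior + (spins - reds)

    posterior_mean = alpha_post / (alpha_post + beta_post)
    print(f"posterior: Beta({alpha_post:.0f}, {beta_post:.0f}), "
          f"mean = {posterior_mean:.3f}")   # about 0.52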

This is where frequentists really start to object. Researchers, they say, should not be biasing the outcome of the analysis by bringing in their personal prior beliefs. To do so seems arbitrary and unscientific.

The Bayesian counterargument, as I understand it, is twofold.

First, they would argue that, yes, at times the researcher should bring prior beliefs into the analysis – if those beliefs are strongly held.

Second, in many Bayesian analyses the outcome is not really influenced by the specified prior distribution. Rather, the prior is spread very evenly across the possible values of the parameters (a diffuse or "noninformative" prior) to reflect the "going in" uncertainty about those values. When a large amount of data is collected, the influence of the data on the analysis swamps any small influence of the prior, as the sketch below illustrates.
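
Here is a quick numerical sketch of that swamping effect, again using the Beta-Binomial setup from above with invented data (a hypothetical table that comes up red 60 percent of the time):

    # Two priors for P(red): a flat Beta(1, 1) and a fairly opinionated
    # Beta(20, 20) centered on 0.5. As the sample grows, the posterior
    # means converge regardless of the prior.
    priors = {"flat Beta(1,1)": (1, 1), "opinionated Beta(20,20)": (20, 20)}

    for spins in (10, 100, 10_000):
        reds = round(spins * 0.6)    # hypothetical 60 percent red rate
        for name, (a, b) in priors.items():
            mean = (a + reds) / (a + b + spins)
            print(f"{spins:>6} spins, {name:<24} posterior mean = {mean:.3f}")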

If we accept that Bayesian analysis doesn’t have fatal methodological flaws, does it have any advantages over the more widely used frequentist approach? Possibly. While the importance of these points depends on the particular problem, there can be some significant advantages to Bayesian analysis:

  • As already mentioned, it incorporates the impact of prior beliefs when appropriate.
  • Analyzing the parameters as if they follow a probability distribution can provide more intuitive, easy-to-understand results. (For example, "There is a 95 percent probability that the parameter lies between 1.3 and 1.7.") It also makes it easy to incorporate the results into the decision-making process in a way that recognizes the uncertainty of the parameter estimates. (A sketch of such an interval computation follows this list.)
  • Recent advances in methodology have made estimation of some models much easier using Bayesian analysis.
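
To make the second advantage concrete, here is a sketch of such a statement computed from the Beta(53, 49) posterior of the earlier roulette update. The Monte Carlo approximation uses only the Python standard library; a library such as scipy could compute the interval exactly.

    import random

    random.seed(0)

    # Posterior for P(red) after 52 reds in 100 spins with a flat prior:
    # Beta(53, 49). Approximate a 95 percent credible interval by
    # sampling from the posterior.
    draws = sorted(random.betavariate(53, 49) for _ in range(100_000))
    lo, hi = draws[int(0.025 * len(draws))], draws[int(0.975 * len(draws))]

    print(f"95 percent credible interval for P(red): ({lo:.3f}, {hi:.3f})")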

In truth, this third advantage is the only reason I developed an interest in Bayesian analysis. These techniques allow the estimation of certain complicated models in days, when it would have taken weeks under the frequentist approach, if it would have been possible at all.

My personal conclusions are as follows. First, neither the frequentist nor the Bayesian approach is "wrong." Second, the frequentist approach is what most researchers (at least in the U.S.) are used to. For most applications, this will be my approach. But, third, when I have a pragmatic reason to use Bayesian analysis, I will not hesitate to do so.
