Though many analytics practitioners are scarcely aware of its importance, stochastic simulation (SS) has increasingly become a critical tool in the modern statistical toolchest. Perhaps you know of SS techniques as Monte Carlo simulation instead. While there are subtle differences between the two, the terms are often used interchangeably.

The words stochastic and simulation together describe this method of problem-solving. Simulation denotes a “representation of the operation or features of one process or system through the use of another.” A simulation is thus a model used to produce outputs that in turn facilitate learning about a process of interest.

Stochastic denotes “involving chance or probability,” or “involving a random variable,” and introduces the notions of probability, chance and experimentation to the simulation. Indeed, stochastic simulations are experiments driven by inputs generated randomly from specified probability distributions. Run a stochastic simulation twice and you'll get two different, though hopefully similar, results. That imprecision alone is anathema to many mathematical purists. Those who engage in stochastic simulation must be willing to accept these “approximate analytics.”

Now much more than a novelty, stochastic simulation is central to the practice of statistics and mathematical optimization. The intractable mathematics that in the past hindered deployment of promising methods can now often be finessed by powerful SS computational techniques. And SS methods are, of course, fundamentally enabled by modern computers and statistical software.

Most stochastic simulation methods involve a sequence of steps that lead to the estimated results. First is the articulation of the input domain. Second is the generation of inputs drawn randomly from a specified probability distribution over that domain. Third is the computation of an output from the selected inputs. And finally, the process is repeated many times, with outputs aggregated.
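As a sketch, those four steps map naturally onto a short R function. The function and argument names here are purely illustrative, not a standard API:

```r
# Illustrative stochastic simulation skeleton (names are hypothetical)
run_simulation <- function(n_reps, draw_input, compute_output) {
  # Steps 1-2: draw_input() encodes the input domain and samples randomly from it
  # Step 3: compute_output() turns each random input into an output
  outputs <- replicate(n_reps, compute_output(draw_input()))
  # Step 4: repeat n_reps times and aggregate, here with a simple mean
  mean(outputs)
}

# Example: estimate E[U^2] for U ~ Uniform(0, 1); the true value is 1/3
run_simulation(1e5, function() runif(1), function(u) u^2)
```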

The archetypal stochastic simulation estimates “pi” using a little geometry and a lot of computer power. Consider a circle inscribed within a square. If the radius of the circle is r, then its area is pi*r**2. The area of the enclosing square is 2r*2r = 4r**2, and the ratio of the areas of the circle to the square is thus pi/4. If we randomly and uniformly generate points within the square, the proportion of points that also fall within the circle, multiplied by 4, should approximate pi. Using R on my notebook, a simulation of 10M such points estimates pi at 3.1412. Contrast that with the true value rounded to four decimal places, 3.1416.
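A minimal R version of that experiment might look like the following. The seed and the vectorized coding style are my choices rather than a record of the original run:

```r
# Estimate pi: sample points uniformly from the square [-1, 1] x [-1, 1]
set.seed(2013)            # fix the seed so the run is reproducible
n <- 1e7                  # 10M points
x <- runif(n, -1, 1)      # random inputs drawn over the input domain
y <- runif(n, -1, 1)
inside <- x^2 + y^2 <= 1  # TRUE when a point falls within the inscribed circle
4 * mean(inside)          # proportion inside times 4; prints an estimate near 3.1416
```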

The Reverend Thomas Bayes is generally credited with originating the important theorem in statistics that bears his name. Mathematically, Bayes’ law shows that by updating initial beliefs with new objective information, revised and improved beliefs emerge. Though long controversial, Bayesian methods are now generally accepted in the statistical world, much to the benefit of business learning.
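In its familiar modern form, Bayes’ law states that P(H|D) = P(D|H) * P(H) / P(D): the revised (posterior) belief in a hypothesis H after seeing data D is the initial (prior) belief P(H), reweighted by how well H predicts the data.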

Less well known is that Bayes promoted an early, computer-less version of stochastic simulation to test his theorem. He first imagined a level, square table on which a thrown ball would have an equal chance of settling at any spot. An associate would toss a ball onto the table, which Bayes wouldn't see because his back was turned. The associate would then throw a second ball and report only whether it landed to the right or left of the original.

After the associate had made many such throws, noting for each where it landed with respect to the first, “Bayes could narrow the range of places where the cue ball was apt to be.” Each toss of a new ball restricts the conjecture of where the original ball rests to “a more limited area … Bayes' genius was to take the idea of narrowing down the range of the positions for the cue (original) ball and – based on this meager information – infer that it had landed somewhere between two bounds. … Bayes would never know where the cue ball landed, but he could tell with increasing confidence that it was most probably within a particular range.”
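Here is a minimal R sketch of the table experiment, assuming a unit interval stands in for the table's left-right dimension. The conjugate Beta posterior used to compute the bounds is the standard modern result, not Bayes' own calculation:

```r
# Bayes' table: infer where the first (cue) ball landed
set.seed(1763)                       # the year Bayes' essay was published
cue <- runif(1)                      # first ball settles uniformly on [0, 1]
n_tosses <- 1000                     # subsequent throws by the associate
lefts <- sum(runif(n_tosses) < cue)  # reports of "landed to the left"
# With a uniform prior, the posterior for the cue's position is
# Beta(lefts + 1, n_tosses - lefts + 1); its central 95% interval:
qbeta(c(0.025, 0.975), lefts + 1, n_tosses - lefts + 1)
cue                                  # the true position, for comparison
```

As the number of tosses grows, that interval tightens around the true position, which is exactly the “increasing confidence” described in the passage above.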

With this exercise, Bayes conceptually completed a stochastic simulation some 200 years before the methodology was popularized. He had carefully articulated an input domain. He had randomly drawn inputs from a uniform distribution. He had computed the output. And he had repeated the process many times.

Next week we'll look at several of the SS techniques currently popular in statistical science, arguing that these methods can help BI practitioners understand difficult concepts.