For kicks, I've generated random data to use for illustration. In my “campaign”, there are 400 customers exposed to the promotion treatment, with 500 in business as usual. The promotion group spends $206.85 on average; the control $195.49. When I apply a standard t-test to the data, I reject the hypothesis that the average spending is the same for each group – assuming of course that stringent assumptions about the collected data are valid. In this case, the evidence suggests that the promotion group spends more. If I'm VP of Marketing, I'm happy.
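A minimal sketch of that t-test, assuming synthetic stand-in data rather than the actual generated campaign data: the group means (205 and 195), the standard deviation (60), the seed, and the helper name `pooled_t_test` are all hypothetical choices for illustration. To stay within the Python standard library, the p-value uses a normal approximation to the t distribution, which should be close with roughly 900 degrees of freedom.

```python
import math
import random
import statistics


def pooled_t_test(a, b):
    """Pooled-variance two-sample t statistic with an approximate two-sided p-value."""
    na, nb = len(a), len(b)
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2)
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    t = (mean_a - mean_b) / se
    # Assumption: for large samples the t distribution is nearly standard normal,
    # so the normal CDF stands in for the t CDF here.
    p_two_sided = 2 * (1 - statistics.NormalDist().cdf(abs(t)))
    return t, p_two_sided


# Hypothetical stand-in data: 400 promotion and 500 control customers.
rng = random.Random(42)
promotion = [rng.gauss(205, 60) for _ in range(400)]
control = [rng.gauss(195, 60) for _ in range(500)]

t_stat, p_value = pooled_t_test(promotion, control)
print(f"t = {t_stat:.2f}, approximate two-sided p = {p_value:.4f}")
```

The pooled form assumes equal variances in the two groups, which is one of the "stringent assumptions" the classical test leans on.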
With traditional statistics, a theoretical population with assumed mathematical underpinnings is the point of departure for a sample taken to test a hypothesis. If that sample is well behaved, then inferences can be derived. If the sample doesn't look like the theoretical population, the statistician investigates whether the tests are “robust” with respect to the discovered anomalies. In the best case, she plugs data values into arcane mathematical formulas that in turn spit out p-values. This process is, for many like me, frustratingly distant and black box.
In contrast, one of the more exciting developments in statistical science over the last 30 years is the emergence of resampling stochastic simulation methods. Resampling techniques allow practitioners to experience the randomness and variation in their data first hand. Rather than obsessing over populations and math, analysts fret over real data and computing. Indeed, the big conceptual leap with resampling is a focus on the data you have: treating the sample as the population. Resampling then “samples the sample” many times, deriving statistics that can shed light on how the data are distributed. It's not uncommon to have 50,000 or more replications of resampled computations. Those statistics are stored and later used to showcase a sampling distribution.
Consider permutation testing. In my generated experiment, I wish to test the null hypothesis of no difference in the promotion and control spend means. To do this with a permutation test, I first stack the 400 promotion sales observations on top of the 500 controls. I then randomly sample without replacement 400 of the 900 data points and compute their mean. Note that the samples are from all observations, with no distinction between the original groups. I compute the average for the remaining 500 observations, ultimately calculating and storing the difference between the two. Under the implied hypothesis that the means are equal, these differences should be distributed roughly symmetrically around zero. I repeat the process 50,000 times, after which I graph the computations, which show what the distribution of differences in the means might look like if in fact there were no difference. In addition, I indicate on the graph where the real mean difference lies.
Figure 1 displays these computations for the generated data in a density plot.
Notice the variation in the mean difference values, even though the resamples were generated in such a way as to assure no systematic difference between the groups. Because the actual mean difference exceeds all but a very small percentage (<1%) of those computed under the hypothesis of equality, we reject the hypothesis that promotion and control means are identical. The Marketing VP's euphoria is confirmed.
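The permutation procedure described above can be sketched as follows. This is an illustration under stated assumptions, not the code behind Figure 1: the stand-in data (means 205 and 195, standard deviation 60), the seeds, and the function name `permutation_diffs` are hypothetical, and the default of 2,000 replications is chosen only to keep pure Python quick, where the post uses 50,000.

```python
import random
import statistics


def permutation_diffs(treated, control, n_reps=2000, seed=0):
    """Differences in group means after randomly re-labelling the pooled data."""
    rng = random.Random(seed)
    pooled = list(treated) + list(control)  # stack the two groups
    n_treated = len(treated)
    n_rest = len(pooled) - n_treated
    total = sum(pooled)
    diffs = []
    for _ in range(n_reps):
        # Shuffling and taking the first n_treated values is a draw
        # without replacement from the pooled observations.
        rng.shuffle(pooled)
        sum_first = sum(pooled[:n_treated])
        mean_first = sum_first / n_treated
        mean_rest = (total - sum_first) / n_rest
        diffs.append(mean_first - mean_rest)
    return diffs


# Hypothetical stand-in data: 400 promotion and 500 control customers.
rng = random.Random(42)
promotion = [rng.gauss(205, 60) for _ in range(400)]
control = [rng.gauss(195, 60) for _ in range(500)]

observed = statistics.fmean(promotion) - statistics.fmean(control)
diffs = permutation_diffs(promotion, control)

# Share of permuted differences at least as large as the observed one.
share_at_least = sum(d >= observed for d in diffs) / len(diffs)
print(f"observed difference = {observed:.2f}")
print(f"share of permuted differences >= observed = {share_at_least:.4f}")
```

The final share plays the role of a one-sided p-value: the smaller it is, the harder it is to attribute the observed difference to chance re-labelling alone.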
A more flexible stochastic simulation resampling technique is the bootstrap, so named because analysts “bootstrap” or make do with what they have of sample data in the absence of knowledge of the population. Once again, the sample becomes a proxy or “plug-in” for the unknown population. Unlike with permutation testing, though, the bootstrap samples with replacement, so a given datum might be represented more than once in a resampled calculation.
For my generated data, I randomly sample with replacement 400 observations from the promotion group data and a separate 500 observations from the control. I then compute means for both these promotion and control resamples, next calculating the difference. I repeat this process 50,000 times and store the results. The distribution of the differences in this case should center on the actual sample mean difference. If the means are in fact the same, zero should be a prominent value in the density. Figure 2 details the results.
That most of the bootstrapped differences (99.9%) are larger than zero affirms that the promotion has indeed encouraged more spending. It's comforting to know that the permutation and bootstrap simulations yield similar results with these data.
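A comparable sketch of the bootstrap for the difference in means, again under assumptions: the stand-in data, seeds, and the function name `bootstrap_diffs` are hypothetical, and 2,000 replications stand in for the 50,000 used in the post. Each group is resampled separately, with replacement, at its original size.

```python
import random
import statistics


def bootstrap_diffs(treated, control, n_reps=2000, seed=0):
    """Bootstrap distribution of the difference in group means."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_reps):
        # random.choices draws WITH replacement, so a value can repeat.
        treated_star = rng.choices(treated, k=len(treated))
        control_star = rng.choices(control, k=len(control))
        diffs.append(statistics.fmean(treated_star) - statistics.fmean(control_star))
    return diffs


# Hypothetical stand-in data: 400 promotion and 500 control customers.
rng = random.Random(42)
promotion = [rng.gauss(205, 60) for _ in range(400)]
control = [rng.gauss(195, 60) for _ in range(500)]

boot = bootstrap_diffs(promotion, control)

# The bootstrap differences should center near the observed sample difference.
observed = statistics.fmean(promotion) - statistics.fmean(control)
share_positive = sum(d > 0 for d in boot) / len(boot)
print(f"observed difference = {observed:.2f}")
print(f"mean bootstrap difference = {statistics.fmean(boot):.2f}")
print(f"share of bootstrap differences > 0 = {share_positive:.4f}")
```

Unlike the permutation sketch, the groups are never pooled here, which is why the resulting distribution centers on the observed difference rather than on zero.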
In addition to being invaluable as methods for day-to-day statistical practice, resampling techniques can also be a boon to those attempting to learn statistics. Rather than distant and mathematical, resampling methods can make statistical concepts near and practical. In his excellent article “Simulation and Bootstrapping for Teaching Statistics,” Tim Hesterberg opines: “Many of the basic ideas in probability and statistics seem exceedingly difficult for students to grasp ... The traditional road to statistical knowledge is blocked, for most, by a formidable wall of mathematics ... One major problem is that students have difficulty with fundamental concepts involving randomness ... Direct experience and actual experimentation is the best way the student can get a feel for these concepts ... with computational techniques that they can use to analyze and understand complicated data sets.”
Hesterberg touts the use of interactive, graphically-inclined statistical languages like S+, R and Matlab rather than the batch-oriented packages SPSS and SAS to promote the usage of resampling. He cites eminent Stanford statistician Brad Efron: “My guess is that bootstrapping (and other computer-intensive methods) will really come into its own only as more statisticians are freed from the constraints of batch mentality processing.” I certainly can vouch for the powerful combination of simulation techniques supplemented by advanced statistical graphics.
For those interested in learning more about resampling methods, I'd recommend another Hesterberg piece, “Bootstrap Methods and Permutation Tests.” A subsequent blog will focus on additional stochastic simulation techniques in statistics, especially those surrounding Bayesian analysis.