Click here to read Part 1.

I recently downloaded the latest stock portfolio returns from the website of Dartmouth professor Ken French. The data is a month behind, so October 2009 was just added, but the 46 years of daily returns more than make up for the inconvenience. I regularly use the Fama/French indexes to track market performance and provide insight into where I should put my meager investment dollars, though by the time I make up my mind, the “hot” portfolios have generally cooled.

Once I collated the data from several files, I ended up with returns from 7/1963 through 10/2009, a total of 11,665 daily change figures for each portfolio. The portfolios constructed by French and colleague Gene Fama include a Risk Free index of 3-month T-Bills, a Market index similar to the Wilshire 5000, and 2 size by 3 value classifications, producing Small Growth, Small Neutral, Small Value, Large Growth, Large Neutral and Large Value portfolios. I used the R statistics package to visualize and analyze the 8 return vectors.

Recall from last week that the Monte Carlo method is characterized by:

  1. A defined domain of inputs
  2. A process for generating inputs randomly from this domain using a probability distribution
  3. A calculation on the generated inputs
  4. An aggregation of the calculated results across a large number of iterations.
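
To make those four steps concrete, here is a minimal R sketch. The example (estimating the chance that two fair dice total at least 10) is a made-up illustration, not one of the Part 1 problems.

    # (1) domain of inputs: the faces 1 through 6 of two fair dice
    # (2) generate inputs randomly from a (uniform) probability distribution
    set.seed(1)
    n <- 100000                              # number of iterations
    die1 <- sample(1:6, n, replace = TRUE)
    die2 <- sample(1:6, n, replace = TRUE)
    # (3) a calculation on the generated inputs
    hits <- (die1 + die2) >= 10
    # (4) aggregate the results across the iterations
    mean(hits)                               # converges on 6/36, about .167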

But whereas the problems we examined then used probabilities as the input domain, those we look at here use a sample of data as the point of departure. We then repeatedly resample uniformly with replacement from this sample as if it were the population, calculating and storing statistics from each resample, ultimately aggregating the results and divining a distribution. This approach of repeatedly resampling with replacement from a given sample over a large number of iterations is called the bootstrap. The analyst in this case lifts herself up by her bootstrap – making do with what she has, using a sample as a proxy for the population when the latter's not available. The sample serves as a plug-in for the population.
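
Stripped down, the bootstrap is only a few lines of R. The sketch below is generic: the sample x and the statistic (a mean) are placeholders, not the French data.

    # a minimal bootstrap sketch: treat the sample as the plug-in population,
    # resample it with replacement B times, and summarize the resampled statistic
    set.seed(1)
    x <- rnorm(100, mean = 10, sd = 3)       # stand-in sample
    B <- 10000                               # number of bootstrap iterations
    boot.means <- replicate(B, mean(sample(x, length(x), replace = TRUE)))
    quantile(boot.means, c(.025, .5, .975))  # percentile distribution of the resampled means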

As an illustration, suppose we wish to examine the distribution of 1 year returns from the Market and Risk Free index portfolios. We sample with replacement 252 (the average number of trading days per year) of the 11,665 available Market and Risk Free index returns and compute the growth of an initial $1 investment. We repeat this calculation, say, 100,000 times, storing each result. We then summarize the calculations in percentile distributions. In my latest 100,000 trial “experiment”, a $1 investment in the Market portfolio at the beginning of the year progressed to between a low of $.33 and a high of $2.17 at year's end, while the Risk Free figures were between $1.05 and $1.06 – reflecting their low risk/low reward profile. The Market index is in the red almost 28% of the time and lags Risk Free in 40% of the calculations at 1 year, confirming the investment admonishment that equities are not for the short term. Ninety-five percent of the bootstrapped 1 year Market growth figures for a $1 investment fall between $.80 and $1.48.
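
A sketch of that experiment follows. It assumes the daily returns sit in vectors mkt and rf as decimal fractions (the French files report percents, so divide by 100 first); the variable and function names here are mine.

    # grow a $1 investment over one resampled "year" of 252 trading days
    grow <- function(returns, ndays = 252)
      prod(1 + sample(returns, ndays, replace = TRUE))
    set.seed(1)
    trials <- 100000
    mkt.growth <- replicate(trials, grow(mkt))
    rf.growth  <- replicate(trials, grow(rf))
    quantile(mkt.growth, c(0, .025, .5, .975, 1))  # percentile distribution of year-end values
    mean(mkt.growth < 1)                           # fraction of resampled years in the red
    mean(mkt.growth < rf.growth)                   # fraction lagging Risk Free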

Similar calculations for 5, 10 and 20 years across the various portfolios illuminate the value of equities for the long run. For example, at 5 years, an initial $1 in the Market portfolio grows to between $.79 and $3.10 in 95% of cases; at 10 years, the 95% numbers are $.93 and $6.47; and at 20 years, $1.56 and $23.97. That is, if the 46 years of daily returns is a reasonable proxy for the long term “population”. Warren Buffett and John Bogle might not agree. And while I'm not completely comfortable that this sample accurately “plugs in” for go-forward returns, I certainly feel better about this approach than I do with the advice of the latest stock-picking gurus.
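
The longer horizons reuse the same sketch, simply drawing more resampled days:

    mkt.5yr  <- replicate(trials, grow(mkt, 252 * 5))
    mkt.20yr <- replicate(trials, grow(mkt, 252 * 20))
    quantile(mkt.20yr, c(.025, .975))              # 95% interval of $1 growth at 20 years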

Just as bootstrapping uses the Monte Carlo method to resample with replacement from an available sample, permutation tests resample without replacement. An excellent document by Hesterberg, et al. clearly illustrates the differences between the approaches, with comprehensive, easy-to-understand examples.

Suppose we conduct a marketing experiment that yields treatment (campaign) and control (no campaign) measurements like those in Table 14.3 of Hesterberg. To perform a permutation test with this data, we first compute the difference in means between treatment and control, in this case, 51.476 − 41.522 = 9.954. We then assume under the null hypothesis that the difference is in fact zero, pooling all 44 observations into a single group from which 21 are resampled without replacement as treatment, the remainder as control. We iterate on this resampling thousands of times, in each case computing and storing the difference in “treatment” and “control” means. We finally contrast this resampled distribution of mean differences with the actual to determine if the null hypothesis is indeed plausible given the observed difference of 9.954.
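
A sketch of such a test in R might look like the following; treatment (21 values) and control (23 values) are assumed to hold the Table 14.3 measurements, which I don't reproduce here.

    # observed difference in means (9.954 for the Table 14.3 data)
    obs.diff <- mean(treatment) - mean(control)
    pooled <- c(treatment, control)                # pool all 44 observations under the null
    set.seed(1)
    B <- 100000
    perm.diffs <- replicate(B, {
      idx <- sample(length(pooled), length(treatment))  # 21 drawn without replacement as "treatment"
      mean(pooled[idx]) - mean(pooled[-idx])            # difference in resampled means
    })
    mean(perm.diffs >= obs.diff)                   # fraction at least as large as the observed difference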

I wrote a little test script in R, running and accumulating results for 100,000 permutation iterations. When I then constructed the quantiles of the 100,000 computations, I found only 2% in excess of 9.954. We thus reject the null hypothesis that the difference in the treatment and control means is zero. The evidence suggests that the treatment mean exceeds the control.

Resampling techniques such as the bootstrap, permutation tests, the jackknife and cross-validation are important additions to the BI analyst's tool chest, providing the means to estimate parameters, test hypotheses and validate models using computing power rather than the often implacable mathematics. As noted by Brad Efron, originator of the bootstrap, “the methods we discuss....require very little in the way of modeling, assumptions, or analysis, and can be applied in an automatic way to any situation, no matter how complicated....An important theme is the substitution of raw computing power for theoretical analysis.” Much more the practical analyst than the theoretical mathematician, that works well for me!

Steve Miller also blogs at miller.openbi.com.