MAY 21, 2008 2:48am ET

June 4, 2014

# The BI Ensemble, Part 2: Bagging, Boosting and the Wisdom of Crowds

## The Bootstrap

I vividly remember one occasion as a teenager when I received a dose of tough love from my parents. I had just spent two miserable days in a row looking for a summer job without a trace of success. At dinner, I shared my frustrations with mom and dad, seeking solace. They listened intently, nodded approvingly and seemed sympathetic - but then dropped a parental bomb. “Well, you gotta get used to it. Life’s tough. Tomorrow’s another day. You should try a different approach. Pick yourself up by your bootstraps.”

I’d never realized my folks were so literary - familiar with Adventures of Baron Munchausen, by Rudolph Erich Raspe, in which the baron saves himself from drowning in a deep lake “by picking himself up by his own bootstraps.” Apparently, the parents of many budding statisticians were just as well read, for the metaphor of the bootstrap is pervasive in statistical theory and machine learning today.

Though the bootstrapping techniques of the statistical world can be quite complicated, the basic concepts are remarkably simple. Statisticians are generally interested in making intelligent statements about a population when they have to make do with a fractional sample of that population as data points. An obvious example is a voting population that’s sampled by a poll to predict an election outcome. Historically, statisticians have worked with strong assumptions about the probability models underpinning the population and then used arduous mathematics to derive conclusions about the adequacy of sample statistics. If the population looks like “such-and-such,” then sample statistics that look like “this-and-that” will do a reasonable job estimating population characteristics.

The bootstrap, however, uses a combination of random sampling and computer finesse to bypass the often intractable mathematics relating population to sample, encouraging statisticians to “lift themselves up by their bootstraps” in their quests to learn about the population. As a starting point, the bootstrap assumes the given sample is a reasonable approximation of the underlying population. Rather than derive results with math, the bootstrap uses the computer to conduct many random resamples with replacement on the original data, calculating and tracking a distribution of relevant statistics that shed light on the population. That original sample thus assumes the role of the population, and computer simulation is used to generate independent samples from that population. With statistics computed from a large number of these simulations, the resulting distributions often have desirable properties that are useful for making strong inferences about the real population. In fact, the bootstrap has performed so well that its use, along with other computer simulation techniques, is now de rigueur in the statistical world. Prudent business intelligence (BI) analysts are wise to consider adding the bootstrap to their tool chests as well.
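For readers who like to see the mechanics, here is a minimal sketch of the resampling loop described above, using only the Python standard library. The data (`order_values`), the function names and the choice of a simple percentile interval are illustrative assumptions, not anything prescribed by the bootstrap literature cited here.

```python
import random
import statistics

def bootstrap_means(sample, n_resamples=2000, seed=0):
    """Return the mean of each of n_resamples bootstrap resamples."""
    rng = random.Random(seed)
    n = len(sample)
    means = []
    for _ in range(n_resamples):
        # Draw n observations WITH replacement from the original sample,
        # which stands in for the population.
        resample = rng.choices(sample, k=n)
        means.append(statistics.fmean(resample))
    return means

def percentile_interval(values, lower=0.025, upper=0.975):
    """A simple percentile interval read off the sorted bootstrap statistics."""
    ordered = sorted(values)
    last = len(ordered) - 1
    return ordered[int(lower * last)], ordered[int(upper * last)]

# Hypothetical sample: ten order values from a much larger customer base.
order_values = [12.5, 48.0, 23.1, 9.9, 75.4, 31.0, 18.2, 52.7, 27.3, 40.6]

means = bootstrap_means(order_values)
low, high = percentile_interval(means)
print(f"sample mean: {statistics.fmean(order_values):.2f}")
print(f"approximate 95% bootstrap interval: ({low:.2f}, {high:.2f})")
```

The same loop works for a median, a ratio or a regression coefficient: swap `statistics.fmean` for whatever statistic is of interest. No formula relating sample to population is needed, only the willingness to resample many times.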

## Statistical Machine Learning

The BI world is the fortunate beneficiary of exciting collaboration in the academic world of statistics, computer science, decision science, operations research and the social and physical sciences. The statistics department at the University of California, Berkeley, with a reputation among the best in the world, is helping to lead the charge in the field of statistical machine learning with a research mandate that “is driven by applied problems in science and technology, where data streams are increasingly large-scale, dynamical and heterogeneous and where mathematical and algorithmic creativity are required to bring statistical methodology to bear.”1

Approaches to learning from data can be categorized as either supervised or unsupervised. With supervised learning, the goal is to predict a known outcome measure from other input attributes. Once models are “trained” on outcome and input attributes from the past, they are tested with new input measures to determine how well they perform as future predictors. Candidates for supervised learning include fraud detection and creditworthiness in finance, up-sell/cross-sell in retail, recidivism among substance abusers and churn prediction in telecom. Unsupervised learning, on the other hand, has no outcome measure to predict. Rather, its intention is to describe the patterns of association and correspondence among the input measures.2

There’s no shortage of statistical machine learning approaches to developing supervised predictive models. Of course, there are the traditional, statistical linear regression and classification models in addition to multivariate techniques such as linear and quadratic discriminant analysis. There are also classification trees and forests, as well as Bayesian and nearest-neighbor methods. Then there are the kernel methods, neural nets and support vector machines. Each of these attempts in some manner to optimize the accuracy of predicting outcome measures from inputs.3
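The train-then-test cycle of supervised learning can be sketched in a few lines. The example below uses a one-nearest-neighbor rule, about the simplest of the methods listed above, on synthetic two-class data. The data generator, the 75/25 split and every name in the code are assumptions made for illustration only.

```python
import random

def make_data(n=200, seed=1):
    """Synthetic cases: two input measures and a 0/1 outcome (hypothetical data)."""
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        label = i % 2
        center = 3.0 * label  # class 0 clusters near (0, 0), class 1 near (3, 3)
        rows.append(((rng.gauss(center, 1.0), rng.gauss(center, 1.0)), label))
    rng.shuffle(rows)
    return rows

def squared_distance(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))

def nearest_neighbor_predict(train, point):
    """Predict the outcome of the closest training case."""
    _, label = min(train, key=lambda pair: squared_distance(pair[0], point))
    return label

data = make_data()
split = int(0.75 * len(data))
train, test = data[:split], data[split:]  # "past" cases vs. "new" cases

correct = sum(nearest_neighbor_predict(train, x) == y for x, y in test)
accuracy = correct / len(test)
print(f"held-out accuracy: {accuracy:.2f}")
```

The point of holding out `test` is that the model never sees those cases while training, so the accuracy figure is an honest, if rough, estimate of how it would perform on future inputs.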

## Bagging

The results of many of the previously discussed models can often be enhanced by the finesse of statistical approaches. Ensemble methods refer generally to the approach of improving model performance either by iteratively bundling disparate models or by averaging the models produced from multiple passes over the data.4 Ensembles thus introduce crowd concepts to machine learning methods.

Bagging, or bootstrap aggregating, uses bootstrap concepts to resample training data many times, ultimately producing a unified model from an “averaging” of results of the individual computations. These averaged or bagged predictors are often superior to those from a single data pass. Research shows that bagging produces the largest gains in performance when individual bootstrap samples lead to wide variation in prediction coefficients. The averaging of these disparate predictors may produce enhanced predictions.5
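As a rough illustration of bagging, the sketch below fits a deliberately unstable predictor, a one-split "stump" that predicts the mean outcome on each side of a threshold, to many bootstrap resamples and averages the results. The stump, the toy data and all function names are assumptions chosen to keep the example short. They are not the method of any particular package.

```python
import random
import statistics

def fit_stump(xs, ys):
    """Fit a one-split rule: predict the mean of y on each side of a threshold."""
    best = None
    for t in sorted(set(xs))[1:]:  # skip the smallest x so the left side is never empty
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        left_mean, right_mean = statistics.fmean(left), statistics.fmean(right)
        sse = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, t, left_mean, right_mean)
    if best is None:  # every x identical: fall back to the overall mean
        overall = statistics.fmean(ys)
        return (float("inf"), overall, overall)
    return best[1:]  # (threshold, left_mean, right_mean)

def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x < threshold else right_mean

def bagged_stumps(xs, ys, n_models=50, seed=0):
    """Fit one stump per bootstrap resample of the training data."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = rng.choices(range(n), k=n)  # resample row indices with replacement
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return models

def predict_bagged(models, x):
    """The bagged prediction is the average of the individual predictions."""
    return statistics.fmean([predict_stump(m, x) for m in models])

# Hypothetical training data: a noisy upward trend.
noise = random.Random(42)
xs = list(range(20))
ys = [2.0 * x + noise.uniform(-1.0, 1.0) for x in xs]

single = fit_stump(xs, ys)
models = bagged_stumps(xs, ys)
for x in (2, 9, 10, 17):
    print(x, round(predict_stump(single, x), 1), round(predict_bagged(models, x), 1))
```

Because each resample can place its split at a different threshold, the individual stumps disagree, and that disagreement is exactly what averaging exploits. A single stump can only ever output two values, while the bagged predictor blends many such steps.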

## Boosting

A second ensemble technique that is used to improve the accuracy of predictions of a single algorithm is boosting. The boosting methodology is built on the premise that while it’s generally very difficult to build a single, highly accurate decision rule, it’s often easy to come up with simple rules of thumb - weak learners - that have at least some accuracy superior to random guessing. A gambler attempting to pick the results of professional football games might, for example, use team records in the last five games and key personnel injuries as starter weak learners.
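To make the weak-learner idea concrete, here is a small sketch in the style of AdaBoost, one well-known boosting algorithm. The text above does not name a specific algorithm, so that choice is an assumption, as are the toy data and every name in the code. Each weak learner is a single-threshold rule of thumb. After each round, the cases the current rule gets wrong are given more weight, so the next rule concentrates on them.

```python
import math

def stump_predict(stump, row):
    """A weak learner: vote +1 or -1 from one feature and one threshold."""
    feature, threshold, sign = stump
    return sign if row[feature] >= threshold else -sign

def best_stump(rows, labels, weights):
    """Pick the stump with the lowest weighted error on the training cases."""
    best, best_err = None, float("inf")
    for feature in range(len(rows[0])):
        for threshold in sorted({row[feature] for row in rows}):
            for sign in (1, -1):
                stump = (feature, threshold, sign)
                err = sum(w for row, y, w in zip(rows, labels, weights)
                          if stump_predict(stump, row) != y)
                if err < best_err:
                    best, best_err = stump, err
    return best, best_err

def adaboost(rows, labels, n_rounds=3):
    """Combine weak stumps into a weighted vote (AdaBoost-style sketch)."""
    n = len(rows)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(n_rounds):
        stump, err = best_stump(rows, labels, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)  # more accurate rules get a bigger say
        ensemble.append((alpha, stump))
        # Raise the weight of misclassified cases, lower the rest, renormalize.
        weights = [w * math.exp(-alpha * y * stump_predict(stump, row))
                   for row, y, w in zip(rows, labels, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def ensemble_predict(ensemble, row):
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1

def accuracy(ensemble, rows, labels):
    hits = sum(ensemble_predict(ensemble, row) == y for row, y in zip(rows, labels))
    return hits / len(rows)

# Hypothetical toy data: one input score per game, +1 = win, -1 = loss.
rows = [[x] for x in range(10)]
labels = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

ensemble = adaboost(rows, labels, n_rounds=3)
print("first rule alone:", accuracy(ensemble[:1], rows, labels))
print("three rules combined:", accuracy(ensemble, rows, labels))
```

On this toy data no single threshold can separate wins from losses, and the best lone rule gets 7 of the 10 training cases right. The weighted vote of three such rules classifies all 10 correctly, which is the boosting premise in miniature.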
