The annual NCAA basketball tournament - now at its height - has long drawn sporting interest from people in all walks of life. Tapping that interest, two professors have worked in recent years to use the excitement of March Madness as a backdrop for teaching students and businesspeople how analytic models really work and how well they can predict.
Professors Jay Coleman of the University of North Florida and Allen Lynch of Mercer University had both been using SAS-driven analytic engines since their college mainframe computing days, and both happened to be sports fans. Much later, Coleman, an associate dean, and Lynch, who teaches economics and quantitative methods, got together to collaborate on the Dance Card, a weighted mathematical formula that seeks to predict the tournament choices made by the selection committee. Coleman started the project after coming across a stat-driven site, collegerpi.com, run by a fellow named Jerry Palm. "He had a fairly extensive data set covering 1994 to 1999 that showed all the data on teams that got bids to the tournament and a lot of the teams that didn't but were possibilities," Coleman says. "I took one look at that as an academic, to whom a data set is a gold mine, and said we should do something with that." With Palm's cooperation, Coleman wrote code to collate data for subsequent years and, along with Lynch, created a model to predict at-large bids (and later, individual game outcomes). This year, the model correctly predicted 31 of 34, or 91 percent, of the at-large bids to the tournament, based on a set of variables and filters that can be found at DanceCard.unf.edu. That is as poorly as the model has ever done; in its best years, it has missed only one pick.
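For readers who want to see what a "weighted formula" of this kind looks like in practice, here is a minimal Python sketch. It is not the Dance Card itself: the variable names (`rpi_rank`, `top25_wins`, `conf_win_pct`), the weights, and the four teams are all invented for illustration, and the real variables are the ones published at DanceCard.unf.edu. The sketch simply scores each team as a weighted sum, hands the available bids to the highest scorers, and measures what share of the committee's actual picks the prediction matched.

```python
from dataclasses import dataclass


@dataclass
class Team:
    name: str
    features: dict  # variable name -> numeric value


# Hypothetical weights, chosen only for illustration (not the Dance Card's).
# A lower RPI rank is better, so that weight is negative.
WEIGHTS = {"rpi_rank": -0.05, "top25_wins": 0.6, "conf_win_pct": 1.2}


def score(team, weights):
    """Weighted sum of a team's variables."""
    return sum(w * team.features[name] for name, w in weights.items())


def predict_bids(teams, weights, n_bids):
    """Return the names of the n_bids highest-scoring teams."""
    ranked = sorted(teams, key=lambda t: score(t, weights), reverse=True)
    return {t.name for t in ranked[:n_bids]}


def hit_rate(predicted, actual):
    """Share of the committee's actual picks that the model also picked."""
    return len(predicted & actual) / len(actual)


# Made-up teams and values, for illustration only.
TEAMS = [
    Team("Team A", {"rpi_rank": 10, "top25_wins": 4, "conf_win_pct": 0.8}),
    Team("Team B", {"rpi_rank": 45, "top25_wins": 1, "conf_win_pct": 0.6}),
    Team("Team C", {"rpi_rank": 30, "top25_wins": 3, "conf_win_pct": 0.5}),
    Team("Team D", {"rpi_rank": 60, "top25_wins": 0, "conf_win_pct": 0.7}),
]

if __name__ == "__main__":
    picks = predict_bids(TEAMS, WEIGHTS, n_bids=2)
    print(sorted(picks))
    print(hit_rate(picks, {"Team A", "Team B"}))
```

By the same measure, matching 31 of the committee's 34 at-large picks works out to 31/34, or about 0.91, which is the 91 percent cited above.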
With no money to be made in selection choices, one might ask why so much work has gone into the project. "In a general way, we tell folks that the leap from a model that predicts tournament entry successfully to a model of whether a store or a product launch will be successful is a short one," Lynch says. "Our model for at-large bids predicts the decision of a committee making a collective choice, a big-money business decision, which is what businesses do at high and low levels every day." Considering that such manual, one-off decisions consume considerable man-hours and expense, the modeling proposition (which can execute in seconds and repeat endlessly) makes sense, or at least some sense.
Not many businesses would stake a high-level decision on a mathematical model, but predictive models can provide a valuable input to offset assumptions rooted in "tribal knowledge." A common misconception about predictive models is that they need to be taken as pure truth, when in fact they serve best as decision support. In this regard, Coleman chides college football's Bowl Championship Series committee for picking top teams purely on empirical data, unlike the NCAA basketball committee, which uses a variety of inputs.
The Dance Card model does not account for a sudden key player injury or suspension, but the six consistent variables it does use give the lie to conventional wisdom. In basketball, for example, "talking heads" are convinced that teams that are "hot" going into the tournament get some sort of favoritism from the NCAA committee or do well once the tournament starts. "The record over the last 10 games is talked about, but in reality our analysis shows hot teams don't tend to get preference from the committee, and they don't win more in the tournament," Coleman says. It's an illustration of a phenomenon that goes on in organizations everywhere: "conventional wisdom is not necessarily wisdom."
Lynch can relate, having worked for a time with the real estate team that tried to identify the best new store sites for the Blockbuster video chain. "They'd go out and 'kick the dirt,' look at grocery stores nearby to see what kind of customers were there and what products were selling," he says. Though Lynch's group never visited the sites, behind the scenes a crew of "data geeks" ran models on actual data. "Occasionally we'd find things in the data that the real estate reps would never believe were related based on their experience." At other times, the data crew ran numbers to support or refute trends that appeared to be taking place. In this way the models provided decision support. "At the end of the day, folks who mine data have to work as a complement to other methods of decision-making to get the right answer," Lynch says.
Still, predictive models like the Dance Card can feel torturous. Just before selection, Coleman and Lynch picked Syracuse to be invited to the tournament, but admitted the team was on the "bubble" with the selection committee. A last-second shot in the conference tournament to defeat number one Connecticut later assured Syracuse a spot, but such an event feels random and reminds us of the limits of a model. Remember, though, that the Dance Card is trying to predict what the committee will do at a given moment, not the chances of a desperation jump shot going in. Until the shot, past patterns meant the committee would look at previous losses to Connecticut and Villanova, two of the top three teams. After the shot, the model could be run again with a different result. "That's sort of the randomness that happens and has to be taken into consideration along with the metrics," Coleman says. While the committee does take extraneous data into consideration, the model sticks to its business and continues to work with a high degree of accuracy.
Improving that accuracy rests with the analyst team and its ongoing search for the mix of variables that delivers the most reliable result. As Lynch says, within all the numbers is a case study of what works and what doesn't; the goal is to find ways to get the data to deliver a story. "I tell students you have to torture the data until it tells you the truth. There are real stories embedded in data, and businesses have to overcome their fear of techniques associated with data mining. They have to start harnessing the value through analysis."