This column responds to my last column on Chris Anderson’s thesis favoring massive data correlation over the scientific method. For background, please read that column or Anderson’s Wired article.

I received quite a lot of feedback on my last DM Review column. Most of the comments dealt either with causation versus correlation or argued that statistics are but one tool in the scientific workshop, not an alternative to it. Overall, I’d say the reaction was evenly split: several readers were taken with Anderson’s thinking, others were not impressed, and a few were genuinely upset. (Anderson has taken a lot of heat on the Web, some of it reasoned and the rest angry flaming.)


You’ll find many such arguments on the Web. To take a slightly different tack, I turned to someone who is neither a statistician nor a scientist but who has his own take on both sides of the argument.


Doug Glanville spent nine successful years in major-league baseball and made a good deal of money playing mostly for the Chicago Cubs and Philadelphia Phillies. This summer, Doug is writing a fascinating series of columns in the New York Times about life before and after baseball. (Glanville was so doubly gifted that some scouts doubted him, worrying that he cared too much about academics while he played ball at the University of Pennsylvania. Today he works as a property developer, writer and baseball consultant, and he will appear in an upcoming Ken Burns series. I encourage you to read his columns.)


I couldn’t find a more human response to Anderson’s number-centric world and his fixation on the statistical methods of Google. I sent Glanville references about the debate and asked about his latest column in The Times, “Doubleday and Darwin,” in which he examined the combination of athletic tools that brought him a big signing bonus and made him a first round draft pick in 1991.


There were parallels: Anderson and Glanville had each, independently, referenced Charles Darwin and the power of observation. I thought it was a fascinating coincidence for two “observers” with largely opposing points of view.


Where they diverge is perspective: Anderson looks ahead to the future value of statistical correlations, while Glanville spent a career as the subject of such measurements. (Anyone who follows the game or has read “Moneyball” will understand baseball’s obsession with statistics and how they are used to pick out “can’t miss” prospects.)


“Some of what I think about are the rules, or perceived rules, of the game,” Glanville told me. Gifted with blazing speed and the hand-eye coordination of a contact hitter, Glanville pointed out that, had the outfield walls been eliminated or the distance between bases shortened, he’d have been a sure bet for baseball’s Hall of Fame.


“But the rules aren’t supposed to change, right?” Glanville said. “In fact, they change very quickly and all the time.” As he pointed out, the pitcher’s mound was lowered when star pitchers such as Bob Gibson were consistently dominant in the 1960s; a simple pitching adjustment called the “slide step” put an end to a long era of base stealers stretching from Lou Brock to Rickey Henderson; and in recent years the strike zone was expanded when home-run hitters were setting new records season after season, out of line with the game’s history.


That last change was made before anyone knew about what would become the steroids controversy, which brought yet another rules change. “Today, the PGA is trying to ‘Tiger-proof’ golf courses, and there’s a reason that’s a term,” Glanville continued. “It skews the statistical comfort zone of the inventors, and that’s generally unacceptable. Tiger Woods comes along and raises the bar and adds new value to the game, but in the long term your statistics are all thrown out of whack.”


Business dynamics are no different, though they may be less transparent. You might think “Tiger-proofing” argues in favor of Anderson’s view of the limitations of hypotheses, but in reality it is about actions and reactions that statistics can capture only as a snapshot in time.


In Glanville’s subjective world, he’d have done better to steer his behavior toward a correlative model that affirmed the measures of superior performance. But even that was out of his control: he described an arcane Cubs ranking system that once rated him highly when he was actually performing poorly by traditional baseball measures.


There is evidence that some people are already “gaming” statistics, particularly in financial markets, where quantitative hedge fund managers carefully hide their trading algorithms. “There are ‘lurkers’ who are out there trying to second-guess the algorithms of others,” says Ian Finley, an analyst at AMR Research. “The guy who created the algorithm might not make any money, but the guy who follows the money very well could.”


By contrast, science prefers proofs that are consistent or that reveal transitions over time. In Anderson’s realm of biological discovery, Doug Glanville would have hit near his career average of .277 for nine years, or trended steadily upward or downward during those years. That model would not account for 1999, when Glanville hit .325, led the league in singles and was second in hits overall, despite the onset of a series of serious illnesses affecting his father that might have been expected to hurt his performance. You’d hate to see that year rejected as a statistical outlier, for any number of reasons, yet it defies both the qualitative and quantitative sides of knowledge. “I’d like to be able to explain some of the anomalies,” says Glanville. “I still don’t know why I could never hit anything at Turner Field in Atlanta.”


It might be that things that are temporal or situational by nature hold less lasting interest for scientists, but I wouldn’t call that a rule. I tend to agree with critics that it was the title and tone of Anderson’s work, and the absolutism he applied to correlations, that undercut what he had to say and pitted scientists against statisticians (not all of whom are eager to join the fight). I still believe he had something important to say, and in my conversation with him he put limits on the value of generalization. It was interesting to see that the readers working in Web analytics who responded to my column were the most optimistic about building business advantage from pure correlation. We’re always interested when people vote with their wallets, but other dynamics are always at work on the Web.


I asked DM Review contributor Tom Davenport about this, and he says he’s already seeing a much more mixed methodology in Web analytics. “I saw a presentation not too long ago from the people at eBay, and it’s just mind-boggling how much qualitative research they do in addition to the quantitative work.”


In his latest column, Glanville recalled a question that challenged him as a high-school student. “Our teacher asked us, ‘Was math discovered or was it invented?’”


When I asked for his answer, he offered this: “In the game, which I believe is a reflection of life itself, things come along and things are phased out. Numbers come along and create an environment that can support your point of view. I love numbers and believe they are significant. But you can’t appreciate the subtleties of individual performance until you are on your field, and there are simply too many things going on out there to keep track of. We also want to agree why the Arctic is melting and we have some compelling information but we don’t yet have a proof for that either. I’ll stick to the best I could come up with for an answer to my teacher. I’d say that math was discovered through the lens of our invention.”


There is much value to be found in statistical correlations, but for all we are learning, we’re still forming hypotheses about how we’d like things to be and looking for statistics to support them. Whether that’s an indictment of statistics or of qualitative thinking, it should temper the confidence we assign to empirical observation inside or outside the scientific laboratory.


Note: Some points of view could not be included in the space of this column. Sincere thanks nonetheless to the many folks who passed along thoughtful comments.


All Information Management content is archived after seven days.