I just completed campus interviews for the fifth consecutive year. It was a lot of work meeting with over 30 candidates from three top universities over a two-week period. But it's a labor of love that will hopefully help entice several top graduates to join OpenBI.
Two of the candidates I met were double majors in statistics and economics, both of which are pertinent for BI. So a natural question for me to ask was about the difference between statistics and econometrics, the statistical science supporting economics. The answers the students gave were pretty good, observing that econometrics was focused on regression models for time series-like observational data, while statistics revolved more around sampling and the conduct of experiments.
As a student many years ago, I found econometrics to be much more limited in focus than statistics – too limited, actually. While both had foundations in rigorous mathematics, econometrics seemed in the service of testing grand economic theories and estimating their parameters using linear models on observational data.
Where the assumptions of those models weren't met by the nature of the data-generating process, econometrics “fixed” the problems so that the underlying models were again relevant. Statistics used linear models as well, but put much more emphasis on sampling and randomized experiments than econometrics did. Statistics also appeared to adapt more quickly to the emerging power of computation and simulation. And connecting the dots to probability was easier for statistics than it was for econometrics. All told, I found statistics more nimble and more widely applicable than econometrics.
Eminent Stanford statistician Brad Efron would probably agree, at least partially, offering the following observations in a conversation three years ago: “Statistics has enjoyed modest, positively sloped growth since 1900. There is now much more statistical work being done in the scientific disciplines, what with biometrics/biostatistics, econometrics, psychometrics, etc. – and business as well. Statistics is now even entrenched in hard sciences like physics.
There are also the computer science/artificial intelligence contributions of machine learning and other data mining techniques. If data analysis were political, biometrics/econometrics/psychometrics would be 'right wing' conservatives, traditional statistics would be 'centrist', and machine learning would be 'left-leaning'. The conservative-liberal scale reflects how orthodox the disciplines are with respect to inference, ranging from very to not at all.”
Harvard statistician James Greiner adds: “So what is the difference between an empirical, data-centered economist and an applied statistician? The stereotypes I've internalized from hanging out in an East Coast statistics department are that economists tend to focus more on parameter estimation, asymptotics, unbiased-ness, and paper-and-pencil solutions to problems (which can then be implemented via canned software like STATA), whereas applied statisticians are leaning more towards imputation and predictive inference, Bayesian thinking, and computational solutions to problems (which require programming in packages such as R).” In other words, classical econometrics is even more consumed with the underlying mathematics than is statistics.
The late, “left-leaning” Berkeley statistician Leo Breiman would certainly take econometrics to task for being more “right-wing” than statistics. In his provocative 2001 article “Statistical Modeling: The Two Cultures,” Breiman is highly critical of mainstream statistical practice for its obsession with mathematical data models that have “led to irrelevant theory, questionable conclusions, and have kept statisticians from working on a large range of interesting current problems.” In contrast, leftist algorithmic or statistical learning “can be used both on large complex data sets and as a more accurate and informative alternative to data modeling on smaller data sets.” For Breiman, cross-validated predictive accuracy is the most important criterion for evaluating the success of modeling efforts, and by that criterion the leftist, statistical learning approaches come out ahead.
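For readers who haven't seen cross-validated predictive accuracy in action, here is a minimal sketch of the idea in plain Python. Everything in it is an assumption made for illustration, not something taken from Breiman's article: the synthetic sine-shaped data, the choice of a straight-line fit as the "data model," the choice of k-nearest-neighbor averaging as the "algorithmic" model, and all the function names. Each model is fit on part of the data and scored only on the observations it did not see.

```python
import math
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_linear(xs, ys):
    """Least-squares fit of y = a + b*x; returns a prediction function."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return lambda x: a + b * x


def fit_knn(xs, ys, neighbors=5):
    """k-nearest-neighbor regression: average the y of the closest x values."""
    pairs = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:neighbors]
        return sum(y for _, y in nearest) / len(nearest)

    return predict


def cv_mse(fit, xs, ys, k=5, seed=0):
    """Mean squared prediction error over k held-out folds."""
    total = 0.0
    for fold in k_fold_indices(len(xs), k, seed):
        held_out = set(fold)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        predict = fit(train_x, train_y)
        # Score only on observations the model was not fit to.
        total += sum((ys[i] - predict(xs[i])) ** 2 for i in fold)
    return total / len(xs)


# Hypothetical synthetic data: a curved signal plus noise.
rng = random.Random(42)
xs = [rng.uniform(0.0, 6.0) for _ in range(200)]
ys = [math.sin(x) + rng.gauss(0.0, 0.2) for x in xs]

print("linear model CV MSE:", cv_mse(fit_linear, xs, ys))
print("k-NN model   CV MSE:", cv_mse(fit_knn, xs, ys))
```

Which model wins depends entirely on the data; the point is only that both are judged by the same out-of-sample yardstick rather than by how well their assumptions or parameter estimates hold up on paper.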
So is econometrics a dinosaur that has become irrelevant in an era where computation and prediction are supplanting mathematics and parameter estimation in data analysis? Fortunately, the answer is no, and the reason is that economics research itself is changing.
In a highly informative article, “Random Walks by Young Economists,” Princeton professor Angus Deaton discusses the changing face of graduate economics education. His telling observation? “If the typical thesis of the eighties was an elaborate piece of price theory estimated by non-linear maximum likelihood on a very small number of observations, the typical thesis of today uses little or no theory, much simpler econometrics, and hundreds of thousands of observations.” Moreover, “there is a fast growing effort to replace econometric methodology, which can be thought of as a set of ex post fixes for non-experimental data, with real experiments, which require no such fix.”
In other words, econometrics, with its charter to support mainstream economic research, is adapting along with the discipline to a data-driven, experimental and compute-intensive world. That's good news for BI.