Every once in a while, my Facebook meanderings lead me to a fruitful analytics destination. And so it was last week when I stumbled across an interesting article: “Improving the Teaching of Econometrics”, by university professors David Hendry and Grayham Mizon.

It's been quite a while since I studied econometrics, but as I've glanced at leading texts over the years, I've been struck by how similar the field remains to the one I was exposed to 35 years ago. Back then, econometrics revolved around progressively complex statistical formulations, generally variants of linear regression, purporting to estimate and test progressively complex economic models.

As the economic theories became more involved, the statistical models representing them had to become more complex as well, since the assumptions underlying simple linear models often could not satisfy the greater demands. Examples of such recalcitrance include “serial correlation” and “endogeneity”, which vitiate the use of standard linear models because assumptions about the behavior of error terms cannot be met. In addition, much work in econometrics centers on determining the “robustness” of techniques when strict model assumptions are untenable. Under what circumstances are statistical techniques impervious to violations of their underlying assumptions?
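Serial correlation is easy to see in a small simulation. The sketch below is my own illustration, not anything from the article: it fits a simple hand-rolled OLS regression on made-up data whose errors follow a hypothetical AR(1) process, then computes the Durbin-Watson statistic, which sits near 2 when errors are independent but falls well below 2 under positive serial correlation.

```python
import random

random.seed(42)

# Simulate y = 2 + 3x + e, where the errors follow an AR(1)
# process e_t = 0.8 * e_{t-1} + u_t (positively serially correlated).
n, rho = 200, 0.8
x = [random.gauss(0, 1) for _ in range(n)]
e, prev = [], 0.0
for _ in range(n):
    prev = rho * prev + random.gauss(0, 1)
    e.append(prev)
y = [2 + 3 * xi + ei for xi, ei in zip(x, e)]

# Ordinary least squares for a single regressor.
mx, my = sum(x) / n, sum(y) / n
b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
     / sum((xi - mx) ** 2 for xi in x))
a = my - b * mx
resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]

# Durbin-Watson statistic: roughly 2(1 - rho_hat), so it drops
# well below 2 when residuals are positively autocorrelated.
dw = (sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, n))
      / sum(r ** 2 for r in resid))
print(round(dw, 2))
```

The coefficient estimates themselves remain roughly unbiased here; what serial correlation poisons is the inference, since the usual standard-error formulas assume independent errors.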

For my practical analytics tastes, econometrics remained too doctrinaire in its top-down mathematical formulation for evaluating economic theories, especially in light of the emergence of computation and simulation/sampling techniques such as the bootstrap in academic statistics.
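The bootstrap is a good example of that computational turn: instead of deriving a sampling distribution mathematically, you resample your way to one. A minimal sketch with made-up data (mine, not the article's): a percentile confidence interval for a median, with no distributional assumptions at all.

```python
import random
import statistics

random.seed(0)

# A small, made-up sample of skewed data.
data = [1.2, 1.5, 1.9, 2.1, 2.4, 3.0, 3.8, 5.5, 7.9, 12.4]

# Bootstrap: resample with replacement B times, recompute the
# statistic each time, and read the spread off the replicates.
B = 5000
medians = sorted(
    statistics.median(random.choices(data, k=len(data))) for _ in range(B)
)

# Percentile 95% confidence interval for the median.
lo, hi = medians[int(0.025 * B)], medians[int(0.975 * B)]
print(lo, hi)
```

The same few lines work for any statistic you can compute, which is exactly why the bootstrap was so liberating relative to closed-form derivations tied to one estimator at a time.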

Indeed, in an interview seven years ago, eminent Stanford statistician Brad Efron opined on the divide between the evolving data-driven statistics discipline and theory-driven econometrics: “Statistics has enjoyed modest, positively sloped growth since 1900. There is now much more statistical work being done in the scientific disciplines, what with biometrics/biostatistics, econometrics, psychometrics, etc. – and business as well. Statistics is now even entrenched in hard sciences like physics.

There are also the computer science/artificial intelligence contributions of machine learning and other data mining techniques. If data analysis were political, biometrics/econometrics/psychometrics would be “right wing” conservatives, traditional statistics would be “centrist,” and machine learning would be “left-leaning.” The conservative-liberal scale reflects how orthodox the disciplines are with respect to inference, ranging from very to not at all.

(Incidentally, I can't wait to get my hands on the soon-to-be-published book, Computer-Age Statistical Inference, by Efron and Stanford colleague Trevor Hastie.)

In a blog from five years ago, Harvard economist James Greiner seemed to agree with Efron, noting that “economists tend to focus more on parameter estimation, asymptotics, unbiased-ness, and paper-and-pencil solutions to problems (which can then be implemented via canned software like STATA), whereas applied statisticians are leaning more towards imputation and predictive inference, Bayesian thinking, and computational solutions to problems (which require programming in packages such as R).” In short, econometrics obsesses over the underlying mathematics, while statistics focuses more on computation and simulation.

Hendry and Mizon should be considered statistical moderates rather than revolutionaries, espousing not a shift from top-down theory to bottom-up data-driven approaches, but rather a hybrid that starts with top-down and engages bottom-up as needed.

“We discuss how we reached our present approach, and how the teaching of macro-econometrics, and econometrics in general, can be improved by nesting so-called ‘theory-driven’ and ‘data-driven’ approaches. In our methodology, the theory-model’s parameter estimates are unaffected by selection when the theory is complete and correct, so nothing is lost, whereas when the theory is incomplete or incorrect, improved empirical models can be discovered from the data,” the authors wrote.

In other words, go with theory-driven until it proves incapable of meeting the needs, then embellish with data-driven. I like it: theory first, then data.

Much of the article's body is consumed with econometric intricacies – autocorrelation, non-stationarity, non-linearity, endogeneity, parameter constancy – that may bore the average reader. I did take to the “encompassing” principle, in which tests are conducted on “whether any model can account for the results of the alternative models and so reduce the set of admissible models, and in addition reveal the directions in which a model under-performs relative to its rivals.” That seemed a nice foundation for the theory/data-driven synthesis the authors are espousing.
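A toy illustration of the spirit of encompassing (my own construction, not the authors' procedure): fit two rival single-regressor models plus a joint model that nests both. If the joint model's fit barely improves on rival A but dramatically improves on rival B, then A accounts for B's results and B can be dropped from the admissible set.

```python
import random

random.seed(7)

def ols(X, y):
    """Least squares via the normal equations (Gauss-Jordan solve)."""
    n, k = len(X), len(X[0])
    M = [[sum(X[i][p] * X[i][q] for i in range(n)) for q in range(k)]
         + [sum(X[i][p] * y[i] for i in range(n))] for p in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(k):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [M[i][k] / M[i][i] for i in range(k)]

def ssr(X, y, beta):
    """Sum of squared residuals for fitted coefficients beta."""
    return sum((yi - sum(b * xj for b, xj in zip(beta, row))) ** 2
               for row, yi in zip(X, y))

# Made-up data: y really depends on x1 only; x2 is an irrelevant rival.
n = 300
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [random.gauss(0, 1) for _ in range(n)]
y = [1 + 2 * a + random.gauss(0, 1) for a in x1]

XA = [[1.0, a] for a in x1]                  # rival model A: y ~ x1
XB = [[1.0, b] for b in x2]                  # rival model B: y ~ x2
XJ = [[1.0, a, b] for a, b in zip(x1, x2)]   # joint model nests both

ssr_a = ssr(XA, y, ols(XA, y))
ssr_b = ssr(XB, y, ols(XB, y))
ssr_j = ssr(XJ, y, ols(XJ, y))

# Adding x2 to model A barely improves the fit, while adding x1
# to model B improves it dramatically: A encompasses B.
print(ssr_a - ssr_j, ssr_b - ssr_j)
```

The real encompassing tests in the literature are formal hypothesis tests, not eyeballed residual comparisons, but the logic is the same: nest the rivals and ask which one survives the comparison.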

At the same time, my data-driven side was a bit shocked by a 2015 citation that “the notion of empirical model discovery in economics may seem to be an unlikely idea, but it is a natural evolution from existing practices”. Yikes. Bottom-up machine learning model generation has been a staple of other areas of statistical science for over ten years now.

In the end, the authors' conclusions resonated well with me. “Astute readers will have noticed the gulf between the focus of our paper on concepts and model formulations, as against the usual textbook sequence of recipes for estimating pre-specified models.

Appropriate estimation techniques are certainly necessary, but are far from sufficient if the model in question is not well specified. Since economic reality is complicated, pre-specification is unlikely to be perfect, so discovering a good model seems to be the only viable way ahead.”

Whatever it takes to get the “best” model.