A recent article in The Economist, entitled “The Disposable Academic: Why Doing a Ph.D. is Often a Waste of Time,” questions the relevance of the Ph.D. degree for many who now invest the 5+ years to secure it.
The author notes that demand in academia for Ph.D.s hasn’t kept up with supply, leading to a glut in many disciplines. And while the earnings premium for a Ph.D. over an “average” bachelor’s degree is 23 percent per year, the differential over a master’s is just 3 percent, and the master’s graduate gets a head start of three or more income-producing years.
Attempting to paint all Ph.D.s with the same brush, though, is an illustration of “flaw of averages” thinking. The sour prospects that might await newly minted French history grads are probably different from those for physics students, even though the latter might never get to “profess” in their area of study. And don't tell recent Stanford and Berkeley Statistics Ph.D.s who choose commerce over academia that their degrees weren't a good investment. Rather than seek out academia, many of those students simply take their craft to the data companies of Silicon Valley for very healthy compensation, thank you.
Still, I think there's merit to the argument that, for many, the additional three years in academia to secure the Ph.D. might not be worth it. A terminal M.S. could make the most sense for those looking to maximize career earnings. So for those who share that view, let me offer a curriculum for a master’s that seems an ideal background for the practice of analytics in business.
My proposal for an M.S. in Applied Statistics would span two years, with three courses for each of the four semesters. Enterprising students might complete the program in three semesters. Prerequisites would include a year of calculus, a semester of linear algebra, a year of probability and statistical methods and a course in programming with Java or C++.
The program would look something like the list below, with 11 required classes and one elective from an extensive list. The background data and programming courses would include intermediate Java (or, better, agile data programming with Python or Ruby), as well as a practical database design/SQL course with MySQL or Postgres. Students would be able to place out of these courses, potentially freeing up more slots for electives.
The guts of the curriculum would be the ambitious one-year sequence in probability and statistical methods, which would include just about everything core to the tool chest of applied statisticians. Among the topics covered would be univariate statistics, exploratory data analysis, and all aspects of linear models – ANOVA, ANCOVA, regression, cubic splines, binomial, random, mixed, survival and robust. Though the theory and mathematics behind the methods would be presented, more emphasis would be given to applications with meaningful data sets using statistical software such as R. The mix of presentation would be similar to that in the seminal text, “Modern Applied Statistics with S” by Venables and Ripley.
Coursework in experimental design, time series/forecasting and multivariate analysis would amplify material introduced in the methods classes with more detail, providing students the advanced tools they need to handle many of the everyday challenges of messy analytics data.
Statistical learning and computational statistics would further distinguish this curriculum from the traditional mathematical motif of many existing stats programs. Both courses would have a computer-intensive focus: statistical learning would cover the latest modeling techniques for prediction, clustering and association developed at the juncture of statistics and computer science, while computational statistics would cover visualization, computer-generated sampling and approximations, simulation, Monte Carlo techniques, the bootstrap, permutation tests and cross-validation. Students would learn the new methods by using both R and MATLAB in their classwork.
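To give a flavor of what "computer-intensive" means here, the bootstrap can be sketched in a few lines. The example below is only an illustration, written in Python rather than the R or MATLAB the course would use; the function name `bootstrap_mean_ci`, its parameters and the sample values are all made up for this sketch. It estimates a percentile confidence interval for a mean by resampling the data with replacement.

```python
import random
import statistics


def bootstrap_mean_ci(data, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean.

    Hypothetical helper for illustration; not taken from any course material.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    means = []
    for _ in range(n_resamples):
        # Draw n observations from the data, with replacement.
        resample = [data[rng.randrange(n)] for _ in range(n)]
        means.append(statistics.fmean(resample))
    means.sort()
    # Read the interval endpoints off the sorted resampled means.
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return means[lo_idx], means[hi_idx]


# Made-up measurements, used only to exercise the function.
sample = [12.1, 9.8, 11.4, 10.3, 13.7, 9.1, 10.9, 12.5, 11.0, 10.2]
low, high = bootstrap_mean_ci(sample)
print(f"approximate 95% interval for the mean: ({low:.2f}, {high:.2f})")
```

The appeal for applied work is that the same resampling loop works for a median, a correlation or a model coefficient, where a textbook formula for the standard error may not be available.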
The class in Bayesian statistics particularly hits the mark in 2011. The Bayesian approach provides the ideal linkage from pure statistics to decision making/learning in the business world. The time is now to get serious with Bayes.
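As a small illustration of that linkage, consider the textbook Beta-Binomial update, where a prior belief about a response rate is revised as results arrive. This is a minimal sketch under stated assumptions: the function names, the uniform Beta(1, 1) prior and the counts of 30 responses out of 100 offers are all hypothetical and chosen only for the example.

```python
def beta_binomial_update(prior_a, prior_b, successes, failures):
    """Posterior Beta parameters after observing binomial data.

    With a Beta(a, b) prior on a rate, observing s successes and f failures
    gives a Beta(a + s, b + f) posterior (the standard conjugate update).
    """
    return prior_a + successes, prior_b + failures


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Hypothetical scenario: a uniform Beta(1, 1) prior on a response rate,
# then 30 responses observed out of 100 offers sent.
post_a, post_b = beta_binomial_update(1.0, 1.0, successes=30, failures=70)
posterior_mean = beta_mean(post_a, post_b)
print(f"posterior: Beta({post_a:.0f}, {post_b:.0f}), mean rate {posterior_mean:.3f}")
```

The posterior from one round of data becomes the prior for the next, which is the sense in which the Bayesian approach maps naturally onto learning and decision making over time.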
The statistical consulting course would draw on practical work in a consulting lab, encouraging students to integrate their classwork with the research challenges brought to them there. Their facility with programming and data management might prove as important as their statistical prowess.
Students would get to choose their final course(s) from the statistical offerings of other disciplines, including biology, engineering, computer science, business, epidemiology, economics, social sciences (political science, sociology, educational psychology), finance and applied mathematics. The expectation would be that the classes would follow a common theme.
The proposed curriculum reflects my own experience and biases, but I think it'd be a great start for budding business analytics professionals. And in the data deluge era, I could see a program like this turning the customary BS-MBA model on its head for prospective business analysts: Rather than going from quantitative undergrad to business grad, how about from business undergrad to statistics grad?
Required courses:
- Intermediate Programming with Java or C++ (or Agile Data Programming with Python or Ruby)
- Relational Databases and SQL
- Probability and Statistical Methods I
- Probability and Statistical Methods II
- Statistics and Experimental Design
- Forecasting and Time Series Methods
- Multivariate Statistical Methods
- Elements of Statistical Learning
- Computational Statistics
- Bayesian Statistics
- Statistical Consulting

Electives:
- Industrial Engineering Statistics
- Business Statistics
- Actuarial Science
- Social Science Statistics
- Health Care/Epidemiology
- Financial Engineering Statistics
- Numerical Methods/Mathematical Optimization
- Data Mining/Visualization

Statisticians, BI analysts and business leaders, what do you think? I'd love your feedback.