Business Analytics and Forecasting: Revisited
I just finished reading “Forecasting: Principles and Practice” (FPP), by Rob Hyndman and George Athanasopoulos of Monash University in Australia. I think FPP is a terrific resource for predictive analytics/data science practitioners, achieving its objective to “provide a comprehensive introduction to forecasting methods and to present enough information about each method for readers to be able to use them sensibly.” A significant collateral benefit is the accompanying R package, forecast, which implements the covered procedures. An additional “fpp” package houses the data used in the examples. All for free.
An important objective of forecasting is “to inform decisions about the scheduling of production, transportation and personnel, and provide a guide to long-term strategic planning.... [Forecasting] is about predicting the future as accurately as possible, given all of the information available, including historical data and knowledge of any future events that might impact the forecasts.”
Forecasting is central to the work of business analytics. In a blog post four years ago, I cited Forecasting and Time Series Methods as core to an applied statistics curriculum. And while that curriculum would be a little more data and programming heavy if I were to update my thoughts today, forecasting would still be front and center.
The mathematics behind the methods developed in FPP is modest. Instead, the focus is on comprehensive worked examples that use functions from forecast and data from fpp. This approach is not unlike that of many books on R, and it is one I believe works well for teaching applied methods.
FPP's point of departure is classical linear regression, from which it progresses to time series decomposition, exponential smoothing, and autoregressive integrated moving average (ARIMA) models. An advanced section introduces dynamic regression, neural networks, vector autoregressions and grouped/hierarchical models. The book is not over-enamored with the more mathematically sophisticated methods, adopting a pragmatic “simplest that works” philosophy.
As a grad student many years ago, I had the good fortune of taking a course on time series modeling with ARIMA co-originator, George Box. At the time, I thought that developing such models was more art than engineering. And though packages for generating ARIMA forecasts are much more sophisticated today, there remains a significant artful element to identifying the underlying models.
This is not a problem for the analytics professional when she has to identify models for only a small number of time series. But what about when she is confronted with generating scores or even hundreds of such models simultaneously, a challenge I recently faced in a consulting engagement? It's just not feasible to independently “visit” each model for every forecasting period.
Hyndman agrees, noting that most users are not expert at fitting time series models and cannot beat the best automatic algorithms, especially when many businesses and industries need thousands of forecasts every week/month.
That's where forecast's automatic forecasting procedures for exponential smoothing (ets) and ARIMA (auto.arima) come into play. Both functions automate the model-building process, searching across a wide swath of candidate models and ultimately returning the “best” one according to the AIC, AICc or BIC criterion. Kind of takes statistical forecasting in a machine learning direction.
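Here is a minimal sketch of what those two calls look like, assuming the forecast package is installed. The AirPassengers series that ships with base R stands in for a real business series, and the variable names are purely illustrative.

```r
# Assumes the forecast package is installed: install.packages("forecast")
library(forecast)

# AirPassengers ships with base R (monthly totals, 1949-1960);
# it is only a stand-in for a real business series.
y <- AirPassengers

# Automatic exponential smoothing; ic picks the selection criterion
# ("aicc", "aic" or "bic").
fit_ets <- ets(y, ic = "aicc")

# Automatic ARIMA, using the same criterion for its search.
fit_arima <- auto.arima(y, ic = "aicc")

# Inspect which model each search settled on.
print(fit_ets$method)
print(fit_arima)

# Forecast 12 months ahead; the result also carries prediction intervals.
fc_ets   <- forecast(fit_ets, h = 12)
fc_arima <- forecast(fit_arima, h = 12)
print(fc_ets$mean)
print(fc_arima$mean)
```

One caution: the information criteria are useful for ranking candidates within a model class, but they are generally not comparable between the ets and ARIMA classes, so choosing between the two is better done on held-out data.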
My experience is that while I might be able to produce better one-offs, I cannot, overall, match the combined effort and accuracy efficiency of the automated ets and auto.arima models.
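One way to check this on your own data is to run both automatic procedures over a batch of series and score them on a holdout period. The sketch below is an illustration under stated assumptions rather than the workflow from my engagement: the four monthly series come from base R, the 12-month holdout is arbitrary, and evaluate_series is a hypothetical helper name.

```r
library(forecast)

# A handful of monthly series from base R stand in for a real portfolio.
series <- list(
  air    = AirPassengers,
  deaths = USAccDeaths,
  lung   = ldeaths,
  temp   = nottem
)

h <- 12  # forecast horizon and holdout length, in months (an assumption)

# Hypothetical helper: hold out the last h observations, fit both automatic
# models on the remainder, and return the holdout MAPE for each.
evaluate_series <- function(y, h) {
  n     <- length(y)
  train <- window(y, end = time(y)[n - h])
  test  <- window(y, start = time(y)[n - h + 1])

  fc_ets   <- forecast(ets(train), h = h)
  fc_arima <- forecast(auto.arima(train), h = h)

  c(
    ets_mape   = accuracy(fc_ets, test)["Test set", "MAPE"],
    arima_mape = accuracy(fc_arima, test)["Test set", "MAPE"]
  )
}

# One row per series, one column per method.
results <- t(sapply(series, evaluate_series, h = h))
print(round(results, 2))
```

The same loop scales to hundreds of series without anyone inspecting each model by hand; hand-tuned one-offs can then be reserved for the series where the holdout error looks out of line.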
I like Hyndman's take on forecasting and his R forecast package a lot. I'd recommend that analytics professionals seeking a trusty forecasting toolkit take a close look.