The statistical analytics market is heating up. Evidence-based management, competing on analytics and the science of business are among the main drivers of a surge in demand for predictive models. There are at least two dozen credible products in the space currently, each with different emphases and strengths. Some products revolve around languages, while others have strong GUIs and APIs. Traditional proprietary offerings compete with new open source entrants. Several platforms focus on mainstream statistics while others concentrate on newer data mining and machine learning techniques. A few offer the powerful combination of data management, programming, reporting, statistical and graphics capabilities.
Even with a noisy market, though, I think it's safe to say that SAS, SPSS and R currently lead the statistical package field. SAS is the big gorilla, the largest of the three by far. Generations of statisticians have built careers on SAS solutions for statistical, data management and reporting needs. SAS is the Howard Cosell of the statistical world: everybody has a strong opinion – perhaps as many negative as positive.
SPSS is even older than SAS and enjoys a faithful following among social scientists, marketers and business strategists. The vendor embraced Windows early on with an intuitive interface that takes much of the tedium out of statistical analysis, making SPSS the choice for non-quants. It certainly doesn't hurt that tech behemoth IBM recently acquired SPSS to help drive its vanguard Smarter Planet initiatives. And open source R is now the lingua franca of statistical computing at top universities around the world, setting the standard for statistical innovation with an estimated user base of 2 million. No doubt R will make future waves in business as students gravitate to careers in industry, government and academia.
In the next few years, statistical analysis will become a staple of business as packages interoperate with BI platforms. Statistical data will increasingly be sourced from the information factory, with statistical models extending the dashboards and OLAP cubes of traditional business intelligence. Statisticians will thus be forced to become less insular. Rather than sitting off in the corner with their arcane models, they will be part of the larger BI team, possessing database, ETL, reporting and OLAP skills in addition to knowledge of statistical platforms.
BI analysts, in turn, will need to become statistically multilingual, capable of producing their data, models and graphics in multiple packages. Those familiar with SAS and SPSS will learn freely available R. Those weaned on R in graduate school will learn SAS and SPSS as they support legacy applications in the work world. Acknowledging the ascendancy of R in academia, both SAS and SPSS have recently facilitated transparent data sharing with the platform. R packages, for their part, provide access to SPSS and SAS data files.
For those with a working knowledge of SAS or SPSS looking to learn R, I'd recommend a couple of books I've purchased this past year. The first is R for SAS and SPSS Users, by Robert Muenchen; the second, SAS and R, by Ken Kleinman and Nicholas Horton.
R for SAS and SPSS Users provides an excellent introduction to R. As Muenchen, Manager of the Statistical Computing Center at the University of Tennessee, notes in the Preface, the SPSS and SAS platforms, introduced over 30 years ago, have much in common – but are very different from 10-year-old R. The book's first chapters focus on gentle GUIs for R before taking on the language starting in Chapter 8. At that point the book meticulously covers data management, data structures, programming, graphics and basic statistical analysis in R. The prose is clear, and the examples are tied to their SPSS and SAS analogs. The handling of both traditional and newer “ggplot2” graphics is comprehensive: SPSS and SAS users will undoubtedly find lots to like. The appendixes contrast R jargon with SPSS/SAS and compare SPSS/SAS products with the corresponding R packages.
If R for SAS and SPSS Users shines in data management and programming, SAS and R is comprehensive in its treatment of traditional statistical models, clearly flowing between the languages and methods with well-worked examples. Both books emphasize graphics, reflecting R's strength in the area. Indeed, SAS and SPSS users often turn to R for visuals unavailable in their platforms. As is the case with R for SAS and SPSS Users, SAS and R concludes with a healthy appendix, in this case an Introduction to R.
For SAS and SPSS programmers looking to grow with the popularity of R, or for new R graduates seeking to find their way in legacy SAS and SPSS statistical markets, R for SAS and SPSS Users and SAS and R offer a quick and sure start toward becoming statistically multilingual. In the increasingly “flat” statistical world, the ability to speak in several statistical tongues will be an important analytic differentiator.
Steve also blogs at Miller.OpenBI.com.