Open Thoughts on Analytics
for Information Management Blogs
MAY 1, 2012 9:10am ET

SAS vs. R: Statistical Modeling Rivalry Renewed

I met up with my old analytics friend a few weeks back. Throughout the ‘90s, we shared statistical war stories from our respective SAS consulting customers. Around 2000, I made the software leap to first S+ and then R, while he remained loyal to SAS. Since that time, we’ve stayed close, but there’s generally a bit of a SAS-R edge to our statistical discussions.

My friend and I probably see each other’s statistical product choice as a series of stereotypes. For him, R’s open source development model presents serious quality risks in contrast to the tried-and-true proprietary SAS QA methods. Compared to SAS’s serious enterprise business focus that includes data integration and BI, he argues that R’s simply for academics and researchers. And, of course, R’s in-memory processing limitation consigns it to toy data only.

I, on the other hand, paint SAS as the slow-moving gorilla, lagging behind R in statistical innovation, driven by a tired 1980s language portfolio of data step, procs and macros that pales in comparison to R’s modern array and object orientation. And I note with joy that R now wears the crown of preferred platform for statistics graduate students that SAS wore in the 80s and 90s.

Nursing Hefeweizens, my friend and I took turns sharing our latest statistical challenges. He noted that his current work revolves as much around graphics and visualization as it does around statistical models per se. I countered that I’ve been heavy into time series and forecasting over the last nine months.

After a few pints, I just had to tweak my friend on SAS’s inferior graphics, noting that while visualization is central to the R analytical approach, it often seems an afterthought in SAS. And with good reason, I argued, for SAS graphics are weak. Back in my SAS days, I often dropped into companion product JMP for visualization. Now even SAS devotees often prefer R’s superior graphics.

A bit annoyed, my friend asked when I’d last seen SAS graphics. I was embarrassed to acknowledge it’d been 12 years. He challenged me to review the latest offering before continuing to pass judgment. Touché.

As I started to discuss my forecasting work, my friend asked what R packages I was using. I responded that I’m working with the forecast package written by Rob Hyndman that I absolutely love. Among its many features, forecast estimates exponential smoothing models as well as the autoregressive integrated moving average (ARIMA) models popularized by Box and Jenkins. What sets forecast apart, I noted, is its ability to automate the model selection process based on established optimization criteria. Since I’m often estimating scores or even hundreds of models simultaneously, this feature is particularly handy.
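To make the idea of automated model selection a little more concrete, here is a minimal Python sketch of choosing between two candidate forecasting models by an information criterion (AIC). It is only an illustration of the general technique under simplifying assumptions, not the forecast package's code or algorithm: the function names (`naive_errors`, `ses_errors`, `select_model`), the grid search over the smoothing weight, and the two toy series are all hypothetical.

```python
import math


def naive_errors(series):
    """One-step-ahead errors when each forecast is the previous observation."""
    return [curr - prev for prev, curr in zip(series, series[1:])]


def ses_errors(series, alpha):
    """One-step-ahead errors of simple exponential smoothing with weight alpha."""
    level = series[0]  # assumption: initialize the level at the first observation
    errors = []
    for y in series[1:]:
        errors.append(y - level)                 # the forecast for y is the current level
        level = alpha * y + (1 - alpha) * level  # update the smoothed level
    return errors


def sse(errors):
    """Sum of squared one-step-ahead errors."""
    return sum(e * e for e in errors)


def aic(errors, n_params):
    """Gaussian AIC up to an additive constant: n * log(SSE / n) + 2k."""
    n = len(errors)
    return n * math.log(max(sse(errors), 1e-12) / n) + 2 * n_params


def select_model(series):
    """Return (name, alpha, aic) for the candidate with the lowest AIC."""
    # Candidate 1: naive forecast, with no estimated parameters.
    candidates = [("naive", None, aic(naive_errors(series), 0))]
    # Candidate 2: simple exponential smoothing, alpha picked by a coarse grid search.
    grid = [i / 20 for i in range(1, 20)]  # 0.05, 0.10, ..., 0.95
    best_alpha = min(grid, key=lambda a: sse(ses_errors(series, a)))
    candidates.append(("ses", best_alpha, aic(ses_errors(series, best_alpha), 1)))
    return min(candidates, key=lambda c: c[2])


if __name__ == "__main__":
    noisy = [10, 12, 9, 11] * 3    # fluctuates around a stable level
    trending = list(range(1, 21))  # steady upward trend
    print(select_model(noisy)[0])
    print(select_model(trending)[0])
```

Running the same selection loop over many series is what makes this kind of automation handy when scores or hundreds of models have to be estimated at once. A real tool would consider far richer model families and estimate parameters properly rather than by grid search.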

Once I’d finished discussing my work, my friend jokingly asked about the likely now-defunct economics grad student who’d authored forecast, and how I could have any confidence in the model estimates the package provided. Would the author fix bugs found by the community and support the library in the long run, he mused? And would I bet a week’s compensation on forecast’s quality?

As we wrapped up our enjoyable banter, I think both of us realized we’d built straw men to support our statistical platform arguments. Indeed, as I later thought about the discussion, I realized just how out of touch I was with SAS’s current capabilities, and how in the dark he was in general about R.

At OpenBI, we constantly remind ourselves not to anchor competitors in the past. What a product or firm looked like in 2007 could well be very different than it is in 2012. Losers can become winners and winners can become losers very quickly in technology. It’s critical, therefore, to stay up on the market.

So I set out to take a look at some of the latest graphics capabilities of SAS. I wish I could have gotten a demo version of the software to test-drive, but had to settle instead for a review of the sample output gallery. I was more impressed than I expected to be, even if the sight of macro code behind the visuals gave me heartburn. I think I could do in SAS/GRAPH much of what I now accomplish with the excellent R lattice and ggplot packages. I was especially intrigued by the newest Statistical Graphics, with the sgpanel proc for conditioned or trellis visuals. Maybe SAS graphics is not the beastly laggard I’ve characterized it to be.

I received an email from my SAS friend a few days ago. Just as I’d taken a look at current SAS graphics, he’d started investigating time series and forecasting for future work. Lo and behold, he came across an excellent book on the latest state space methods written by academic Rob Hyndman and others. Hmm, Rob Hyndman – isn’t he the same guy who authored the R forecast package? Perhaps the economics grad student isn’t defunct after all.

Comments (5)
Hello Steve -

I ordinarily avoid posting comments that are focused on product capabilities, but since your post is specifically about SAS... I agree that you should definitely take a new look at SAS. Just a few of the things you will find:

- In addition to the graphic samples that you link to, we now have SAS Visual Analytics, a high-performance, in-memory solution for working with massive amounts of data very quickly. SAS Visual Analytics provides a compelling visual representation, including autocharting and geographic visualizations, and is capable of visually representing billions of rows of data in seconds: http://www.sas.com/technologies/bi/visual-analytics.html. This offering includes visualization support for mobile devices like the iPad.

- SAS Visual Analytics is part of a broader high performance analytics offering from SAS that includes grid technology, in-database processing and in-memory processing. SAS has offered these capabilities for many years and has constantly expanded the functionality to include additional in-memory capabilities. The latest is the SAS LASR Server, an in-memory analytics engine that forms the foundation for SAS Visual Analytics. In addition to this solution, which provides high concurrency support for many users doing rich visualization work, SAS offers high performance analytics capabilities that allow organizations to solve complex problems using big data and sophisticated analytics. This is done via a distributed, in-memory architecture that is not constrained by physical memory limits and provides multiple deployment patterns, from MPP database implementations on EMC Greenplum and Teradata to Hadoop-based implementations. For more information on our high performance analytics capabilities, go here: http://www.sas.com/software/high-performance-analytics/index.html

- SAS Information Management provides a complete set of capabilities to help oversee the entire data-to-decision lifecycle. As you mentioned, this includes extensive data preparation capabilities, but it also includes the ability to manage a large number of complex analytical models PLUS the ability to integrate the analytical results directly in operational systems. As part of SAS Information Management, we have support for Hadoop. This includes the ability to work with Hadoop data in a SAS environment using the SAS/ACCESS module for Hadoop, which is based on Hive. It also includes the ability to interact with Hadoop processing via HDFS commands, MapReduce code and Pig, which can be authored graphically in the SAS Data Integration Studio environment. It is possible to build a job flow that intermingles SAS processing and Hadoop code all in the same job, making it easy to leverage the best execution environment for the task at hand. In addition, just as SAS has moved to run processing in and alongside the MPP databases, SAS will soon provide the ability to run an embedded SAS process directly on a Hadoop node. For more information on our Hadoop support, please read here: http://blogs.sas.com/content/datamanagement/2012/03/06/sas-hadoop-a-peek-at-the-technology/

I would be happy to provide you with a more extensive briefing on these and other SAS capabilities.

Thanks,
Mark Troester
IT/CIO Thought Leader & Strategist
mark.troester@sas.com
Twitter: @mtroester
Blog: http://blogs.sas.com/content/datamanagement/

Posted by Mark T | Tuesday, May 01 2012 at 12:11PM ET
For the statistical user, there are two kinds of statistical graphics in SAS. (1) Graphics that are created AUTOMATICALLY by SAS procedures as part of the analysis. For an overview, see http://support.sas.com/resources/papers/76822_ODSGraph2011.pdf

(2) Graphics that are created manually from data sets. These include the SGPLOT and SGPANEL procedures. There is a statistical graphics blog at http://blogs.sas.com/content/graphicallyspeaking/

Posted by Rick W | Wednesday, May 02 2012 at 9:28AM ET