JUN 1, 2007 1:00am ET

Product Review: S-PLUS 8 from Insightful

I was intrigued when invited to review the statistical program S-PLUS 8 from Insightful Corporation. I had used S-PLUS 5 and 6 in the late '90s and early 2000s - liking the product a lot - but shifted to R, S-PLUS's close open source kin, about four years ago. My feelings for R and its community are well documented, so I wasn't sure how receptive I'd be to returning to S-PLUS after experiencing the power of a true open source product and community. I vividly remember my frustrations of several years ago attempting to migrate R code to S-PLUS, and vice versa, even though both are descendants of the same award-winning code base from the late Bell Labs.

Ignoring my fears, I jumped in headfirst, attempting to scale the learning curve of S-PLUS 8 by porting two extensive R scripts that involved connectivity to the Web, database access, programming, statistical models and graphics. To my surprise, I was able to convert the scripts in a reasonable amount of time. The effort was by no means seamless: there were functions unique to each dialect, function parameters unique to each, and even different behaviors in common functions. All told, however, the conversion was much smoother than I remembered - even with the more complicated scripts. As it turns out, this initial success was a harbinger of what's new in S-PLUS 8.

The new S-PLUS 8 Enterprise edition comes with a more extensive Eclipse-based IDE and debugger, and expands the critical-for-BI capabilities of the big data library for handling data sets larger than memory. There are over a hundred new functions in S-PLUS 8, many of which provide compatibility with R. Frustrated R programmers will now find the convenient "with" function in S-PLUS 8, for example. A major enhancement to S-PLUS 8 is the new package system, a powerful framework for developing and distributing S-PLUS code libraries. This will be a boon for S-PLUS analysts, promoting access to many of the latest statistical procedures developed in the academic and research worlds.

Of the many attractive developments in S-PLUS 8, none is more significant than the movement to align the disparate S-PLUS and R dialects of the S language. Because both languages are in motion independently, it will be no easy feat to unite them - think of the Sybase and Microsoft SQL Server code bases that forked many years ago. But at the same time, there appears to be a commitment from both the R community and Insightful to collaborate for the common statistical good. Indeed, S-PLUS 8's new package support accommodates both S-PLUS and R language dialects, so now the S-PLUS community can extend the language much like R. And S-PLUS 8 provides tools to convert R packages to work within the S-PLUS environment. This is a big lift for S-PLUS, since R is the lingua franca of academic statistical computing, with many of the latest methods making their debut as R packages available for distribution. I am currently testing rpart and randomForest, two tree-based modeling packages that are extremely popular with R users and were just recently migrated to S-PLUS 8.

While the evolving collaboration between Insightful and the R community offers the S world major benefits of both the commercial and open source (OS) software models, it is at the same time fraught with challenges and pitfalls. The commercial and OS models of software development are very different and seemingly incompatible. The currency of volunteer open source is recognition and status within the community, in contrast to monetary compensation for commercial peers. And of course, commercial and open source licensing models differ by 180 degrees on cost, usage rights and source code access. Ask an open source community about commercial, proprietary software and they cringe over exorbitant licensing costs, substandard product quality, limited innovation and lack of access to source code. Ask a proprietary vendor about open source, on the other hand, and they lament the absence of leadership, strategic focus, business discipline and product support in OS initiatives, citing the difficulties of "herding cats" in the OS community - all adding to the risk for product adopters. These conflicting perspectives can create a good deal of distrust between the communities, especially if the communities are competing with the "same" product.

Despite the significant challenges, the potential benefits of aligned Insightful and R communities are substantial. Done correctly, the commercial open source (COS) model that could well be the outcome of this collaboration can combine the accelerated innovation and enhanced product quality of open source with the strategic focus, support and indemnification of the commercial model - all the while cutting costs for consumers.1

There are examples of successful COS software companies to guide Insightful/R. Red Hat is the prototype for COS success with infrastructure, and MySQL is preparing for an IPO to hasten its expansion in the database market, while Pentaho and JasperSoft are gaining recognition in the BI space. There are also strong natural synergies between Insightful and R. In the utopian COS statistical world of aligned S-PLUS and R, practitioners would benefit from the worldwide support of an incredibly talented user community, simultaneously deploying modules showcasing the latest statistical and quantitative techniques from the centers of research and academia. In that same utopian world, pragmatic CIOs looking to deploy analytics around their business enterprise could sleep soundly knowing their statistical investment is supported and indemnified, also confident their requests for product enhancement (such as handling large data sets) will be heard and hit the development priority queue.

A new COS-based statistical software company promoting the best of open source R and commercial S-PLUS would not go lacking for proposed product enhancements to serve an increasingly demanding BI community. The original S language, developed more than 20 years ago, limits data set sizes to something less than available RAM. For most traditional statistical applications, a 1 GB data frame limit is more than adequate. For data mining and business intelligence applications of 2007, however, that limitation is quite restrictive, often forcing analysts to artificially restructure their work. Insightful started to redress the problem with S-PLUS 7, providing a "big data" library of functions to work with large data sets. Though big data provides substantial relief from size limitations by using virtual memory, the solution is far from seamless because users must invoke special big data functions to get the benefit. Alas, big data and not-so-big data don't always interoperate well. A "larger" solution would certainly involve re-architecture of the core platform - an impossible task without strong collaboration from both the R and S-PLUS camps.
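
To make the "larger than memory" problem concrete, here is a minimal sketch of the general idea behind out-of-memory processing: read the data in fixed-size blocks and merge per-block summaries instead of loading everything into RAM. It is written in Python purely for illustration and is not how the S-PLUS big data library is implemented. The function names `chunks` and `streaming_mean_var` and the `chunk_size` parameter are hypothetical, chosen for this example only.

```python
from typing import Iterable, Iterator, List, Tuple


def chunks(values: Iterable[float], size: int) -> Iterator[List[float]]:
    """Yield successive lists of at most `size` values from any iterable."""
    block: List[float] = []
    for v in values:
        block.append(v)
        if len(block) == size:
            yield block
            block = []
    if block:
        yield block


def streaming_mean_var(
    values: Iterable[float], chunk_size: int = 1000
) -> Tuple[int, float, float]:
    """Return (count, mean, sample variance) holding one block in memory at a time.

    Per-block summaries are merged with the standard pairwise update for
    the mean and the sum of squared deviations (hypothetical helper, for
    illustration only).
    """
    n, mean, m2 = 0, 0.0, 0.0
    for block in chunks(values, chunk_size):
        k = len(block)
        block_mean = sum(block) / k
        block_m2 = sum((x - block_mean) ** 2 for x in block)
        delta = block_mean - mean
        total = n + k
        # Merge the block's summary into the running summary.
        mean += delta * k / total
        m2 += block_m2 + delta * delta * n * k / total
        n = total
    variance = m2 / (n - 1) if n > 1 else 0.0
    return n, mean, variance
```

When `values` is a generator that reads from a file or database cursor, only one block of `chunk_size` values is held in memory at a time. The cost is the one noted above: the analyst has to call a special function rather than the ordinary in-memory one.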
