With an enthusiastic world-wide user base estimated at 2M and a stranglehold on academia, there's no doubt that the future for the open source R Project for Statistical Computing is bright. Yet even with this largesse the challenges for commercial R purveyor Revolution Analytics (formerly REvolution Computing) are significant.

In the current data deluge era, the R limitation that a data set being analyzed must reside entirely in computer memory is seen as draconian. And R's interpreted language can be quite inefficient for “big” algorithms and computations. Indeed, the knock on R is that it's good for prototyping but doesn't scale. Competitors like SAS have a head start in scaling analytics, collaborating with hardware and database vendors to optimize for “big data” processing.

Though R's a fantastic language, it has a steep learning curve and limited GUI tools for the uninitiated. It took me 12 months working with R's moribund step brother S-Plus to develop a comfort level with the new array and object-oriented programming paradigm. I'm not sure typical business analysts would be so patient.

Over the 8 years I've been using R, I've found the freely-available open source versions remarkably stable, the few problems I've encountered quickly resolved by R's enthusiastic, if acerbic, support forums. For my own usage, I see no reason to buy commercial support for vanilla R. If I'm to “purchase” a subscription for commercial R, the product will have to provide me capabilities I cannot get for “free”.

The business model is different today for now-established commercial open source BI vendors like Pentaho and Jaspersoft than it was five years ago. At that time, their primary offerings revolved around product support and validation, the commercial products very similar to the freely-available community editions. Now, the vendors deploy a “freemium” strategy that markets a highly-differentiated “enterprise edition”, with critical features available for paid subscription only. More, the innovation trajectory for enterprise is increasingly diverging from community. Commercial open source with these vendors now seems increasingly proprietary.

Unlike many current COS vendors who in fact “own” the key developers, R's core development team and hundreds of package creators will likely never be on the payroll of Revolution Analytics. Trying to manage such a community to maintain a cohesive product seems a challenge on the scale of herding cats.

Given the current physical and usability shortcomings of R that limit attractiveness to business, the head start of competitors for big data, and R's “pure” open source development model that keeps most current users happy, how will Revolution Analytics make its mark, providing value-add to customers? I asked Norman Nie what strategies Revolution will introduce to the market to build a profitable commercial open source business.

Statistics and analytics are now front and center for business. SAS remains the 1,000-pound gorilla, while IBM purchased SPSS, making it a centerpiece of the “Smarter Planet” strategy. And analytics are central to BI software leaders Oracle and SAP as well. How will this attention help Revolution Analytics? Will a rising tide lift all boats? How will Revolution and R differentiate?

I believe it’s today’s data deluge that’s lifting all boats. That, combined with an explosion of computing power and the growth of modern database query languages, has changed the game for predictive analytics. 40-year-old legacy products cannot keep up.

That’s where Revolution comes in. First and foremost, we will differentiate ourselves enormously from our competitors on pricing models. Revolution R will be a superior product at a much lower price, and there is a definite market for that. Much of SAS’s revenue comes from recurring renewal fees – not new installations. There are a lot of customers locked in with no other options right now.

Additionally, R has a much more programmable, extensible software architecture that makes it very difficult, if not impossible, for 40-year-old legacy products to compete with. We believe that by adding features like a rich GUI and improving scalability to handle much larger data sets, Revolution can extend R to a broader mainstream market and accelerate its adoption in business.

Lastly, SAS may be the 1,000-lb. gorilla in this market, but we also have the ability to be equally big through the massive community of R developers. Revolution’s strategy is to work with this community, and with each partnership we forge, we gain an advantage over the major market players today.

It's estimated there are over 2M users of R world-wide. Many, like myself, were SAS and S-Plus users who've migrated to R, and have now been aficionados for 10 years or more. During that time, we've found the free, community edition releases of R incredibly productive and stable. What will Revolution add to this R success story to make the product even more attractive commercially?

There’s no doubt that the existing users have found R to be incredibly productive and stable. Many have also been trained on R, and have made enormous contributions to the evolution of the program.

We believe R is the future of predictive analytics, and can be equally productive and valuable to a broader audience of users across many industries. To that end, we want to further enhance R to make it more accessible to business users. By offering a commercial distribution, we also provide a ‘safety net’ in terms of validity, support and a road map – things that are important to IT decision makers. Lastly, we will roll out on May 6 a beta community site for R users to connect and share information. R resources are broadly distributed today, and we think there’s value in pooling these resources together for both existing and new users.

SAS and SPSS would argue that R doesn't scale, cannot handle terabyte-sized problems, performs slowly with large data sets and is only suitable for programmers. Do you agree? If so, how will Revolution help?

R wasn’t built with enterprise use in mind. The fact that it is making its way into business is a huge credit to the power of R. Rather than focus on where R falls short, we believe there’s a real opportunity to make R even more powerful, and accelerate its adoption in business. That business solution is our Revolution R Enterprise.

Over the course of this year, we will enhance our Revolution R Enterprise with capabilities that will make it easier to learn and use as well as better scale to handle huge data sets.

First, we will add what we call “Big Data Analysis” for terabyte class file structures, combining the use of external memory algorithms, distributed parallel computing, high performance data access and an extensible framework for processing huge datasets in R. This will let users create advanced predictive models on the biggest data sets that businesses have today, making use of the combined computing resources of multiple machines in a cluster or in the cloud.
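To make the idea of an external memory algorithm concrete, here is a minimal sketch of the general technique: the data is read in fixed-size chunks and only a small running summary stays in memory. It is written in Python for illustration and is not Revolution's implementation; the names `RunningStats`, `read_chunks` and `summarize` are hypothetical, and an in-memory iterable stands in for blocks read from a large file.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class RunningStats:
    """Summary of the data seen so far: count, mean, sum of squared deviations."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0

    def update(self, chunk: List[float]) -> None:
        """Fold one in-memory chunk into the running summary (pairwise merge)."""
        k = len(chunk)
        if k == 0:
            return
        chunk_mean = sum(chunk) / k
        chunk_m2 = sum((x - chunk_mean) ** 2 for x in chunk)
        delta = chunk_mean - self.mean
        total = self.n + k
        # Combine the two partial summaries without revisiting earlier data.
        self.m2 += chunk_m2 + delta * delta * self.n * k / total
        self.mean += delta * k / total
        self.n = total

    @property
    def variance(self) -> float:
        """Sample variance (n - 1 denominator); NaN with fewer than two values."""
        return self.m2 / (self.n - 1) if self.n > 1 else float("nan")


def read_chunks(values: Iterable[float], chunk_size: int) -> Iterator[List[float]]:
    """Yield fixed-size chunks; a stand-in for reading blocks of a large file."""
    chunk: List[float] = []
    for v in values:
        chunk.append(v)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def summarize(values: Iterable[float], chunk_size: int = 1000) -> RunningStats:
    """Compute mean and variance while holding only one chunk in memory."""
    stats = RunningStats()
    for chunk in read_chunks(values, chunk_size):
        stats.update(chunk)
    return stats


if __name__ == "__main__":
    result = summarize(range(1, 11), chunk_size=3)
    print(result.n, result.mean, result.variance)
```

Because two partial summaries can be merged in the same way, the same bookkeeping would in principle extend to chunks processed on separate machines, which is the connection to the distributed computing mentioned above.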

Second, we will build a rich, comprehensive, customizable data analysis GUI that can be easily used at all levels of expertise including programmers, non-programmers, Ph.D. statisticians and less trained data analysts. Users will be able to transition back and forth between R code and dialogs, and show only as much R code as they want to see.

Give me a one minute elevator pitch, SAS vs Revolution R; the same now for SPSS vs Revolution R.

Revolution R is cheaper, faster, and plain better than the alternatives. It's the way of the future and has a much higher ability to scale with a massive development community--one that easily dwarfs even the SAS engineering army in size and scope--backing it up. It will be as easy to use as SPSS while offering greater programming flexibility than solutions from either company.

Give us a strategic sense of how Revolution Analytics will use its considerable world-wide clout to take on SAS, SPSS and other analytic competitors.

We are going to be a broad, full-line analytics provider. To get there, we are going to engage deeply with the R open source community; continue supporting the training and use of R in academia by giving away Revolution R Enterprise for free to academics at accredited institutions; partner with code packagers to make their offerings commercial-ready; and drive product development and innovations needed to make R more accessible to business users.

I'm now doing some work with TIBCO, demonstrating the integration of R with top visualization tool Spotfire – open source with proprietary. Do you foresee much additional integration such as this, where R is an analytics server, front-ended by other, perhaps proprietary tools?

Definitely. Whenever the results of a statistics or predictive analytics computation are presented to a decision maker – whether in the form of a table, a chart, or a complete report – we believe that R is the best environment for developing those calculations. To that end, we are in the process of extending Revolution R to include a web-based analytics server, so that it can act as the back-end “analytics brain” for applications like BI tools, proprietary or OSS. In fact, we see this as a key aspect of our future growth, by facilitating the process of delivering the expertise of R users to the broader community of business users.  

Get out your analytics crystal ball. What will the commercial analytics marketplace look like in 5 years? 10 years? 25 years?

It's nearly impossible to provide a 25-year forecast with the current pace of technological innovation. However, I see four major trends taking shape that will almost certainly play significant roles in the continuing evolution of the commercial analytics market:

  1. The ubiquitous use of data to make business decisions.
  2. The size and scale of data will continue to grow at unprecedented rates.
  3. Predictive analytics will increasingly be full-time, embedded and automated in the business cycle. Customer complaints won't be stacks of papers on the complaint department desk. Instead, they will appear in the form of automated feedback that goes back into the system and is acted on with great speed, which tools like ours make possible with real-time automation.
  4. The last 40 years have been the revolution of organized data that is numerical in origin and in rows. We are now starting to see an increased emphasis placed on linguistic data, which is being mined and turned into indicators that will be a regular part of our added ability to understand, motivate and incentivize customers and enterprises alike to run businesses more efficiently. Some of these tools are beginning to emerge in crude forms now, but I believe they will explode over the next decade.

Steve also blogs at Miller.OpenBI.com.