Momentum continues for the R Project for Statistical Computing in 2011. There are now over 4,000 freely available packages developed by the R community. R books, blogs and articles are proliferating at a mind-numbing pace.
Indeed, there are task views just to catalog available R packages, and meta R blogs just to aggregate other R blogs and educational materials. I must admit it's now a challenge to keep up with all the developments.
R appears to be making significant inroads into commerce, even as it tightens its grip on the worldwide academic and research communities, where it's the lingua franca of statistical computing. The emergence of data science, which addresses integration, statistical analysis and visualization of big data, will only heighten the use of R in business. And if the successes of the annual R Finance conferences in Chicago are any indication, the platform has established a strong presence with both the buy and sell sides of financial services.
It was against this backdrop that I revisited my articles of last May on Revolution Analytics, the company “that provides commercial software and services that support users of the open source R programming language.” RA's certainly been visible over the last year, with several releases of its enterprise platform, in addition to an initial launch of both R for “big data” and a Web services component. The company's also been busy on the marketing side, cultivating the R community with sponsorship of user groups and conferences.
Despite RA's successes, there are still open source purists opposed to R commercialization. If my experience is any guide, though, this resistance will wane over time, as the market comes to understand what commercialization adds. If R is to make its mark with big data and Web integration, it's probably going to be a commercial vendor that gets it there. That's certainly the precedent I've seen in the open source BI world over the past five years.
I caught up with Revolution CEO Norman Nie for a re-interview in late May. Nie was enthusiastic about sharing his experiences of the last year. And I continue to find his 40+ years of accumulated industry wisdom enlightening.
My questions and Nie's responses:
1) It's been over a year now since you assumed the CEO role at the company to be relaunched as Revolution Analytics. Could you give us a quick summary of the year? Are you glad you took the position? What have been significant accomplishments? Disappointments?
It was the potential I saw in R that drew me out of retirement to lead Revolution Analytics in its next phase of growth. Looking back at my first 18 months on the job, I can safely say I have no regrets about pursuing this opportunity.
Leading up to and during our relaunch in May 2010, I was fond of saying that we found ourselves amidst a “perfect storm of predictive analytics.” There was a tremendous opportunity for a player to introduce a modern data analytics solution equipped to tackle the big data challenge. Over the past year, we’ve introduced significant enhancements to Revolution R, including big data analysis capabilities and a scalable web services package.
The next major item on our roadmap is a user-friendly GUI that will bring advanced analytics capabilities to a wider audience of business analysts. While we had initially targeted early 2011 for its release, we had to push our release back to Q3 2011. That being said, we’re on track to deliver our GUI at that time, which should open the door to even greater possibilities down the road.
2) Two of the big early tasks you articulated for the company last year were:
- “Big Data Analysis” for terabyte class file structures, combining the use of external memory algorithms, distributed parallel computing, high performance data access and an extensible framework for processing huge datasets in R; and
- Build a rich, comprehensive, customizable data analysis GUI that can be easily used at all levels of expertise including programmers, non-programmers, Ph.D. statisticians and less trained data analysts.
Could you comment on the progress with each of these?
We released RevoScaleR, our big data analysis package, in August 2010. Shortly thereafter, we released RevoDeployR, which allowed us to integrate Revolution R up and down the data stack, in September 2010. Those releases – our most significant of 2010 – gave us a truly enterprise-ready product, and we’ve seen high customer growth as a result.
As for the GUI, we’re on track for a Q3 2011 release as I mentioned above. It will be our most significant release of 2011 and we’re excited for the new growth potential it will offer Revolution Analytics as a company.
3) There's much interest in R from the “data scientists” now doing analysis at companies like Google, eBay, Facebook, LinkedIn and Amazon. Has Revolution made inroads with these companies or are they still primarily using the community editions of R?
R is central to the rise of the data scientist, and it’s certainly heavily employed at the companies you list above, as well as others. This presents a great opportunity for Revolution in that organizations will encounter certain roadblocks with open source R when it comes to big data analysis and other enterprise-critical features.
One of the biggest challenges of working in our industry is that it’s difficult to find public customers. Nobody wants to share their “secret sauce” with competitors, so the vast majority of companies that we work with wish to remain anonymous.
While we can’t mention specific companies, it’s safe to say that we have had – and are having – active discussions with a number of high-profile Web 2.0 companies, some of whom are already Revolution customers.
4) Relatedly, tell us about RA's integration with Hadoop and large database appliances.
Hadoop is an extremely active development area for us. R and Hadoop are extremely complementary technologies, and we’re working to further integrate the two for enterprise use. To that end, we’ll be announcing a significant partnership involving R and Hadoop integration in the near future, so stay tuned on that front.
5) Where is RA finding its most significant early sales successes?
Financial services is the industry vertical in which we’re currently seeing the most traction. We’ve also seen particularly strong growth in pharmaceuticals/life sciences and business intelligence.
Data science is interesting in that it has such a horizontal application – data scientists are in high demand in all of the aforementioned verticals, and they are bringing R with them to new fields as their numbers grow.
We’ve also seen strong traction with both the community edition R users and universities/research institutions (with more than a fair amount of overlap between those two camps).
Finally, we’ve seen a number of customers migrate to Revolution R from SAS. There are fewer migrations from SPSS.
6) What technological surprises might we see from RA over the next year?
I’m both confident and optimistic that our GUI will be extremely well-received by both our customer base and the larger statistics field. It’s a thin-client GUI that will streamline a number of our more technical components, allowing line-of-business users to do advanced analysis with Revolution R – tasks previously reserved for Ph.D.-level statisticians.
7) I wrote in one of my blogs on R and Revolution last year: “RA must simultaneously appease and cultivate the R community of developers that does most of the platform work gratis … RA must herd the proverbial cats of R's free-thinking world-wide development community, many of whom are ambivalent or worse about commercial open source.” One commenter noted dissatisfaction with RA's commercial open source model: “The returns that Revolution is gaining from doing this are not being fed back to the rights holders. They are not good corporate citizens. Please don't do business with them.” How would you respond?
Since its inception, Revolution has placed a premium on community relations. We realize that there are those out there who fundamentally disagree with our open source business model, but we’ve seen a generally positive reception from the R community.
We fully recognize that our success is contingent on the continued evolution and growth of the wider R project and aim to support it wherever possible. To that end, we’ve sponsored a number of R- and statistics-centric conferences over the past year, including UseR 2010 and O’Reilly’s Strata Conference.
We’re also active sponsors of R User Groups around the world. We’re extremely pleased at the growth we’ve seen there – which is indicative of R’s general rise in stature in the statistics community. In the fall of 2009, there were only 3 or 4 R User Groups; today, there are over 50 worldwide.