The Vegetarian Enjoys Steak – Open Source R Meets Proprietary Spotfire
My nephew and niece-in-law are committed vegetarians. Their diet, though, is generally not very visible to me except when we go out to dinner or the extended family gathers in the Outer Banks for vacation. I feel a bit guilty watching them prepare their veggies as I gorge on steamed crabs and grilled tuna. Twice a year, however, they go off the vegetarian wagon by design, enjoying grilled pork chops or big hamburgers for special treats. As I think about my current analytics situation, there might be something for me to learn from their lead.
A devotee of open source software, I sometimes feel like a BI vegetarian. OpenBI works with the Pentaho and Jaspersoft BI platforms, the R Project for Statistical Computing, open source relational databases MySQL, PostgreSQL and Infobright, and distributed computing platform Hadoop/MapReduce. So far, with the exception of databases, we've pretty much been OSBI purists. That isn't a matter of principle for us, but for many in the OS space a complete open source software stack is the only acceptable choice. There doesn't seem to be a collaborative middle ground for OS and proprietary. You're either a BI vegetarian or not.
I'm also a big fan of graphics and visualization for BI. For me, one of the main attractions of the open source R Project for Statistical Computing in contrast to its proprietary statistical competitors is the superior graphics and the visual mentality of the R platform. The graphics capabilities introduced in S/R have established a standard for statistical software. And with ready access to the underlying building blocks, R developers routinely implement new and powerful visuals – and at times, as is the case with ggplot, entirely new graphical grammars. As powerful and varied as R's graphics capabilities are, however, the platform, like SAS, SPSS and the other commercial statistical software, is generally lacking in live interactive visualization. And the open source packages I've worked with, GGobi and Mondrian, while easy to use and friendly with R, are not as functional as their commercial counterparts.
Five years ago – before OpenBI – I took a look at visualization software available at the time and really liked three of the products I examined: Tableau, Advizor Solutions and Spotfire. I hadn't kept up with their evolution until I saw a promotion a few months back for the new R/S+ integration with Spotfire. That piqued my interest, so I worked out a demo agreement with Tibco and set off to look at the latest version of Spotfire with its new R interface.
Spotfire was just as intuitive as I remembered, and even more powerful. Getting started with the R integration, though, wasn't without challenge. While the basic setup appeared to be fine, at first I couldn't get the R connectivity working. It turns out Spotfire assumed the R packages needed to support the integration would reside in the same library as the core packages, when in fact an R install promotes a more flexible library setup by default. Once I got past that hurdle, I was able to build models in R with a pretty large data set (more than 500,000 records), combining the prediction curves with the base data in visualizations that merged the disparate data sources. The ability to execute R functions in the background and seamlessly move data between R and Spotfire is key. In fact, one can do all data manipulation for Spotfire in R scripts, then simply push the results into Spotfire with the interface. As the models got more complicated with additional factors, I combined the predictions with the base data using Spotfire's trellis graphs, the powerful dimensional visualization metaphor pioneered in S/R. At the end of the week I was convinced: I had my statistical cake and was eating my interactive visualization too. I could see Spotfire and R, along with PostgreSQL, Ruby and Pentaho Data Integration, as bedrocks of a modern analytics tool chest.
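For readers who want to see the R side of that workflow, here is a minimal sketch in base R. It is not the setup I actually ran: the data is simulated (1,000 rows rather than 500,000), the names `pkg_library` and `base_data` are made up for illustration, and the Spotfire end of the interface is left out entirely. The first part shows how to check which library directory a package lives in, which is the kind of check that exposed my connectivity problem; the second fits a model and attaches its predictions to the base data so both can be handed off together.

```r
# List every library directory this R installation searches, in order.
# A default install often has more than one (for example, a per-user
# library in addition to the core library).
print(.libPaths())

# Hypothetical helper: report the library directory a package was
# actually installed into. "stats" ships with R, so it resolves to
# the core library.
pkg_library <- function(pkg) dirname(find.package(pkg))
print(pkg_library("stats"))

# Simulated stand-in for the base data (assumption: not the real data set).
set.seed(42)
n <- 1000
base_data <- data.frame(
  x = runif(n, 0, 10),
  segment = sample(c("a", "b"), n, replace = TRUE)
)
base_data$y <- 2 + 0.5 * base_data$x + rnorm(n)

# Fit a simple regression, then attach the fitted values as a new column
# so predictions and base data travel together in one table.
fit <- lm(y ~ x + segment, data = base_data)
base_data$predicted <- predict(fit, newdata = base_data)

print(head(base_data))
```

Keeping the predictions as a column of the same data frame is a deliberate choice here: a single table with both observed and predicted values is the easiest shape to pass to a visualization tool and to split by a factor such as `segment` for trellis-style displays.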
A quick follow-up investigation of the predictive analytics capabilities of the three well-regarded platforms suggests that the Spotfire/R tandem is well out in front. Tableau has a few hard-wired trending/regression features. Advizor counters with a more comprehensive polynomial regression framework from KXEN. Core Spotfire provides a number of standard statistical capabilities, including correlation, regression and clustering. But it's access to R that distinguishes Spotfire in predictive analytics. Rather than having to learn the vagaries of different models for each platform, one simply uses her favorite routines from the wealth of available R packages, combining the output with the rich Spotfire interactive graphics. If she'd rather not use standard regression, she can instead pick one of the many other R predictive model possibilities like GAM, MARS and Random Forests for statistical tasks.
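The point about swapping model families can be illustrated with a small base R sketch. It uses simulated data and two models that ship with R, a linear regression and a loess smoother, purely as stand-ins; the names `d`, `fits`, `preds` and `scored` are hypothetical. Packages such as mgcv (GAM), earth (MARS) and randomForest supply their own `predict` methods, so a fit from one of them should slot into the same pattern, though I haven't shown that here.

```r
# Simulated data with a clearly nonlinear signal (assumption: illustrative only).
set.seed(1)
d <- data.frame(x = seq(0, 10, length.out = 200))
d$y <- sin(d$x) + rnorm(200, sd = 0.2)

# Two model families fitted with the same formula interface.
fits <- list(
  linear = lm(y ~ x, data = d),
  loess  = loess(y ~ x, data = d, span = 0.3)
)

# predict() dispatches on the model class, so one call pattern scores
# every model in the list against the same data.
preds <- as.data.frame(lapply(fits, predict, newdata = d))

# One table holding the base data plus a prediction column per model.
scored <- cbind(d, preds)

# In-sample root mean squared error for each model.
rmse <- sapply(preds, function(p) sqrt(mean((d$y - p)^2)))
print(rmse)
```

Because each model contributes just one more column to `scored`, comparing prediction curves side by side becomes a charting question rather than a modeling one.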
My vegetarian relatives are generally shocked at the high prices for non-veggie meals when they join family gatherings at favorite restaurants. As an open source vegetarian, I share their angst at the cost of commercial analytics software dining, especially in contrast to open source vegetable value meals. The Spotfire website quotes an annual subscription price of $4,800 for single-user, web-based Spotfire, not including R integration. I can only imagine what the purchase price for a 25-user license, R included, would be.
On the other hand, Tibco/Spotfire has done a masterful job both positioning the software and demonstrating its value as an enterprise analytics competitor. And certainly many companies see the benefit of absorbing the big price tag for Spotfire, just as many hungry carnivores readily drop $75 for an aged double cut filet mignon. For now, I ask for just a little slack as I temporarily slip off the vegetarian, open source wagon to indulge in some delicious Spotfire/R meat!
Steve also blogs at Miller.OpenBI.com.