My company, OpenBI, is a consulting partner of hot visualization vendor Tableau. We’re quite excited about the latest product release for many reasons, perhaps the most significant being the long-awaited integration with the R statistical platform. In the fall, I had the opportunity to participate in the 8.1 Beta Program, which gave me access to Tableau product management and support for my often-dumb questions about the new functionality.
The Tableau-R integration allows developers to create workbook variables using the R language and statistical procedures. An added benefit is that the R computations generally behave as the analyst “wants” when filtering or slicing and dicing. I’ve successfully used simple R code to drive the split-apply-combine return normalization and percent change calculations for my stock portfolio. The uninitiated can learn a lot from the highly informative webinar “R and Tableau: Data Science at the Speed of Thought” by Bora Beran.
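For readers who have not met split-apply-combine, the idea can be sketched in a few lines. The snippet below is plain Python rather than the R used inside Tableau, and it is not the workbook code described above: the tickers, dates and prices are invented, and the function name `normalize_and_pct_change` is hypothetical. It simply splits price rows by ticker, computes each close relative to that ticker's first close plus the percent change from the prior row, and recombines the results.

```python
from collections import defaultdict

# Hypothetical long-format rows (ticker, date, close); the values are invented.
PRICES = [
    ("AAA", "2013-01-02", 50.0),
    ("AAA", "2013-01-03", 55.0),
    ("AAA", "2013-01-04", 44.0),
    ("BBB", "2013-01-02", 200.0),
    ("BBB", "2013-01-03", 210.0),
    ("BBB", "2013-01-04", 231.0),
]


def normalize_and_pct_change(rows):
    """Split rows by ticker, compute per-ticker measures, recombine."""
    # Split: collect (date, close) pairs per ticker.
    groups = defaultdict(list)
    for ticker, date, close in rows:
        groups[ticker].append((date, close))

    combined = []
    for ticker in sorted(groups):
        series = sorted(groups[ticker])  # ISO dates sort chronologically
        base = series[0][1]              # first close is the normalization base
        previous = None
        for date, close in series:
            # Apply: value relative to the first close, and change vs. the prior row.
            normalized = close / base
            pct_change = (
                None if previous is None
                else (close - previous) / previous * 100.0
            )
            # Combine: append every group's rows to one flat result.
            combined.append({
                "ticker": ticker,
                "date": date,
                "normalized": normalized,
                "pct_change": pct_change,
            })
            previous = close
    return combined


result = normalize_and_pct_change(PRICES)
for row in result:
    print(row)
```

Because each ticker is rebased to its own first close, series with very different price levels become directly comparable, which is the point of the normalization step.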
Integration with R now appears to be a sine qua non for analytics tool vendors. I’m currently investigating KNIME, an open source “user-friendly graphical workbench for the cradle-to-grave analysis process: data access, data transformation, initial investigation, powerful predictive analytics, visualisation and reporting.” KNIME is built around a visual workflow metaphor and looks much like a data integration tool, with drag-and-drop node folders such as IO, Database, Data Manipulation, Mining, Reporting, Statistics, etc. An R node is easily added.
New to the software, I find myself turning to R for many of the IO, Data Manipulation and Statistics tasks that I’m sure could be accomplished with core KNIME functionality. With the R plug-in, KNIME can powerfully combine text files, KNIME tables and R coding in its workflow. For my KNIME stock portfolio illustration, I use the R data.table package to execute “split/apply/combine” logic and create a second, “pivoted” data set available for subsequent workflow steps. Alas, I’m reluctant to “share” my flows, fearing the cognoscenti would question my sanity for doing in R what could be done directly in KNIME.
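To make the “pivoted” data set concrete, here is a minimal sketch of reshaping stacked rows into one row per date with one column per ticker. It is written in plain Python as a stand-in for the data.table code mentioned above, not a copy of it; the sample rows and the function name `pivot_wide` are hypothetical.

```python
# Hypothetical stacked ("long") rows: (ticker, date, close); values are invented.
LONG_ROWS = [
    ("AAA", "2013-01-02", 50.0),
    ("BBB", "2013-01-02", 200.0),
    ("AAA", "2013-01-03", 55.0),
    ("BBB", "2013-01-03", 210.0),
    ("AAA", "2013-01-04", 44.0),
]


def pivot_wide(rows):
    """Return one dict per date, with a column for every ticker seen."""
    tickers = sorted({ticker for ticker, _, _ in rows})
    by_date = {}
    for ticker, date, close in rows:
        by_date.setdefault(date, {})[ticker] = close

    wide = []
    for date in sorted(by_date):  # ISO dates sort chronologically
        record = {"date": date}
        for ticker in tickers:
            # A ticker with no observation on this date is filled with None.
            record[ticker] = by_date[date].get(ticker)
        wide.append(record)
    return wide


wide = pivot_wide(LONG_ROWS)
for record in wide:
    print(record)
```

The wide layout is convenient for downstream steps that compare tickers side by side, while the stacked layout is usually easier to group and filter.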
Two years ago, I wrote on the integration of R with visualization tool Omniscope by Visokio. The Omniscope architecture consists of two major components: “DataExplorer, which provides data discovery & analysis, reporting and dashboarding, and DataManager, which offers tools to build and manage data sets. DataManager is essentially a poor man’s ETL tool, providing a drag and drop visual workflow to drive data extraction, merge, transformation and delivery on a small scale.”
R plugs in seamlessly to DataManager, allowing the analyst to access its full complement of language and statistical features. “As an illustration, I load a comma-delimited file and then link the contents to an R script that pivots the text data using functions from the R reshape package. After a few more R programming statements, the ‘munged’ data is returned to DataExplorer for discovery. I’ve tried similarly reshaping/filtering/enhancing input data from several other NBER files, all with positive results. My next test was to link several R Operations tasks in succession: the first to read an already-input stacked time series data file and create new variables; the second to reshape and restrict the resulting stream; and the third to invoke the Holt Winters forecasting function to ‘predict’ the next 30 days of measurements for each of the selected series. This, too, worked well.”
And 3.5 years ago, I lauded the integration of R with the Spotfire visualization platform for many of the same reasons. “The ability to execute R functions in the background and seamlessly move data between R and Spotfire is key. In fact, one can do all data manipulation for Spotfire in R scripts, then simply push the results into Spotfire with the interface. As the models got more complicated with additional factors, I combined the predictions with the base data using Spotfire's trellis graphs, the powerful dimensional visualization metaphor pioneered in S/R. At the end of the week I was convinced: I had my statistical cake and was eating my interactive visualization too.”
I’m not the typical Tableau, KNIME, Omniscope or Spotfire developer. I’m pretty lazy: rather than spend time assimilating a new visual product’s language syntax that won’t scale for me, I’d just as soon code in something I know and can use in multiple instances. Let others do the dirty work.
Ironically, where R is integrated with other analytics tools, I use it as much for its base language as I do for statistical procedures. Be it for munging, wrangling, reshaping or other data science tasks, R offers a powerful and expressive vector syntax that can be applied in many contexts. Difficult to learn and quirky, yes, but a welcome extension to emerging analytics platforms.