Ask just about any data scientist or science of business BI specialist about critical components of her job, and she’s almost certain to mention data integration (munging), statistical analysis and visualization at the top of her list. Once the data are in place, the attention often turns from integration to the powerful combination of statistical analysis and visualization for discovery/exploration.
I can’t imagine there’s a bigger advocate of the R platform for analytics and data programming than me. I love R’s core statistical graphics and add-on lattice and ggplot statistical packages as well, using them all the time in my work. But I’m less enthused about R’s interactive visualization capabilities. My take is that commercially available software trumps its open source competition at this point. Frustratingly, though, exploratory data analysis nirvana for me is a robust visualization platform that interoperates with R.
Over the past few months I’ve been evaluating visualization software from Tableau, Spotfire, QlikView, Advizor Solutions and Visokio. There’s plenty to like about each of the vendors’ products. In fact, I wouldn’t discourage potential customers from any of those choices.
The new demo release of Omniscope from UK vendor Visokio especially caught my eye. An email I received from the product manager mentioned new R integration and provided links to several recorded demos explaining how it works. Though excited, I made a mental concession that I’d be happy if the capability simply allowed me to access native R data in Omniscope – essentially, an ODBC or JDBC for R. At the same time, after having evaluated what appears to be similar interoperability between Spotfire and R a year and a half ago, I was interested in how the Omniscope/R connectivity compared. I’m glad I invested the time to find out.
Omniscope is somewhat distinctive among its peers. The software has two major product components: DataExplorer, which provides data discovery & analysis, reporting and dashboarding, and DataManager, which offers tools to build and manage data sets. DataManager is essentially a poor man’s ETL tool, providing a drag and drop visual workflow to drive data extraction, merge, transformation and delivery on a small scale. When I first started looking at Omniscope, I pooh-poohed DataManager, thinking I’d never need it. After all, what can’t I do with Pentaho Data Integration and Ruby? I later started to appreciate DM quite a bit.
Once Omniscope determines the installed R directory structure, programmers can write R code in DataManager for either data access (Source) or data manipulation (Operations). I first took on the simpler Source, learning how to use the editor to develop a script that loads a 5.4M-record R data frame into memory, ultimately returning a random sample of 100,000 records and predictions from a cubic spline regression model to DataExplorer for exploration. Flush with that success, I then coded the beginnings of several generic tools that are driven by R metadata and dynamic statement building. I was getting quite comfortable using R as a DataManager Source.
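For readers who want a feel for the R side of such a Source script, here is a minimal sketch of the sample-then-predict step. It is not the script described above. The data frame `big`, its columns `x` and `y`, the sizes, and the choice of `bs()` with `df = 6` are all hypothetical stand-ins, and the hand-off to DataManager is left out because the column does not describe how Omniscope receives the result.

```r
library(splines)  # ships with R; provides bs() for B-spline bases

set.seed(42)

# Hypothetical stand-in for the large data frame (much smaller than 5.4M rows).
n_total <- 50000
big <- data.frame(x = runif(n_total, 0, 10))
big$y <- sin(big$x) + rnorm(n_total, sd = 0.3)

# Draw a simple random sample of rows, without replacement.
n_sample <- 1000
smp <- big[sample(nrow(big), n_sample), ]

# Cubic regression spline: bs() defaults to degree 3.
# df = 6 is an arbitrary choice for illustration.
fit <- lm(y ~ bs(x, df = 6), data = smp)

# Attach the predictions so the sample and the model output travel together.
smp$predicted <- predict(fit, newdata = smp)
```

Whether the model is fit on the full data or only on the sample is a design choice. The sketch fits on the sample simply to keep it small.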
Operations extends the programming power of R to other data in a DataManager stream. As an illustration, I load a comma-delimited file and then link the contents to an R script that pivots the text data using functions from the R reshape package. After a few more R programming statements, the “munged” data is returned to DataExplorer for discovery. I’ve tried similarly reshaping/filtering/enhancing input data from several other NBER files, all with positive results. While I’m sure I could do the pivoting tasks using other DataManager Operations functions, I was able to accomplish everything I needed to do with the R language I know.
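As a rough illustration of the kind of pivot involved, the sketch below uses `melt()` and `cast()` from the reshape package on a tiny inline table. The input columns (`region`, `year`, `sales`, `returns`) are invented for the example, and the way DataManager passes the upstream data into R is not shown, since that mechanism is not described here.

```r
library(reshape)  # the older 'reshape' package (melt/cast), assumed installed

# Hypothetical comma-delimited input standing in for the real file.
csv_text <- "region,year,sales,returns
East,2010,100,5
East,2011,120,7
West,2010,90,4
West,2011,95,6"
raw <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# melt: stack the measure columns into variable/value pairs (wide to long).
long <- melt(raw, id.vars = c("region", "year"))

# cast: spread the years across columns, one row per region and measure.
wide <- cast(long, region + variable ~ year)
```

Here `long` has one row per region, year and measure, while `wide` has one row per region and measure with a column for each year.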
My next test was to link several R Operations tasks in succession: the first to read an already-input stacked time series data file and create new variables; the second to reshape and restrict the resulting stream; and the third to invoke the Holt-Winters forecasting function to “predict” the next 30 days of measurements for each of the selected series. This, too, worked well. The forecasts behaved as expected and I was able to examine the results visually in DataExplorer. Cool stuff.
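The forecasting step in a chain like this might look something like the sketch below, which applies base R's `HoltWinters()` to each series in a stacked data frame and predicts 30 days ahead. Everything about the data is assumed for illustration: the column names `series`, `day` and `value`, the two synthetic series, and the weekly seasonality (`frequency = 7`). The real files and settings are not given in this column.

```r
set.seed(1)

# Hypothetical stacked input: 12 weeks of daily values for two series.
days <- 1:84
stacked <- rbind(
  data.frame(series = "A", day = days,
             value = 50 + 0.2 * days + 5 * sin(2 * pi * days / 7) + rnorm(84)),
  data.frame(series = "B", day = days,
             value = 20 + 0.1 * days + 3 * cos(2 * pi * days / 7) + rnorm(84))
)

# Fit Holt-Winters to one series and forecast the next 30 days.
forecast_one <- function(d) {
  d <- d[order(d$day), ]
  y <- ts(d$value, frequency = 7)     # weekly seasonality is an assumption
  fit <- HoltWinters(y)               # smoothing parameters chosen by optimization
  pred <- predict(fit, n.ahead = 30)
  data.frame(series = d$series[1],
             day = max(d$day) + 1:30,
             forecast = as.numeric(pred))
}

# Apply the forecast to each series and stack the results again.
forecasts <- do.call(rbind, lapply(split(stacked, stacked$series), forecast_one))
```

Returning the forecasts in the same stacked layout as the input keeps them easy to plot alongside the history in a downstream tool.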
Though there are a few gotchas like missing data surprises, based on my tests to date, count me as an enthusiastic supporter of the interoperability of R and Omniscope. The combination indeed makes for a powerful data science exploration platform. And for Agile BI at its finest.
Kudos to Visokio for a job well done. Me? I’m ecstatic. I can now have my interactive visualization cake and eat my R for statistical analysis, too!