“A little prediction goes a long way,” wrote Eric Siegel in his popular book Predictive Analytics. True: predictive analytics is now part and parcel of most Business Intelligence (BI), analytics and Big Data platforms and applications. Forrester Research's anecdotal evidence suggests that open source R is by far the most widely used predictive analytics platform. Independent findings and surveys, like the ones by KDnuggets and Rexer Analytics, confirm our conclusions (and I quote): “The proportion of data miners using R is rapidly growing, and since 2010, R has been the most-used data mining tool. While R is frequently used along with other tools, an increasing number of data miners also select R as their primary tool.”
To jump on this R feeding frenzy, most leading BI vendors claim that they “integrate with R”, but what does that claim really mean? Our take on this: not all BI/R integration is created equal. When evaluating BI platforms for R integration, Forrester recommends considering the following integration capabilities:
- Point-and-click GUI to create R scripts. Since R is a scripting language, does the BI vendor provide a point-and-click GUI to generate R code? Martha Bennett suggested also asking about the capability not just to auto-generate R scripts, but also to edit, update and maintain them. Beyond just generating code, there's the whole business of model design, test, management, and execution, which is usually the realm of advanced analytics platforms. How many of these capabilities does the BI vendor provide?
- Passing variables from BI to R. Can R routines take advantage of all of the BI metadata (data structures, definitions, etc.) without having to redefine it just for R? A related question: does the BI vendor provide an open source utility to generate R script headers to facilitate passing variables and parameters between BI and R?
- Passing variables from R to BI. Can the output from R calculations (scores, rankings) be passed back to the BI platform and embedded in BI reports and dashboards? A related question: does the BI vendor provide an open source utility to parse R scripts and identify variables and parameters to pass between BI and R?
- Intercepting error and other messages from R. Can the BI platform intercept, interpret and act on error messages and alerts from R models/scripts?
- PMML import/export. Can the BI vendor import/export R models based on PMML?
- Native execution. On which server are R models executed? An R server? The BI server? A database server? Natively on a Hadoop server?
- Content management. Does the BI vendor treat R scripts as part of its metadata and content, and therefore include R scripts in its content management processes, such as versioning and migration (from development to test to production environments)?
- Scalability (compliments of Mike Gualtieri). R is not designed or optimized to process VERY large data sets. Does the BI vendor add any capabilities to parallelise R processing natively in Hadoop or in a DBMS?
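To make the variable-passing and error-interception items above a little more concrete, here is a minimal Python sketch of how a BI-side process might hand parameters to an R script, read scores back, and surface R errors. It is an illustration under stated assumptions, not any vendor's implementation: the function names, the `name=value` argument convention, and the expectation that the R script prints a JSON document as its last line of output are all hypothetical. The only real tool assumed is the standard `Rscript` command-line front end, whose trailing arguments an R script can read with `commandArgs(trailingOnly = TRUE)`.

```python
import json
import subprocess


class RScriptError(RuntimeError):
    """Raised when the R process exits with a non-zero status."""


def build_command(script_path, params, rscript="Rscript"):
    """Build an Rscript command line (BI -> R).

    Each BI parameter is passed as a trailing 'name=value' argument.
    This convention is an assumption; the R script must parse it itself.
    """
    args = [f"{name}={value}" for name, value in sorted(params.items())]
    return [rscript, script_path, *args]


def parse_scores(stdout_text):
    """Parse the scores returned by R (R -> BI).

    Assumes (hypothetically) that the R script prints one JSON document
    as its last non-empty line; earlier lines are treated as log noise.
    """
    lines = [line for line in stdout_text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("R script produced no output")
    return json.loads(lines[-1])


def run_r_script(script_path, params, rscript="Rscript", timeout=60):
    """Run the script, intercept R errors, and return the parsed scores."""
    command = build_command(script_path, params, rscript)
    result = subprocess.run(
        command, capture_output=True, text=True, timeout=timeout
    )
    if result.returncode != 0:
        # R writes error messages to stderr; pass them on so the caller
        # (for example a dashboard) can interpret and act on them.
        message = result.stderr.strip() or f"exit status {result.returncode}"
        raise RScriptError(message)
    return parse_scores(result.stdout)
```

With R installed, a call such as `run_r_script("score.R", {"region": "EMEA"})` would return whatever JSON the hypothetical `score.R` prints, or raise `RScriptError` carrying R's own error text. A real BI platform would likely go further and derive the parameter list from its own metadata rather than from a hand-written dictionary.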
Do these sound just about right? Are we missing anything? All comments are welcome.
This blog originally appeared at Forrester Research. Published with permission.