The directive for judges was to evaluate submissions on: 1) Applicability to Business, 2) Innovation, and 3) Persuasiveness. On the close date right before the holidays, there were 11 qualified entries addressing disparate business issues from marketing, manufacturing, clinical trials, sports and information technology.
I learned a good deal from each of the write-ups, but in the end found two entries in particular to be standouts. The other judges almost agreed with me: the overall contest winner was my number two, while my top choice was the contest number two.
My second choice was A Direct Marketing In-flight Forecasting System, an R-based, Web-deployed utility developed by Nationwide Insurance that purports to answer the business question, “Can a company get an early read on its direct marketing initiatives?” In more technical terms, the application is “designed to forecast the total incremental benefit of a marketing tactic when only a fraction of the marketing responses have been observed … we seek to learn about (1) the total number of marketing responses due to the intervention, and (2) the distribution of these responses over time.”
The authors share R code that simulates the total effect (benefit) of a marketing tactic and implements a response time distribution describing how this effect is rendered over time. They draw on the isotonic regression function available in base R for much of the latter work. And assuming a stable, early marketing response time distribution, they provide forecasts of “the ultimate incremental lift of the marketing tactic.” To deploy the R code to business users over the Web, the authors use the CGIwithR package readily accessible from the Comprehensive R Archive Network (CRAN).
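To make the idea concrete, here is a minimal sketch of how isotonic regression might support this kind of early read. It is not the authors' code: the data are made up, the names `frac.hat` and `forecast.total` are hypothetical, and the simple "observed divided by expected fraction" forecast is an assumption about how a stable response time distribution could be used.

```r
# Hypothetical, noisy estimates of the share of all eventual responses
# received by the end of each day (made-up numbers, illustration only).
day      <- 1:10
frac.hat <- c(0.10, 0.24, 0.22, 0.45, 0.60, 0.58, 0.75, 0.88, 0.95, 1.00)

# isoreg() ships with R (stats package) and fits a non-decreasing curve,
# which is what a cumulative response distribution has to be.
fit <- isoreg(day, frac.hat)
cum.frac <- fit$yf   # monotone fitted values, one per day

# Assumed forecast rule: scale the responses seen so far by the share
# of responses expected to have arrived by that day.
forecast.total <- function(observed, on.day) {
  observed / cum.frac[on.day]
}

# 115 responses by day 3, when roughly 23% are expected to be in:
forecast.total(observed = 115, on.day = 3)   # about 500
```

The isotonic fit pools the out-of-order estimates (days 2 and 3, days 5 and 6) so the curve never decreases, which keeps the scaled forecast from jumping around between neighboring days.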
Preliminary reaction to the initiative has been positive. In some cases, weak campaigns can be scuttled early and the resources reallocated to new tests. The Web-based app is consistently available to users and, with automatic response updates, marketers get the latest performance information on demand. “For poorly performing tactics, such timeliness can prevent wasted marketing spend; for successful tactics, it means identifying them sooner and capitalizing on the momentum of effective marketing.”
In contrast to the in-flight marketing application, my choice for the top prize, Mining Twitter for Airline Consumer Sentiment by Jeffrey Breen, seems a bit less statistical, emphasizing more of R's mashup, data munging and graphical capabilities. The point of departure for Mining Twitter is R’s strength in assimilating data from outside sources.
One such package, twitteR, available on CRAN, “makes searching Twitter as simple as can be.” To load Delta Airlines tweet data into an R object, for example, a command like the following would do the trick: delta.tweets = searchTwitter('@delta', n=1500). Once the Twitter data are in R objects, popular data manipulation functions like those in the plyr package from Hadley Wickham can be used to “liberate” the text messages.
For a simplistic measure of sentiment, the author submits each message to a string manipulation procedure that ultimately counts the numbers of positive and negative reference words from an embellished opinion lexicon that's read into R using basic read functions. “To score each tweet, our score.sentiment() function uses laply() to iterate through the input text. It strips punctuation and control characters from each line using R’s regular expression-powered substitution function, gsub(), and uses match() against each word list to find matches.”
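The scoring step described above can be sketched in a few lines of base R. This is an illustration rather than the author's function: it swaps plyr's laply() for base R's vapply() so it runs without extra packages, the tiny word lists stand in for the real opinion lexicon, and taking positive matches minus negative matches as the score is an assumption.

```r
# Tiny stand-in word lists; the lexicon described in the entry is far larger.
pos.words <- c("great", "love", "thanks", "awesome")
neg.words <- c("delayed", "lost", "worst", "rude")

score.sentiment <- function(sentences, pos.words, neg.words) {
  vapply(sentences, function(sentence) {
    # Strip punctuation and control characters, then lower-case.
    sentence <- gsub("[[:punct:]]", "", sentence)
    sentence <- gsub("[[:cntrl:]]", "", sentence)
    sentence <- tolower(sentence)

    # Split on whitespace to get individual words.
    words <- unlist(strsplit(sentence, "\\s+"))

    # match() returns NA for words that are not in the list.
    pos.hits <- sum(!is.na(match(words, pos.words)))
    neg.hits <- sum(!is.na(match(words, neg.words)))

    # Assumed scoring rule: positive matches minus negative matches.
    pos.hits - neg.hits
  }, numeric(1), USE.NAMES = FALSE)
}

# Made-up messages for illustration.
tweets <- c("Thanks @delta, great flight!",
            "Worst trip ever: delayed AND they lost my bag.",
            "Boarding now.")
scores <- score.sentiment(tweets, pos.words, neg.words)
scores   # 2 -3 0
```

A word count like this ignores negation and sarcasm, which is why the measure is best described as simplistic; its appeal is that it is transparent and easy to tune by editing the word lists.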
Once the data manufacturing and scoring are completed, there's a wealth of information to visualize and analyze – with no shortage of R capabilities. The author demonstrates the use of several R graphics packages to show flavors of sentiment distribution. And of course it's easy to generate data from competing airlines for comparative analyses.
The final step is to contrast the airline data with benchmark results from the American Customer Satisfaction Index. The author deftly uses an R XML package to scrape data from the ACSI website. He then merges the results of this inquiry with the stored sentiment structure and analyzes/graphs the results.
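The merge step is easy to picture with base R's merge(). The sketch below uses placeholder airline labels and invented numbers, not real sentiment scores or ACSI results, and the column names `pct.positive` and `acsi.score` are hypothetical.

```r
# Placeholder values only; these are NOT real sentiment or ACSI figures.
sentiment <- data.frame(airline      = c("A", "B", "C", "D"),
                        pct.positive = c(62, 48, 71, 55))
acsi      <- data.frame(airline    = c("A", "B", "C", "E"),
                        acsi.score = c(66, 61, 75, 70))

# merge() keeps only airlines present in both sources by default.
compare <- merge(sentiment, acsi, by = "airline")
compare

# One simple way to compare the two measures.
cor(compare$pct.positive, compare$acsi.score)
```

Airlines that appear in only one source (D and E here) drop out of the comparison, which is worth checking before reading too much into any plot or correlation.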
Voilà – hypothesizing, mashing, munging, analyzing and visualizing – data science at its best with R.
Kudos to Revolution Analytics for sponsoring the competition to highlight business applications of R.