This year's conference was even better than the 2009 inaugural: more than 200 participants took in over 20 consecutive high-powered presentations across a fast-paced day and a half. And while I'm a quantitative finance welterweight at best, there was plenty to pique my interest, including the latest developments in scaling R for size and performance.
As an indication of the R community's current obsession, no fewer than six presentations addressed R's ability to scale up for “large” problems. David Smith of commercial R vendor Revolution Computing spoke on analyzing large-scale financial data sets in R. Vanilla R suffers from capacity (memory) and performance (CPU) shortfalls. The Revolution team has built a prototype R package that addresses memory capacity through a new file-based data type that needn't be memory resident. In addition, the package supports clustering and parallelism to combat CPU bottlenecks. Mark Seligman introduced a package that exploits the graphics processing units (GPUs) now available on many new computers. The performance improvements from its GPU functions are in many instances significant.
Saptarshi Guha, Robert Grossman and Stefan Theussl each separately discussed R applications on the Hadoop Distributed File System (HDFS), with computations run by Hadoop's distributed computing engine, MapReduce. Guha detailed the Rhipe package (R and Hadoop Integrated Processing Environment), Grossman focused on cloud computing, and Theussl illustrated MapReduce in the context of distributed text mining with R. Finally, R/Finance committee member Jeff Ryan presented his nifty indexing toolkit, a package that “allows users to index columns of any R object and search for and search using binary or rle encoding, all with standard R semantics”. Performance gains from such access can be half an order of magnitude or more.
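For readers new to MapReduce, here is a minimal sketch of the idea behind distributed text mining: a word count split into map, shuffle and reduce steps. It is written in plain Python rather than R, it runs on a single machine, and it is not the Rhipe API or anything shown in the talks. The function names and the toy corpus are my own assumptions for illustration.

```python
from collections import defaultdict
from itertools import chain


def map_phase(doc):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in doc.split()]


def shuffle(pairs):
    """Shuffle step: group the emitted values by key (the word)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce step: sum the counts collected for each word."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(docs):
    # On a real cluster each document (or file split) would be mapped on a
    # different node; here the map calls simply run one after another.
    mapped = chain.from_iterable(map_phase(d) for d in docs)
    return reduce_phase(shuffle(mapped))


# Hypothetical toy corpus, not data from any of the presentations.
docs = ["R scales with Hadoop", "Hadoop runs MapReduce", "R runs everywhere"]
counts = word_count(docs)
```

The appeal of the pattern is that the map and reduce functions never share state, which is what lets an engine like Hadoop spread them across many machines.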
As impressed as I was with the performance focus seen here, the larger mandate is to make such considerations as transparent as possible, assuring that core R packages and functions scale with little intervention, thus protecting investments in the established R code base.
There was plenty for the quantitative finance aficionados at R/Finance 2010 as well. Bernhard Pfaff presented on Risk Modeling with R, introducing the key concepts of value at risk, volatility and expected shortfall. He then ratcheted up the math with generalized hyperbolic distribution treatment of fat tails, conditional volatility modeling with GARCH, and fundamental, implicit and explicit copulas. I'm not embarrassed to admit I was pretty much lost at the end of this hour-long talk.
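The two entry-level concepts, value at risk (VaR) and expected shortfall (ES), are at least easy to illustrate. The sketch below uses the simple historical (empirical) estimator, which is my choice for illustration and not necessarily what Pfaff presented; it is in Python rather than R, and the function name and sample returns are hypothetical.

```python
import math


def historical_var_es(returns, level=0.95):
    """Return (VaR, ES) as positive loss figures from a sample of returns.

    VaR is the empirical `level` quantile of the losses; ES is the average
    of the losses that are at least as large as VaR.
    """
    if not returns:
        raise ValueError("returns must not be empty")
    losses = sorted(-r for r in returns)  # losses, smallest to largest
    # Position of the empirical quantile, clamped to a valid index.
    k = math.ceil(level * len(losses)) - 1
    k = max(0, min(k, len(losses) - 1))
    var = losses[k]
    tail = losses[k:]  # the tail of the loss distribution beyond VaR
    es = sum(tail) / len(tail)
    return var, es


# Hypothetical daily returns, made up for this example.
sample = [-0.05, -0.03, -0.02, -0.01, 0.0, 0.01, 0.01, 0.02, 0.02, 0.03]
var_90, es_90 = historical_var_es(sample, level=0.9)
```

Because ES averages over the tail rather than reading off a single quantile, it is never smaller than VaR at the same level, which is why it is favored when fat tails matter.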
I stayed with Achim Zeileis's presentation on predicting Chinese currency exchange rates a bit longer. Zeileis adapted exchange rate regression to test, monitor and date changes in Chinese currency regimes. As Zeileis was presenting, the thought occurred to me that a similar approach might be productive for testing, monitoring and dating company performance following the introduction of a new strategy intervention.
Back at a level more my speed, Nicolas Christou presented statistical finance lite, introducing stockPortfolio, an R package for optimizing portfolios using a number of different models, including Markowitz variance-covariance, constant correlation, multigroup and single index. When I returned home for the day, I downloaded stockPortfolio and immediately started using its getReturns, stockModel and optimalPort functions. Nice new toy!
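To give a flavor of the variance-covariance approach, here is a minimal sketch of the two-asset minimum-variance portfolio, using the closed-form weight w1 = (var2 - cov12) / (var1 + var2 - 2 cov12). It is plain Python, not the stockPortfolio API, it allows short selling, and the function names and return series are assumptions made up for the example.

```python
from statistics import mean


def covariance(x, y):
    """Sample covariance of two equally long return series."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def min_variance_weights(r1, r2):
    """Weights (w1, w2) of the two-asset minimum-variance portfolio."""
    v1, v2 = covariance(r1, r1), covariance(r2, r2)
    c12 = covariance(r1, r2)
    denom = v1 + v2 - 2 * c12
    if denom == 0:
        raise ValueError("the two series move identically; no unique minimum")
    w1 = (v2 - c12) / denom
    return w1, 1 - w1


def portfolio_variance(w1, r1, r2):
    """Variance of a portfolio holding w1 in asset 1 and 1 - w1 in asset 2."""
    w2 = 1 - w1
    return (w1 ** 2 * covariance(r1, r1)
            + w2 ** 2 * covariance(r2, r2)
            + 2 * w1 * w2 * covariance(r1, r2))


# Hypothetical return series, made up for this example.
asset_a = [0.01, -0.02, 0.03, 0.00]
asset_b = [0.02, 0.01, -0.01, 0.00]
w_a, w_b = min_variance_weights(asset_a, asset_b)
best_var = portfolio_variance(w_a, asset_a, asset_b)
```

With more than two assets the same idea needs the full covariance matrix and a quadratic solver, which is the kind of work a dedicated package takes off your hands.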
Finally, Jonathan Cornelissen presented the RTAQ (R Tools for Analysis of Trades and Quotes) package, which is used to assess intraday trading strategies and to measure liquidity and volatility. Its functions to clean, match and aggregate trade and quote records hit home with the ETL and cleansing sides of BI.
At the end of the day and a half, I left impressed with yet another strong international display of the R community. At the same time, I was fatigued from exposure to copulas, Brownian motion, GARCH, Black-Litterman models, Markov regime-switching, finite difference engines, stochastic differential equations, random walks and quadratic and dynamic programming. Bet I'll come back for more next year, though.
Kudos to Gib, the other 2010 committee members Jeffrey Ryan, Dirk Eddelbuettel, Dale Rosenthal, Brian Peterson, Peter Carl and John Miller, and support associates Linda Heinig and Holly Griffen for another splendid show!
Steve also blogs at Miller.OpenBI.com.