Critics of the open source R Project for Statistical Computing often cite the complexity and quirkiness of the R language as a downside when comparing the platform against competitors, claiming it takes months and even years to master language intricacies. And though there are GUI front-ends to help (with a major one in the works from Revolution Analytics), according to the critics, R is still a complex language to work with.
I must acknowledge I now agree with the R naysayers. Even after eight years of working with R and two with its older cousin S, I still don't have as advanced a command of the language as I'd like. It seems every time I review R code I wrote a year or so back, I can think of a better and more efficient way to do the task. After the fact, I routinely discover functions I didn't know existed that can dramatically simplify the code. Annoyingly, I’m never “there.”
My frustration boiled over when I followed the upgrade to the latest release with an update of the R packages I use regularly in my work. Packages are extensions to the core platform developed and made accessible to the community for free. Central to my environment is the dimensional graphics package lattice with extensions from latticeExtra. I also rely heavily on Frank Harrell's broad collection of goodies in Hmisc.
After getting all the pieces in place, I started looking at the latest documentation and was most disappointed with what I found. Having invested quite a bit of time to enhance the basic dotplot, I found precisely the functionality for my needs in segplot. My 15 lines of code are now reduced to 5. How rude.
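For readers who haven't met segplot, here is a minimal sketch of the kind of interval plot it draws. This is not my original dotplot code: the data frame and its column names (group, mid, lower, upper) are made up for illustration, and the arguments shown are just one plausible way to call it.

```r
library(latticeExtra)  # also attaches lattice

# Made-up point estimates with lower and upper bounds (illustrative only)
est <- data.frame(
  group = factor(c("A", "B", "C", "D")),
  mid   = c(2.1, 3.4, 1.8, 2.9),
  lower = c(1.5, 2.6, 1.1, 2.2),
  upper = c(2.7, 4.2, 2.5, 3.6)
)

# One horizontal segment per group, ordered by the point estimate;
# centers marks the estimate, draw.bands = FALSE draws lines instead of bands
p <- segplot(reorder(group, mid) ~ lower + upper, data = est,
             centers = mid, draw.bands = FALSE,
             xlab = "Estimate with interval")
print(p)
```

The formula puts the grouping variable on the left and the two interval endpoints on the right, which is most of what my hand-rolled dotplot enhancements were doing.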
I've also spent a lot of time creating reusable templates to standardize the look and feel of my R graphics. So what do I find in latticeExtra this go-round? Pre-built templates better than mine. One set of functions copies the look and feel of another popular R graphics package, ggplot2; a second mimics the graphical format used by The Economist magazine: http://www.economist.com/node/21525933. All my work down the drain. Ticked me off.
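A rough sketch of how those pre-built templates can be applied follows. The function names (ggplot2like, ggplot2like.opts, theEconomist.theme, theEconomist.opts) come from my reading of latticeExtra rather than from anything stated above, and the random-walk data is invented purely to have something to plot.

```r
library(latticeExtra)  # also attaches lattice

# Invented series, just to have something to draw
set.seed(1)
d <- data.frame(x = 1:20, y = cumsum(rnorm(20)))

# ggplot2-style look: theme settings plus matching lattice options
p_gg <- xyplot(y ~ x, data = d, type = "l",
               par.settings = ggplot2like(),
               lattice.options = ggplot2like.opts())

# The Economist-style look, applied the same way
p_econ <- xyplot(y ~ x, data = d, type = "l",
                 par.settings = theEconomist.theme(),
                 lattice.options = theEconomist.opts())

print(p_gg)
print(p_econ)
```

Because the themes are passed per plot through par.settings, nothing global changes, so the two styles can sit side by side in the same session.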
I love Professor Harrell's Hmisc package, a cornucopia of functions, models and graphics accumulated over a career as a well-published biostatistician at Virginia, Duke and Vanderbilt. I always use the Hmisc describe function to concisely summarize the attributes of my data frames. And I just discovered a new Hmisc predictive model, areg – Additive Regression with Optimal Transformations on Both Sides using Canonical Variates – he's made available to the community. After spending a few hours figuring out areg using several of my trusty predictive modeling data sets, I've become a big fan.
But Frank's a biostatistician. He should be consumed with survival analysis and variance components for randomized trials. What's he doing developing a high-powered predictive model that's not even acknowledged in the statistical learning task view? And, incidentally, areg includes a cross-validation function as well. Is Frank trying to do all my work? Can't he leave something for me? I'll have to send him a nasty note.
Of course, my tongue-in-cheek observations are in fact a celebration of R rather than a critique. I'm continually amazed at the ever-expanding bounty of contributions from some of the top quantitative analysts in the world. With this wealth of goodies, though, comes an admonition from the old buy-versus-build conundrum: do your research on what's already available before you invest a lot of time developing from scratch. As often as not, what you're looking to do is already out there – for free. I need to practice that lesson even as I preach it.