I’m at a crossroads with statistical graphics. An open source R user for over 10 years after migrating from R’s commercial S-Plus cousin, I started out using Trellis graphics in S-Plus and then adopted the lattice package when transitioning to R.
Graphics are front and center in the R mindset. Indeed the R “motif” for data analysis revolves around telling a story visually. And R has the chops to fully support that approach. The dimensional, small multiples capabilities of Trellis/lattice are a big step up from the primitive SAS graphs I’d used in the ’90s. I could do all kinds of neat new stuff with lattice, such as visualize a scatter relationship between x and y overlaid by predictive models and mathematical functions – all dimensioned by additional factors that could readily be compared among small visual panels.
As powerful as the lattice package is though, I found the learning painful. It seemed I could ultimately always do what I wanted, but not before, at times, pulling my hair out. Maybe it’s me, but the workings of lattice often seem mysterious, even with the support of package developer Deepayan Sarkar’s excellent text “Lattice: Multivariate Data Visualization in R.” Even today I still come across lattice code posted somewhere and wonder where a feature I’d never seen before had been hiding for 10 years.
I guess it was about 6 or 7 years ago that I discovered the new ggplot entry in the R graphics world. What was immediately attractive about ggplot and follow-on ggplot2 was that its author, R wunderkind Hadley Wickham, had designed the package to implement “The Grammar of Graphics,” an abstraction developed by Leland Wilkinson “which makes thinking, reasoning and communicating graphics easier.”
The layered, object-oriented semantics of ggplot made a lot of sense to me, especially since I was frustrated with the vagaries of lattice. Consider the following simple illustration of ggplot:
library(ggplot2)

# Map cyl (as a factor) to x and mpg to y for the mtcars data frame
p <- ggplot(mtcars, aes(x = factor(cyl), y = mpg))
p + geom_violin()                                                  # violin plot
p + geom_violin() + geom_jitter(height = 0)                        # plus jittered points
p + geom_violin() + geom_jitter(height = 0) + facet_grid(. ~ am)   # plus a facet per am value

Once the package is loaded, the ggplot() call establishes the basic mapping between mpg and cyl for the mtcars data frame. The subsequent lines print a violin plot, a violin plot with jittered data points, and a violin plot with jittered points and panels or “facets” for each value of another attribute (am). Easy to follow.
A similar set of graphs in lattice isn’t difficult to produce, but does require the programmer to dig into the innards of panel functions. ggplot just seems more intuitive.
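For comparison, here is a rough sketch of how the last of those plots might look in lattice. It is one plausible approach rather than the only one: it assumes the lattice package is installed, and the object name p_lat is purely illustrative. The point to notice is the custom panel function, which is where the violin and the jittered points have to be combined by hand.

```r
library(lattice)

# Violin plot of mpg by cyl with jittered points, one panel per value of am.
# The custom panel function stacks two built-in lattice panel functions.
p_lat <- bwplot(
  mpg ~ factor(cyl) | factor(am),
  data = mtcars,
  panel = function(x, y, ...) {
    panel.violin(x, y, ...)                         # density "violin" per cyl level
    panel.stripplot(x, y, jitter.data = TRUE, ...)  # raw points, jittered sideways
  }
)
print(p_lat)
```

Where ggplot adds each layer with a +, lattice expects the layering to be spelled out inside the panel function.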
Alas, as much as ggplot made conceptual sense to me then, it couldn’t supplant lattice as my graphics staple for a number of reasons. First, you like what you know, and I was frustratingly familiar with lattice. Maybe lattice had cognitive dissonance going for it: if something is difficult to learn, it must be good!
Second, V1 ggplot was just that, version one. In retrospect I’m sure Wickham would agree that the initial releases were really more proof of concept than ready-for-work packages.
And third, ggplot’s performance paled in comparison to lattice. One test I ran clocked lattice at almost an order of magnitude faster in completing 1,000 “growth of $1” investment portfolio plots. I even remember watching an early scatterplot grind through painting just 10,000 data points. Not ready for prime time, I concluded.
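A comparison along these lines is easy to sketch. What follows is not the original portfolio test, just a minimal illustration under stated assumptions: random data stand in for real returns, the names d, t_lattice and t_ggplot are made up for the example, and both plots are drawn to a null PDF device so that the timing covers rendering rather than screen refresh. Results will vary by machine and package version.

```r
library(lattice)
library(ggplot2)

# Illustrative data only: 10,000 random points.
set.seed(1)
d <- data.frame(x = rnorm(10000), y = rnorm(10000))

# Draw to a PDF device with no output file.
pdf(NULL)
t_lattice <- system.time(print(xyplot(y ~ x, data = d)))[["elapsed"]]
t_ggplot  <- system.time(print(ggplot(d, aes(x, y)) + geom_point()))[["elapsed"]]
dev.off()

# Elapsed seconds for each package.
c(lattice = t_lattice, ggplot2 = t_ggplot)
```

Wrapping each plot in print() matters here, since neither package draws anything until the plot object is printed.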
So I abandoned ggplot, only to try again three years later. Alas, pretty similar less-than-stellar results. Still not ready in my estimation.
The latest overture, however, has been much more satisfying. ggplot2 has continued to add new features, making it now functionally competitive with lattice. My learning trajectory, while not yet at the lattice level, has nonetheless progressed to where I’m now comfortable divining ggplot solutions to most R graphical challenges. To test myself, I’ll produce visuals with both lattice and ggplot2. Kind of like learning to use my left hand in basketball.
If it hasn’t already, ggplot2 will soon supplant lattice as the graphics package of choice for the greater R community. Certainly the new generation of R statisticians is adopting ggplot2 over lattice. I bet in time, even statistical curmudgeons like me will make the switch. Right now though, I’ll address my R graphics challenges by combining the best of two attractive alternatives.