One of the new OpenBI hires tasked with getting up to speed on R approached me a few weeks back to solicit my recommendations for books/websites to turbo-charge her understanding of R graphics.
I was only too happy to offer my $.02 on materials I’ve found productive over the years, hoping to spare her a bit of the learning curve I experienced on my own 10 years ago. Fortunately, R’s statistical graphics are top-notch and there’s now no shortage of excellent sources for eager students.
It’s customary in the R world for analysts to communicate their findings through visualization of data, estimated parameters, predictions and errors. The good and bad news for R programmers is that they must choose among multiple, currently available graphical subsystems. The three that today handle the bulk of plotting tasks in R are core or traditional graphics, the lattice package and the relatively new ggplot2. To get started with the basics of any of these, I encouraged the consultant to check out an established R graphics gallery, review a student guide available on the Web, and then take on one or more of the books below.
Traditional graphics are part of the base R installation. Included are functions for scatter plots, bar charts, dot plots, histograms, density plots, box plots, heat maps, geo maps and pairs/matrix visuals. One book I’ve found quite useful is “R Graphs Cookbook” by Hrishi Mittal. Mittal leads his readers through each of the individual plots, covering basic functions and additional parameters to customize the look and feel. He also provides examples of munging data to best support the graphs. His approach to presentation – “Getting Ready”, “How to do it …”, “How it works …”, and “There’s more …” – is natural and easy to follow. For me, the code I shamelessly stole to implement Edward Tufte’s sparklines was alone worth the price of the book.
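For a flavor of traditional graphics, here is a minimal sketch using only functions that ship with base R. The data are simulated and the variable names (scores, test, grade) are my own assumptions for illustration; none of this is taken from Mittal’s book.

```r
# Simulated data; the column names are illustrative assumptions.
set.seed(42)
scores <- data.frame(test = rnorm(100, mean = 70, sd = 10))
scores$grade <- 0.04 * scores$test + rnorm(100, sd = 0.4)

# Put three plots side by side on one device, saving the old settings.
op <- par(mfrow = c(1, 3))

# Scatter plot via the formula interface, with a few look-and-feel parameters.
plot(grade ~ test, data = scores, pch = 19, col = "steelblue",
     main = "Scatter plot", xlab = "Test score", ylab = "Grade")

# Histogram; hist() also returns the binned counts it drew.
h <- hist(scores$test, breaks = 15, col = "grey80",
          main = "Histogram", xlab = "Test score")

# Horizontal box plot of the grades.
boxplot(scores$grade, horizontal = TRUE, main = "Box plot", xlab = "Grade")

par(op)  # restore the previous graphical parameters
```

Arguments such as pch, col and main are the kind of “additional parameters” that customize each plot; par() controls the layout of the whole device.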
I use the lattice package for most of my current R graphical work. While the basic plot types are similar to core, lattice is a step up in power and complexity. Out of the box, lattice provides trellis or small-multiple paneling capabilities. With paneling, the analyst can look at a scatterplot of school grades against test score conditioned on gender – showing the relationship between grades and test scores in separate panels for males and females. The two panels maintain the same scales for grades and scores to facilitate comparisons across gender.
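The grades-by-gender example might be sketched as follows. The students data frame is simulated and its column names are assumptions; the conditioning itself is expressed by the `| gender` term in the formula.

```r
library(lattice)

# Simulated student data; names and coefficients are illustrative only.
set.seed(1)
n <- 200
students <- data.frame(
  gender = factor(sample(c("Female", "Male"), n, replace = TRUE)),
  score  = round(rnorm(n, mean = 70, sd = 10))
)
students$grade <- 1 + 0.035 * students$score + rnorm(n, sd = 0.4)

# One panel per gender; by default both panels share the same x and y scales.
p <- xyplot(grade ~ score | gender, data = students,
            layout = c(2, 1),          # two columns, one row
            type = c("p", "r"),        # points plus a fitted regression line
            xlab = "Test score", ylab = "Grade")
print(p)  # lattice plots are objects; printing draws them
```

Because xyplot() returns an object rather than drawing immediately, the plot can be stored, modified with update(), and printed later.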
In addition to its powerful paneling features, lattice graphics can be extended to enhance basic functionality. I often re-program individual panel functions to handle specific visual requirements. Deepayan Sarkar, the developer of R lattice, published an excellent book, “Lattice: Multivariate Data Visualization with R,” to introduce the complexities of such development. One illustration I’ve adapted details how to produce a low-ink variant of the box-and-whisker plot suggested by none other than Edward Tufte.
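To give a sense of what re-programming a panel function looks like, here is a small sketch that overlays group means on a standard box-and-whisker plot. It is not the low-ink Tufte variant from Sarkar’s book; the function name panel_box_with_mean is hypothetical, and the data come from the InsectSprays set bundled with R.

```r
library(lattice)

# Hypothetical custom panel: draw the default box plot, then add group means.
panel_box_with_mean <- function(x, y, ...) {
  panel.bwplot(x, y, ...)                    # default box-and-whisker rendering
  means <- tapply(y, x, mean)                # mean of y within each level of x
  panel.points(x = seq_along(means), y = means,
               pch = 18, col = "red", cex = 1.3)
}

p <- bwplot(count ~ spray, data = InsectSprays,
            panel = panel_box_with_mean,
            xlab = "Spray", ylab = "Insect count")
print(p)
```

The pattern is the general one: call the stock panel function for the baseline drawing, then add or replace elements with lower-level panel.* calls.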
ggplot2 is yet another graphics package, introduced a few years back by R community leader Hadley Wickham. ggplot2 is distinguished by its deep underlying grammar, based on Leland Wilkinson’s “The Grammar of Graphics.” In his book, “ggplot2: Elegant Graphics for Data Analysis,” Wickham extols the grammar’s benefits: “ggplot2 is designed to work in a layered fashion, starting with a layer that shows the raw data then adding layers of annotations and statistical summaries. It allows you to produce graphics using the same structured thinking that you use to design an analysis, reducing the distance between a plot in your head and one on the page.” Though I’m still a bit hesitant to acknowledge that ggplot2 can compete on all fronts with lattice, I think there’s a reasonable chance it’ll become the R graphics package of choice within a few years for the next generation of R developers.
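The layered approach Wickham describes can be illustrated with a short sketch, assuming the ggplot2 package is installed. It uses the mtcars data set bundled with R; the annotation text and its position are my own choices for illustration.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(colour = "grey30") +                  # layer 1: the raw data
  geom_smooth(method = "lm", se = TRUE) +          # layer 2: a statistical summary
  annotate("text", x = 5, y = 30,
           label = "Heavier cars, lower mpg") +    # layer 3: an annotation
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon")
print(p)
```

Each `+` adds one layer to the same plot object, so a graphic can be built up, inspected and revised a step at a time.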
Once my new colleague ascends to the intermediate level of sophistication of these books, I’ll recommend the advanced “R Graphics” by R core developer Paul Murrell. “R Graphics” covers both traditional and lattice models and demonstrates how to access the low-level functions with a deep dive into the grid subsystem. Those programmers looking to develop new graphics packages in R will find this book indispensable.
Down the road, the challenge I’ll give my new colleague is to “port” her learning from R to Tableau with live, interactive versions of all visuals developed in static R. Alas, I’m still waiting for the Tableau/R interface that will combine the capabilities of these two powerful tools.