There's a bit of a generational divide within the R statistics community on the use of available packages. Old geezers like me are often set in their R ways, using functions/packages they learned back in the day and have been successful with for years.

But with R's explosive growth come additional capabilities, with newer libraries often offering more functionality than extant ones. And for many upstarts, newer is better regardless of capabilities – hence the divide.

I wish I had a nickel for every millennial who asks why I persist with lattice graphics when ggplot is the “no-brainer” current choice for R statistical visualization. Truth be told, I've developed some pretty nice tools in lattice over the years that allow me to jump-start many graphical tasks. For my money, if it's not broken, don't fix it.

I also tested ggplot on several occasions in its infancy, each time finding much to be desired in both functionality and performance. Of course, that was years ago. My last investigation, 24 months back, concluded that ggplot was pretty much the equal of my preferred lattice, though I found parity not a compelling reason to switch.

There's probably no serious current R analyst/programmer whose work doesn't revolve around the RStudio IDE for R. “Inspired by the innovations of R users in science, education, and industry, RStudio develops free and open tools for the R community... Our goal is to empower users to be productive with R. Let us know what you are doing and how we can help!”

One way RStudio's helped the community appreciably is with publication of a series of “cheat sheets” to document oft-used packages/platforms for R users. I highly recommend them to R developers and carry copies with my notebook computer.

The Data Visualization with ggplot2 Cheat Sheet is nothing short of superb. For those with a basic understanding of trellis concepts and ggplot's “grammar of graphics”, the document provides just about all that's needed to start producing compelling statistical graphs. 

ggplot, and the cheat sheet describing it, are organized around layered “geoms” that depict data points and variables. The sheet differentiates one-variable graphics from two-variable combinations of discrete/continuous variables and even more complicated multivariate displays. In addition, comprehensive sections document appearance-controlling stats, scales, themes, coordinate systems and facets/trellises.
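To make the layering idea concrete, here is a minimal sketch, assuming the ggplot2 package is installed. It uses the mpg data set that ships with ggplot2; the particular variables, geoms and theme are illustrative choices, not taken from the cheat sheet.

```r
library(ggplot2)

# Start from data plus aesthetic mappings, then add one layer at a time.
p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(alpha = 0.5) +                               # geom: one point per car
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) + # stat/geom: fitted line
  facet_wrap(~ class) +                                   # facets: one trellis panel per class
  theme_minimal()                                         # theme: overall appearance

print(p)
```

Lattice users will recognize the facet step as the rough counterpart of a conditioning formula such as hwy ~ displ | class.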

My recommendation is to work through the illustrations with the data referenced in the doc and included in the ggplot2 package. Try some of the noted options. After that, choose a trusty data set from your own portfolio, preferably one with more than 100,000 records. Working with the “personal” data challenges you to generalize the learning from the scripts, while the 100,000+ count tests package performance. If you have no such data, the included diamonds data.frame, with roughly 54,000 records and 10 attributes, makes an acceptable substitute. One additional plug for learning ggplot: there's now a version available for Python that's not far from being usable.
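A rough way to run that performance check on the diamonds fallback might look like the sketch below, assuming ggplot2 is installed. The choice of plot and the use of a temporary PDF file are assumptions made for illustration; timings will vary by machine.

```r
library(ggplot2)

# Confirm the size of the built-in data set before timing anything.
dim(diamonds)

out_file <- tempfile(fileext = ".pdf")

# Building a ggplot object is cheap; the real cost is in rendering,
# so time the whole build-and-save step.
timing <- system.time({
  p <- ggplot(diamonds, aes(x = carat, y = price)) +
    geom_point(alpha = 0.1) +
    facet_wrap(~ cut)
  ggsave(out_file, plot = p, width = 8, height = 6)
})

print(timing[["elapsed"]])  # wall-clock seconds
```

Swapping diamonds for your own 100,000+ record data frame gives a like-for-like comparison against whatever you use today.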

I wasn't very enthusiastic about Data Wrangling with dplyr and tidyr. My discontent, however, had less to do with the quality of the document than with the difficulty of showcasing data wrangling/munging on a cheat sheet. I'd recommend the materials referenced here as a starting point instead. After absorbing them, the cheat sheet would probably be more informative.

Dplyr/tidyr competes for R wrangling affection with my beloved data.table, though I believe the two can collaborate to the added benefit of developers. Like ggplot2, dplyr/tidyr is the work of R community luminary and RStudio executive Hadley Wickham, who's committed to extending R's reach in the data programming/analysis world.
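As one illustration of how the two might coexist, the sketch below computes the same grouped summary both ways and then hands a data.table result to a dplyr verb. It assumes dplyr, data.table and ggplot2 (for the diamonds data) are installed; the summary itself is an arbitrary example.

```r
library(ggplot2)     # only for the diamonds data set
library(dplyr)
library(data.table)

# dplyr: a pipeline of verbs.
by_cut_dplyr <- diamonds %>%
  group_by(cut) %>%
  summarise(n = n(), mean_price = mean(price)) %>%
  arrange(cut)

# data.table: the same summary in a single dt[i, j, by] expression.
dt <- as.data.table(diamonds)
by_cut_dt <- dt[, .(n = .N, mean_price = mean(price)), keyby = cut]

# The two results should agree on the summarised column.
all.equal(by_cut_dplyr$mean_price, by_cut_dt$mean_price)

# A data.table is also a data.frame, so dplyr verbs accept it directly.
ranked <- by_cut_dt %>% arrange(desc(mean_price))
print(ranked)
```

Loading both packages produces a few masking messages (both define between, first and last), which is harmless for this example.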

Just as RStudio has become the de facto development environment for the R community, so has RMarkdown become the preferred authoring format for dynamic documents that combine R code and output into presentation-quality reports. The key benefit of RMarkdown is that “documents are fully reproducible (they can be automatically regenerated whenever underlying R code or data changes).” Think of RMarkdown documents as the poor man's IPython notebook for combining code and output, often the final deliverable of data analysis with R.
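For a feel of what “reproducible” means here, the following sketch writes a tiny RMarkdown file from R and renders it. It assumes the rmarkdown and ggplot2 packages are installed and that pandoc is available (RStudio bundles it); the file contents and title are made up for illustration. The chunk delimiter is built with strrep() only so that it doesn't clash with the code block shown here; in a real .Rmd file you would type the three backticks directly.

```r
library(rmarkdown)

fence <- strrep("`", 3)  # the three-backtick chunk delimiter

rmd_lines <- c(
  "---",
  "title: \"Diamond prices\"",
  "output: html_document",
  "---",
  "",
  "Mean price by cut, recomputed every time the document is rendered.",
  "",
  paste0(fence, "{r}"),
  "library(ggplot2)",
  "aggregate(price ~ cut, data = diamonds, FUN = mean)",
  fence
)

rmd_file <- tempfile(fileext = ".Rmd")
writeLines(rmd_lines, rmd_file)

# render() runs the chunk and weaves code and output into one HTML report.
# It needs pandoc, so skip the step when pandoc can't be found.
if (pandoc_available()) {
  out_file <- render(rmd_file, quiet = TRUE)
  print(out_file)
}
```

Change the data or the code in the chunk, render again, and the report updates itself; nothing is pasted in by hand.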

RStudio provides all you need to get up to speed with RMarkdown: start with the Quick Tour, then progress to the RMarkdown Cheat Sheet and finally the Reference Guide. There's plenty of guidance for learning this now-indispensable tool.

I'll discuss the Shiny and Package Development Cheat Sheets in a future blog.
