Cheat Sheets for Data Science
I received an email a few weeks back from a company called DataCamp announcing a comprehensive cheat sheet for the R data.table package. Knowing I'm a data.table proselyte, the co-founder asked if I'd take a look and offer suggestions and perhaps, I suspect, mention it in my blog.
I'm a big R fan and have been an enthusiastic data.table devotee since I discovered it several years ago. I'm also in frequent contact with package originator Matthew Dowle, and agree with more than a few in the R community that data.table's been a game-changer for elevating R's appeal to data scientists.
The DataCamp cheat sheet's well done and quite handy, covering most functionality in a single page. It certainly helps that the functions the package serves (in this case data management, access, grouping and summarization) are limited in scope. Still, the URL to the page is front and center on my notebook.
Ten years ago, I carried multiple cheat sheets in my backpack. Now, with all the support available in flexibly-queryable online documentation and support sites like Stack Overflow, the need for cheat sheets isn't as acute. I maintain, however, close ties with a few.
The ubiquitous R Reference Card by Tom Short has been my trusty companion for ten years. I'm impressed with how the author managed to capture the guts of “core R” in such a compact space. Every time I pull it out, I refresh my memory on R minutiae I'd either forgotten or never knew. Of course, the R ecosystem's now so large that you almost need cheat sheets for cheat sheets. R Task Views to the rescue. For graphics, there are cheat sheets for R's ggplot implementation that are also pertinent to the Python library of the same name.
Before I started with first S and then R about 14 years ago, my platform of choice for data management and statistical analysis was SAS. Now, though SAS is a much smaller arrow in my DS quiver, I nonetheless use it and language clone WPS semi-regularly. The timeless SAS Cheat Sheet by David Franklin makes it much easier for me to re-connect the SAS dots. Once this gets me back in the game, I simply Google my questions for the online doc answers. One change in my SAS arsenal not reflected in Franklin's page is Proc SQL, which I prefer to data step programming.
Just as I go back and forth between R and SAS for statistical analysis, so also do I migrate between Python and R for more comprehensive data science needs. And Python Basics provides a quick refresher on important basic syntax to get me going. One lament is that Python dictionaries aren't covered, but for those I just pull up archived code.
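For readers who want the dictionary refresher that the sheet leaves out, here is a minimal sketch of the core operations. The variable names and values are made up for illustration and don't come from any of the cheat sheets mentioned here.

```python
# Core dictionary operations: a quick refresher (example data is made up).
langs = {"R": 14, "SAS": 20}          # literal: keys map to values

langs["Python"] = 5                   # add or overwrite an entry
years_r = langs["R"]                  # lookup; raises KeyError if the key is missing
years_julia = langs.get("Julia", 0)   # lookup with a default instead of an error

has_sas = "SAS" in langs              # membership tests check keys, not values
del langs["SAS"]                      # remove an entry

# Iterate over key/value pairs (insertion order is preserved in Python 3.7+).
for name, years in langs.items():
    print(name, years)

# Dict comprehension: build a new dict by filtering an existing one.
veteran = {name: years for name, years in langs.items() if years >= 10}
```

The `get` call is the one worth remembering: it avoids wrapping every lookup in a try/except when a key may be absent.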
As I've noted in previous blogs, Python programming for analytics/data science has changed considerably in recent years with the ascent of community-provided libraries such as numpy, scipy, scikit-learn and pandas. Given the array-orientation promoted by those packages, Python DS code can look as much like R as it does core Python.
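To make the array-orientation point concrete, the sketch below contrasts a core-Python loop with the numpy equivalent. It assumes numpy is installed; the arrays and their values are invented for illustration.

```python
import numpy as np

# Made-up example data.
prices = np.array([10.0, 12.5, 9.0, 20.0])
qty = np.array([3, 1, 4, 2])

# Core Python style: walk the elements explicitly.
revenue_loop = [p * q for p, q in zip(prices, qty)]

# Array style: operate on whole vectors at once, much like prices * qty in R.
revenue = prices * qty

# Boolean-mask subsetting, much like prices[prices > 10] in R.
expensive = prices[prices > 10]

# Reductions are methods on the array rather than loops.
total = revenue.sum()
```

The second and third statements would read almost identically in R, which is the resemblance described above.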
And with these libraries now a cornerstone of analytics work in Python, it's not surprising there are cheat sheets for them as well. A spartan but still quite useful reference for numpy, scipy and pandas is available here. I use the splendid 10 Minutes to pandas IPython notebook at least every other month. Indeed, the pandas documentation is so good that I now hardly ever open originator Wes McKinney's well-written book, Python for Data Analysis.
Finally, I maintain my scikit-learn sanity by calling on Peekaboo's machine learning reference. Can't tell either the R or scikit ML players without a scorecard.
These cheat sheets are just the tip of the DS iceberg, a small, select sampling of what's available for data science. I welcome reader input on others they find useful for their data management and analytics work.