I subscribe to at least a dozen online groups of BI, analytics, big data, and data science professionals. It's almost impossible for me to stay up to date on all the blogs, articles, and discussions, so instead I pick my spots, often as not attracted to provocative titles and lead-ins. Over the years I've been pleasantly surprised by the quality of the content.
A few weeks ago, a Data Science Central piece entitled “R Programming: 35 Job Interview Questions and Answers”, by Laetitia Van Cauwenberge, caught my eye for several reasons. First, as an R programmer and enthusiast, I wanted to see how I'd do on the “test”. Second, having primary responsibility for bringing analytics and data science professionals into my company, Inquidia, I wanted to get a sense of the author's R interviewing philosophy in comparison to mine.
I hedged on how to take the test, first answering the questions as they might be presented in an interview with no notebook at hand, and then again with the luxury of a machine with R installed at my disposal. As an almost 15-year R veteran, I gave myself a B in the no-notebook scenario and an A with the computer in front of me. Not overly impressive.
On some questions, I had no chance. For example, I couldn't answer “Explain how you can start the R Commander GUI.” Though I'm sure I've used the interface in the past, at this point RStudio is the easy development platform choice for R programmers. Sorry, Commander.
A question that historically has driven me to drink is “What is the difference between sapply and lapply? When should you use one versus the other? Bonus: When should you use vapply?” My response to the bonus: never. The main question is one I can answer with my computer in front of me, but the more important response is that newer packages offering similar but better-designed and more performant functionality, such as plyr, dplyr and data.table, have pretty much replaced the core “apply” family for many R programmers. And I don't buy the argument that answers to such questions should be confined to “core” R features. Superior, community-written packages and functions should be adopted, period. That's the power of vibrant open source ecosystems like R and Python. Thank you, Hadley Wickham!
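For readers who want the textbook answer anyway, here is a minimal sketch of how the three functions differ. The list `x` and the variable names are made up for illustration; only base R is used.

```r
# Hypothetical input: a named list of two integer vectors
x <- list(a = 1:3, b = 4:6)

as_list   <- lapply(x, mean)              # always returns a list
as_vector <- sapply(x, mean)              # simplifies to a named numeric vector when it can
checked   <- vapply(x, mean, numeric(1))  # like sapply, but the result type is declared up front

# The difference shows most clearly on empty input
empty_s <- sapply(list(), length)              # falls back to an empty list
empty_v <- vapply(list(), length, integer(1))  # still an integer vector, of length zero

str(as_list)
print(as_vector)
print(checked)
```

The declared return type is the usual argument made for vapply in non-interactive code. Whether that outweighs moving to one of the packages above is a judgment call.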
Embarrassingly, I needed my computer to correctly answer “What is the difference between seq(4) and seq_along(4)?” I guess I'm not a trick question guy.
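For the record, a short sketch of what the computer reports. The variable names are hypothetical; the calls are base R.

```r
a <- seq(4)        # a single number n is treated as "count from 1 to n"
b <- seq_along(4)  # always indexes its argument; the vector 4 has one element

v <- c(10, 20, 30)
c_idx <- seq_along(v)  # one index per element of v

print(a)      # 1 2 3 4
print(b)      # 1
print(c_idx)  # 1 2 3
```

The trick is that seq() changes behavior when handed a single number, while seq_along() only ever looks at the length of what it is given.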
After completing the test, I asked myself how I might use these or similar questions in the candidate interviewing that I do. My answer: it depends.
Inquidia does a lot of college hiring, so I often speak with statistics students who list R on their resumes. I invariably attempt to determine whether their knowledge of R is limited to the types of functions that might be invoked to perform basic statistical analysis of curated data sets, or whether they've had to struggle to assemble and munge data in a more challenging data integration environment. A couple of quick questions on how to read web data, handle dates, recode attributes, filter records, pivot rows to columns, and invoke ggplot generally suffice. The questions on with(), sample() and subset() from the DSC blog would work well.
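As a rough illustration of the level I have in mind for with(), sample() and subset(), here is a small sketch. It assumes the built-in mtcars data set as a stand-in for a candidate's own data, and the variable names are made up.

```r
# Assumption: mtcars, shipped with base R, stands in for real data

# subset(): filter records and keep selected columns
four_cyl <- subset(mtcars, cyl == 4, select = c(mpg, wt))

# with(): evaluate an expression using the data frame's columns directly
mean_mpg <- with(mtcars, mean(mpg[cyl == 4]))

# sample(): draw 5 random row numbers, seeded so the draw is repeatable
set.seed(42)
picked <- sample(nrow(mtcars), 5)
random_rows <- mtcars[picked, ]

print(nrow(four_cyl))
print(mean_mpg)
print(rownames(random_rows))
```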
For more senior-level candidates professing a wealth of R knowledge, I often start with questions on an analytic pattern of special interest to R, such as split-apply-combine (s-a-c). First, I'd ask the candidate why s-a-c would be important to R, then transition into an illustration from her work. A good answer to the first question would be that s-a-c conserves precious memory, and that performance-enhancing multi-processing can be brought to bear. A solid example suggests that s-a-c is prominent in her design thinking. And finally, I'd ask what packages she might use to implement an s-a-c solution. (Hint: several of the contenders are the packages I proffered as replacements for the apply family.)
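To make the pattern concrete, here is a minimal split-apply-combine sketch. It deliberately sticks to base R so that it runs without extra packages, and it assumes the built-in mtcars data set; plyr, dplyr or data.table would express the same idea more compactly.

```r
# Assumption: mtcars stands in for real data; variable names are hypothetical

# Split: break mpg into one piece per cylinder count
pieces <- split(mtcars$mpg, mtcars$cyl)

# Apply: compute a summary on each piece independently
means <- lapply(pieces, mean)

# Combine: reassemble the per-group results into one data frame
result <- data.frame(
  cyl      = as.numeric(names(means)),
  mean_mpg = unname(unlist(means))
)

# The same three steps collapsed into a single base R call
agg <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)

print(result)
print(agg)
```

Because each piece is processed independently in the apply step, that is the step where parallel processing could be brought in.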
The DSC article proposes some very good questions for senior practitioners as well. The ones on testing, package development, version control, and SQL connectivity are solid. I'd add performance optimization and interoperability with big data platforms and the Cloud.
My final reaction to technical interviewing like this: I'm glad I'm now the interviewer and not the interviewee.