A few weeks ago, I started an analytics engagement with a manufacturing company. Their interest was in developing dashboards that would show the results of product reliability models both individually and by various groupings.
OpenBI loves working at the intersection of BI and analytics, so dashboards of “Weibull” coefficients are right up our alley. Another plus for me was that the types of reliability analyses they do are similar to the survival models used in the medical and clinical trials worlds, with which I'm familiar. A complicating difference between survival and regular regression is that with the former, many observations are censored or suspended, meaning the units had not yet died or failed at the time of measurement.
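To make the censoring idea concrete, here is a minimal sketch using R's `survival` package (which ships with standard R distributions). The values are made up for illustration; a status of 0 marks a unit that was still running when observation stopped, i.e. censored:

```r
# Hypothetical censored reliability data; not the client's actual numbers.
library(survival)

time   <- c(120, 340, 560, 800, 800)   # hours on test
status <- c(1,   1,   0,   1,   0)     # 1 = failed, 0 = censored/suspended

surv_obj <- Surv(time, status)         # censored times print with a trailing "+"
fit <- survreg(surv_obj ~ 1, dist = "weibull")  # fit a simple Weibull model
```

The `Surv` object is what distinguishes survival modeling from ordinary regression: the likelihood treats censored rows differently from observed failures.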
When my sponsor first sent me a spreadsheet with data for a number of products, I misinterpreted the meaning of the rows and subsequently applied the R models incorrectly. He then sent another file with “triangular” matrices representing the data, along with a reference to an article on how to interpret the matrix. After reviewing the data, I read the article and set out to write the code to do the job.
I was pretty literal in my translation of the article recipe to my initial R code. After a few iterations, I'd coded and tested a 20-line function I was confident correctly produced the “data.frame” to be used as input for the survival functions. Performance was fine for all but the very largest inputs. Even then it was just a few seconds, nothing to fret about.
Still, I wasn't content with the code I'd written. The function was driven by a doubly-nested loop across rows and columns of the matrix, a motif not often found in array-oriented R. Indeed, with R's object orientation, many functions operate seamlessly on multiple data types such as scalars, vectors and matrices. And R has loop replacement methods that iterate over data structures automatically. So, my brain fried, I decided to put the code down and revisit it for performance later, remembering well a favorite maxim from Kernighan and Plauger's 1978 classic “The Elements of Programming Style”: make it right before you make it fast.
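The author's matrix layout isn't shown, but a loop-driven first draft of this kind of expansion might look like the sketch below. It assumes a simple count matrix `m` where `m[i, j]` holds the number of units sharing row label `i` and column label `j`, and emits one data.frame row per unit; the names and layout are illustrative, not the actual engagement code:

```r
# Hypothetical count matrix: m[i, j] = number of units in cell (i, j).
m <- matrix(c(3, 1, 0,
              0, 2, 1,
              0, 0, 4), nrow = 3, byrow = TRUE)

# Literal doubly-nested loop: walk every cell, append m[i, j] rows.
out <- data.frame()
for (i in seq_len(nrow(m))) {
  for (j in seq_len(ncol(m))) {
    if (m[i, j] > 0) {
      out <- rbind(out,
                   data.frame(row = rep(i, m[i, j]),
                              col = rep(j, m[i, j])))
    }
  }
}
```

Growing a data.frame with `rbind` inside nested loops works, but it is exactly the non-idiomatic motif the post describes.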
The next morning was consumed with cleaning 5 inches of snow off the driveway and sidewalks. While working the snowblower, I started to mentally revisit the algorithm I'd put to rest the night before. Within minutes, an idea to dramatically simplify the code with existing R functions occurred to me. It was a “duh” moment – how could I have not envisioned the simplified version from the get-go? After finishing the driveway and sidewalks, I returned to the previous day's code, cutting it from 20 statements and two loops to five statements and none. The code finally read like the simple R task it was. And performed like it too.
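One possible loop-free version of the same expansion, using only existing base R functions, is sketched below. Whether this matches the author's actual five-statement rewrite is an assumption; the point is that `as.data.frame(as.table(...))` flattens a matrix to long form in one call, and `rep()` replicates each cell by its count, eliminating both loops:

```r
# Same hypothetical count matrix as before.
m <- matrix(c(3, 1, 0,
              0, 2, 1,
              0, 0, 4), nrow = 3, byrow = TRUE)

# Flatten the matrix to one row per cell: columns Var1, Var2, Freq.
long <- as.data.frame(as.table(m))

# Replicate each cell's row Freq times to get one row per unit.
out <- long[rep(seq_len(nrow(long)), long$Freq), c("Var1", "Var2")]
```

Two statements instead of two loops, and the vectorized `rep()` does the heavy lifting in C rather than in interpreted R.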
I wish I were smart enough to get things right the first time but, alas, I'm not. So another Kernighan and Plauger bromide, “Don't stop with your first draft,” serves me well for both my writing and programming work. For better or worse, most of my IM blogs have been revisited multiple times, even when the lion's share of the text comes from a single sitting. Now if I could only convince my college paper-writing sons to do likewise before asking me to review their work!