November 16, 2012 – As the saying goes, good help is hard to find. When it comes to the loosely defined qualities of a data scientist, locating and organizing expert help amid a dearth of candidates may seem close to impossible.
A few experts in this emerging field shared their perspectives on selecting data scientists and organizing teams to analyze huge and varied data sets. They spoke Wednesday at the Chief Data Scientist Summit in Chicago, held by BI event organizers *IE.
Before you start looking to hire a data scientist, you need to know what you’re looking for. Aron Clymer, data scientist at salesforce.com, oversees a team of about a dozen data scientists and business analysts for the SaaS vendor’s product lines that touch on about 1 billion behavioral data transactions per day.
Clymer suggested that to get an idea of the size and scope of the team, start with an assessment of the three types of “products” data science teams produce: ad hoc, periodic and real time. Ad hoc queries are often on the lighter-effort end of the spectrum, while periodic work may take deeper digging and real-time work requires unique infrastructure. From there, you can better gauge a data science team’s reach across enterprise data and its capacity to add data-backed insight on business questions.
And to formally organize your data scientists and business data analysts, you’ll likely “settle” between the two extremes of enterprise data science teams: the one-person “spanners” who truly cover all aspects of business analytics, and the piecemeal, multi-membered team approach.
Spanners like those harnessed at Netflix may be nice in terms of project turnaround and elimination of team friction, but it is not realistic for most organizations to court, pay and keep a single, all-encompassing data scientist of that skill level. On the other end, a full data science team can provide a more well-rounded approach to business questions and a wider range of niche skill sets, though it risks the same challenges associated with adding any tech team layer, as well as team members’ obligations to other departments.
In describing his own team, Clymer said a hub-and-spoke approach keeps data scientists aligned with the specific concerns and interests of particular products. As data scientists typically work from data marts and models, Clymer said it’s critical to have separate ETL and data warehouse people – “Data scientists typically aren’t good at this, too.” – with as much automation included on those fronts as possible.
Then there is the issue of finding the right people for the job. Accretive Health Chief Data Scientist Scott Nicholson discussed the very human elements he’s looking for in analytics hires. Health care is a people-facing industry that requires transparency in its functions, so an ideal data scientist must bring solid communication skills and ample curiosity. Instead of someone who “jumps all over the tech,” Nicholson, who has also worked in analytics for e-commerce and at LinkedIn, said the best data scientists are people who ask: “How can I make a quick impact?”
“The engineering stuff you can pick up ... but the curiosity? That’s something that’s built in. I can teach someone Python, but the curiosity is far harder to get.”
Nicholson added that the right candidate must be ready to follow a model from the first business questions through development and definitely into deployment. This “end-to-end” quality forces the data scientist to see the user’s predicaments. It should also give the data scientist the understanding needed to follow up with business questions that root out more of the unknown patterns and problems lurking in the data.
Kirk Borne, an astrophysicist and professor of computational science at George Mason University, stressed the importance of getting clear communication on executive expectations. Recounting a project in a previous position at NASA that spent its first few months asking the “same question with different terms,” Borne recommended data science teams that have a foot firmly planted in the business operations of the enterprise. It’s a connection between the two sides of the house that he is currently exploring through the relationship between computational and business school degrees at George Mason.
“Once you understand the business question, you can prioritize your response and even come up with better questions,” Borne said during a roundtable discussion at the event.