When Do You Need a Data Scientist?
Despite ample media coverage around the need for a data scientist, confusion abounds regarding the skill set and role for such a person. This challenge was tackled in a lively session at the 2013 Pacific Northwest BI Summit entitled, “Does the Data Scientist Have Mojo?,” led by Simon Arkell, CEO of Predixion Software, and Jill Dyché, VP of thought leadership at SAS.
Dyché noted that with the increasing importance of big data and analytics, the data scientist has a monumental, complicated job, and there simply aren’t enough data scientists to do the work. (For an entertaining take on Dyché’s perspective, read her blog post “Why I Wouldn’t Have Sex with a Data Scientist.”)
Acknowledging that relying on a data scientist to provide the full value of analytics isn’t realistic, Arkell sat down with Information Management’s Julie Langenkamp-Muenkel at the Summit to discuss why collaboration is key and how to strike the delicate balance between distributing analytics capabilities to business analysts and end users and keeping mission-critical decisions in the hands of the PhDs.
IM: What do you believe are the required skills for a data scientist? How heavily does the skill set vary based on the business need and the analytic maturity of the organization?
Simon Arkell: We very much believe in the concept of self-service predictive analytics, which is where you can envisage a subject matter expert or business analyst doing their own analytics. And we have some tools that help with that: one-click analysis and much easier, wizard-driven model-building exercises. And so you could squint at that and say that a business analyst could now become a data scientist by giving them the right tools that allow them to venture into that realm without ever having been fully trained as a data scientist. But more and more, we’re seeing mission-critical applications where companies are really taking predictive analytics seriously, and they are realizing that you cannot just leave a mission-critical application in the hands of someone who’s very new to it. And so there’s this kind of distinction between mission critical and non-critical, where if it’s mission critical you then have the opportunity to have a real data scientist who is trained, who probably grew up on SAS, if they’re older, or R, and who really knows data science but may not necessarily know the industry or the problem set that they’re looking to solve for. And so, the opportunity there is to try and reduce the bottleneck on those very limited, very highly trained, technical data science people by giving them a team of business analysts or subject matter experts who can iterate on that model as they’re building it.
Why is leveraging the skill sets of the data scientists proving to be so challenging? And beyond the tools, in terms of organizational structure or organizational culture, what do you think can be done with that?
I think it’s just a learning process for an organization to get to that level of insight, institutionally. So in the early days you saw the real-time enterprise was the vision, where everyone had access to the KPIs and the dashboards, etc., and that’s where BI really took off 15 years ago or so. And this is the next iteration of that. Now we’re used to having the information we want, when we want it, in the right format. How am I going to actually turn that into real competitive advantage by getting ahead of problems and stopping them from happening or optimizing different sequences and programs?
In terms of reducing the bottleneck strain on the data scientists, you mentioned in your presentation that it is important to leverage the amazing skills that these people have but not become fully reliant on them. How do companies find the balance there and combat the tendency for over-reliance?
A real-life example is a customer of ours that is a huge multinational [corporation] and they have a team of business analysts who could very well be doing self-service predictive analytics. In this case it’s looking for potentially fraudulent transactions in the capital group of this big company. And what they currently do is they use an ETL tool to get the information that they’ve currently downloaded into Excel. They then pump that into an Oracle database, they ship that off to a data science team, the data science team runs some analytics, which to them is way below their skill set and is probably mundane and boring for them but they’ve got to do it because they’re the predictive analytics guys, the data scientists. Then they get the results and ship them back to the business analysts, and that loop takes a couple of weeks. But if those business analysts could have the right tools to basically click a button and run a prepackaged predictive model, they could score their own models and it could be really easy for them, and they could glean insights and additional data mining solutions from that process. So if you’re just equipping them with the right tools to do that and remove that two-week lag and take the heat off the data scientist, you’re also then training them up on how to do more and more for themselves over time.
Talk about why collaboration between data scientists and business analysts and SMEs is increasingly important.
Because these teams are so distributed geographically. One of our partners, Accenture, has data science teams in 10 different countries around the world and their customers are in every country of the world. So if you now give Accenture’s data scientists the ability to work with their customers regardless of where the data scientist or the customer happens to be, and you can give them this real-time collaborative, thin client environment, you’re able to get higher value much more quickly. And that’s going to be immensely valuable to the company.
In terms of subject matter expertise, how savvy does the data scientist need to be about the business in order to really provide value?
I think the subject matter expert provides a mentoring to the industry. We have our own internal data scientists who do solutions for customers. And they’ll get a data set and they know how to create a really accurate predictive model. But they’ll then send that to a customer or talk to one of our people who is an expert in that space and realize that they used certain data points that were [already known]. Or they used the wrong data, or data that, if you were an expert in that space, you would never use. And so they’ve got to work much more closely together in order to create the right predictive model quickly enough.
In an organization where it might not be a pervasive culture of analytics, might the popularity of the data scientist role create an opportunity for the data scientist to extend his or her influence to create a culture of analytics within the organization?
Absolutely not. I think it’s got to come from the top down. So it’s the CEO who’s got to make that happen. It’s a cultural change and it’s one that everyone across the organization has to buy into. And historically you see this great separation between the propeller heads and the sales or marketing people, as an example, and no one group is going to have the credibility to pull off institutional change. That has to come from the top down.