You need talented people to lock on to the valuable elements of vast and varied data streams. Like big data itself, these “data scientists” come with all kinds of lofty definitions and expectations. So we reached out to Anand Rao, innovation lead for the analytics group at PricewaterhouseCoopers, who has lately turned much of his consulting attention to enterprises looking to hire or train a wave of data scientists. Rao spoke with Information Management about the differences between data scientists and the staff enterprises already have, some approaches he sees making early inroads, and the emerging importance of layered visualizations.
Information Management: How do you define this data scientist role? And how is it different from what businesses already have?
Rao: An ideal data scientist (it’s always hard to find an “ideal” anything) would, for me, have four equal parts: 25 percent business knowledge, 25 percent analytics expertise, 25 percent technological capabilities and 25 percent visualization. You need someone who understands the business issues and the questions the business is asking; that’s where it should start. Then the same person should be proficient enough to understand which techniques we should use to address that business concern. ... Obviously, it’s a challenge to find someone with experience or a background in all four. You typically have someone who can cover two of the four, and then the question becomes how much you can train them, or integrate them with others, to cover the remaining two and really play the role of the data scientist over time. You might have people who are strong in analytics and the business data behind it, but not as strong on the technical or visual side at the start.
It seems a few enterprises are diving headfirst into this data scientist and big data realm. You hear about a few very tech-progressive organizations that already put a lot of stock in analytics and make a good case study for having a data scientist. But every trend has its early adopters. How tough, or how vital, is it for the majority of businesses to snatch up a person who is specifically a data scientist? Or is it more a matter of new training and internal roles around something like unstructured data?
Only a very small pool of businesses can afford or attract that talent. What the rest are starting to do, at least, is find someone more quant-oriented and align that person with more of the business capabilities, or, vice versa, pair a businessperson with someone who has more of a technical or analytic background. Together, these people can address all of the needs of the data scientist role. ... So that ideal may not be perfect, but we need a definition so that we can hope to get closer to it. It’s a big challenge, especially if you’re not wired for this role ... You need that left-brain and right-brain person. What we’re really talking about is people with entirely different mindsets. There are a few good examples of these great data scientists, but for most businesses it’s about covering a few of these roles and then expanding them as experience grows.
From the business side of the house, the data scientist could be viewed as a fairly abstract role. Tell me how you sell this position or the related training to the C-suite. Or will it inevitably take a few years to catch up, regardless of all the chatter on big data?
It is definitely going to take a couple of years. We’re seeing a few interesting models emerge from the companies that are approaching data science. The predominant model has been getting data science going in a particular department or business unit. You have someone in marketing analytics or enterprise information management working from a data scientist’s point of view, and then you expand that person’s role. If you try one area, you can learn from it and show the business, “This is how we could do something across other functional areas, or even the whole firm.” You are also seeing a couple of [enterprises] create almost an entire department around the data scientist. What they’ve done is set up a separate data science crew reporting to the CEO. These are highly motivated and independent roles under a chief data officer, who has data scientists for different business geographies or units. They’re essentially embedded in departments, trying to solve problems particular to marketing, for example, and they have a dotted line back to the chief data officer, who connects with the CEO. This is very ambitious, and it remains to be seen how easily it could be repeated, but it embeds both the day-to-day and the long-term roles of data science and data disruption in the business.
That latter approach could lead to its own silos or departments that may be hesitant to open themselves up, I’d imagine.
Those are some of the challenges, certainly. If you believe the hype, and to some extent the reality, this is an extension of using analytics in a better way to make business decisions. We have seen that evolve from automation in the 1980s and 1990s to where we’re getting now, which is using analytics as the true basis for how organizations make decisions. It requires a substantial cultural shift, so you’d have to expect resistance. People think this is questioning their ability to make decisions. But this is an effort to make sure it’s not just gut feeling. You may have a gut feeling, but can you substantiate or invalidate it with data, what data do you need, and how do you support it? This is where data scientists hit the same psychological and cultural barriers as [previous technological change].
I want to go back to an aspect of the “ideal” data scientist you brought up earlier. The visualization element of the role could be a quick path to business results. Could you tell me more about what you’re seeing in big data visualization, in terms of what’s working and what’s not?
Visualization is a newer branch that people are grappling with. It’s in its very early days, but it has emerged with certain characteristics. One, because of the technology, visualization of this data has been very static, and not everyone can easily read some of these graphs. But now it’s becoming more interactive: you can watch the data change and see real relationships, such as how business areas go up and down with the overall economy.
Some of it is eye candy, but it fundamentally reflects the way our brains are wired: we pick up on visual cues better than we do by just looking at raw data. With two columns of data across 60 rows, you can’t see a pattern. But a graph that shows changes over time draws people in. For data scientists, it gets more creative when you start looking at some of this data and thinking about how to go beyond 2D graphs, turning [data streams] into different colors and multidimensional motion charts. ... That may sound like a simple thing, but it gives a better view of how your business performs and brings to life all of the data you’re capturing. People want to zoom in, zoom out and dig around in large amounts of data, but they want it ... to be interesting. If you literally have 1 billion data points, the question for the data scientist becomes: how are you going to visualize all of that? And this is an emerging technique.