It’s been a few years since the Harvard Business Review declared Data Scientist “The Sexiest Job of the 21st Century,” so it seems like a good time to revisit the idea. With 250 petabytes of data created each day, there’s no doubt that the big data trend is only accelerating.
But as businesses try to figure out how to capture, store and, most importantly, get value from all of that data, some of the sheen of the big data revolution is beginning to wear off. After all, if we’re collecting exponentially more data than ever before, but only making marginally better decisions than we did before, is it worth it?
Part of the problem is undoubtedly due to the shortage of truly great data scientists that many of the field’s pioneers predicted. The slow pace with which enterprises change the way they make decisions is also partly to blame.
But a third, often unrecognized factor is that the gold rush mentality led to a misplaced focus among data-scientists-in-training (and the programs that train them) on the technical “hard skills” of data science, at the expense of the softer skills that ensure data initiatives actually deliver business value.
Unfortunately, “data scientist” has come to mean “I know the difference between Support Vector Machines (SVM) and Random Forest algorithms and how to implement them.” But since these scientists often don’t understand the specific business problems they’re trying to solve or how to communicate their findings to non-technical people, they often lack the ability to meaningfully contribute to the bottom line.
So I want to lay out some of the skills that business leaders should be looking for when they hire data professionals today--whether they be data analysts, data engineers, data product managers, or data scientists. If you’re a student or job-seeker looking for a data job, this same list can guide you as you sharpen your skills and get ready for interviews.
Before we begin…
Keep in mind, data is a team sport. There are simply too many skillsets required in a successful data effort for any one person to be an expert in all of them. So rather than chasing a few superstars who can do it all (and will likely cost you a ton), assemble a balanced team that, collectively, can do all the things needed to make a project successful.
Are they curious?
The first trait you should be looking for in a data person is curiosity. This isn’t a new idea. In fact, it’s mentioned as the dominant trait of data scientists in the HBR article from 2012. But it’s important to make sure that the curiosity of your data hires isn’t restricted to technical questions.
You want someone who is genuinely curious about how the world, and your business, works. Someone who will ask “What are the main drivers of your business?” and “Why do you do it that way?” A data scientist who is only curious about whether neural networks offer advantages over SVMs is likely to fall into the trap of chasing “interesting” problems that aren’t actually useful to your business.
There’s a tendency to look for data people among computer science and statistics majors, and there are undoubtedly many talented candidates there. But if a computer science grad never took any courses outside the comp sci and math departments, that’s often a red flag that their curiosity may be narrow.
Don’t rule out those who’ve studied other fields with strong quantitative methodology. Social sciences like economics and psychology, natural sciences like biology and physics, and business-focused fields like business administration and finance all can be wonderful sources of candidates. And don’t rule out candidates just because their degree is in something further afield like literature or languages or music (like mine).
A well-rounded educational background is often a great sign of curiosity. Additional technical skills can be learned. Curiosity is usually innate.
Are they bilingual?
The next thing I look for in data hires is their ability to speak both the language of data and the language of business. The people who can fluently do both are immensely valuable.
A huge part of a data person’s job is sitting down with business owners, understanding what business problems they’re trying to solve, and then translating those into tractable data questions.
Too often, a business owner arrives with their data question already formulated (e.g. “Can I see orders by hour of the week broken down by timezone?”). But because they don’t have the same familiarity as an analyst with what data the business collects, how it’s structured, and what it means, they often ask the wrong data question.
That’s why having data people who can push back and say “What are you trying to figure out?” is so critical. Once they have that context, skilled data practitioners can often suggest several lines of inquiry that might shed light on the business problem. But they can only do that if they understand the business enough to do that translation.
Bilingualism is also essential because once an analysis is complete and ready to be presented to stakeholders, data people have to have the skills to translate their findings into language and visuals that make sense to their less-analytical colleagues.
As I’ve written before, “Good analysis presented poorly is just as useful as bad analysis presented well.” If an analyst’s idea of presenting findings to business stakeholders is saying “We found a strong negative correlation--R^2 = 0.53. I’ve attached the raw data and the output of the logistic regression to this email,” then it doesn’t matter if they’re one of the smartest analysts in the world. Their work is unlikely to provide any value to the business.
Are they humble?
A huge part of data work is exploring and trying to quantify the unknown. Not surprisingly, this means that being wrong is a big part of the job. So is happening upon what you think is a golden goose, only to realize that you’re on a wild goose chase.
Data hires need to be comfortable with the ambiguity that comes with data work and with admitting error. Know-it-alls are quite dangerous in the data world because while technology tends to fail noisily (the site is down!!!!), data tends to fail silently.
If you make a recommendation that turns out to be wrong, and the business changes course because of it, you need to be ok admitting you made a mistake rather than letting colleagues barrel ahead basing their work on a mistake.
Are they skeptical?
Speaking of golden geese, they’re rare. If the data is telling you something that seems too good to be true, chances are excellent that it is. Data people shouldn’t be cynical, but they should be skeptical.
They should know that small changes are likely to lead to small results, that most experiments won’t yield much, and that if something were obvious, someone would likely have found it already.
Data people need to question their assumptions (and the assumptions of others). They need to recognize that humans are really good at recognizing patterns--so good in fact that we often invent ones where none exists.
The best data practitioners know that the more helpful a result is to them, the more skeptical they need to be of it. They find multiple, independent ways to come at a problem to see if their initial findings hold up. And even when they’re ready to move forward, they never fully give up the belief that maybe they missed something.
Are they practical?
Data people love new problems. Data engineers research that new framework because maybe it’ll provide better stability. Data scientists keep tuning their model because maybe they can eke out a bit more predictiveness. Analysts add a few more fields to their analysis in the hopes they’ll shed more light on the problem.
And this curious nature is a critical component in their success. But it’s easy to lose sight of the topline goal and find yourself deep down a rabbit hole.
That’s why it’s so critical that members of your data team know how to pull themselves (and others) out of those rabbit holes. They need to be able to ask whether a project is just “interesting” or if it’s “useful.”
Often, the initial phase of a project will deliver the bulk of its value. Continuing to work on it, while intellectually stimulating, isn’t likely to deliver larger gains. Data practitioners need to be able to see the larger context they’re working in and find the next project that’s going to deliver big value.
Are they willing to get their hands dirty (with data)?
Too many data science training programs focus intensely on teaching the algorithms that data scientists use, using pristine datasets that are never found in the real world. They ignore the fact that most of a data scientist’s time is actually spent finding, cleaning, and reshaping raw data to make it ready for modeling.
Unless you’re managing a huge data team with tons of specialization, you don’t want to hire a data scientist who expects that others will do all the preparatory work for them. You need people who are willing to dive in themselves and get a ground-level understanding of what the data is like.
There’s simply no substitute for getting your hands on the data. There’s no algorithm you can deploy to make sure the data is clean and makes sense. That requires subject matter expertise and elbow grease. As Anscombe’s Quartet so beautifully illustrates, relying on summaries to understand the data can be deeply misleading.
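To make the Anscombe’s Quartet point concrete, here is a minimal sketch using only Python’s standard library. The data values are the published quartet; the helper names (`pearson`, `summarize`, `SUMMARIES`) are my own and purely illustrative. All four datasets come out with a mean of x of 9, a mean of y of about 7.5, and a correlation of about 0.82, even though the underlying points look nothing alike.

```python
from math import sqrt
from statistics import mean

# Anscombe's Quartet: four datasets of 11 (x, y) points each.
# Datasets I-III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(xs, ys):
    """Pearson correlation coefficient, computed by hand (hypothetical helper)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def summarize(xs, ys):
    """The kind of summary a dashboard might show instead of the raw points."""
    return {"mean_x": mean(xs), "mean_y": mean(ys), "r": pearson(xs, ys)}


SUMMARIES = {name: summarize(xs, ys) for name, (xs, ys) in QUARTET.items()}

if __name__ == "__main__":
    for name, s in SUMMARIES.items():
        print(f"{name:>3}: mean_x={s['mean_x']:.2f}  mean_y={s['mean_y']:.2f}  r={s['r']:.3f}")
    # The summaries match, but the raw data does not: dataset IV has only
    # two distinct x values, and dataset III contains a single large outlier.
    print("distinct x in IV:", sorted(set(QUARTET["IV"][0])))
    print("max y in III:", max(QUARTET["III"][1]))
```

The summaries alone would suggest four interchangeable datasets. Only by looking at the points themselves (or plotting them) do you see a curve in one, an outlier in another, and a single point driving the entire correlation in the last.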
Are they lifelong learners?
You may have noticed that I haven’t yet mentioned the technical skillset required to be a successful data practitioner. That’s not an accident.
This isn’t because technical skills are unnecessary--they’re absolutely necessary. It’s because the specific tools that people are using change dramatically and quickly. So the important thing isn’t which skills you already know, it’s the process by which you gain skills.
A strong grounding in some of the fundamental tools of analysis (e.g. some combination of SQL, R or Python, Hadoop, Excel, D3, and Java/C/C++) and the theoretical basics that underpin analysis (e.g. statistics, data warehousing principles, accounting, general numeracy) is great, and I’d be reluctant to hire someone who didn’t have some of these.
But to be successful in the data world, you absolutely must be a lifelong learner--someone who loves to pick up new skills and can do so with minimal coaxing. I always look for autodidacts and people with fascinating side data projects when I’m hiring, knowing that they’re self-motivated and will be able to adapt to our toolset.
That’s not to say you should rule out people who picked up their skills in a more structured setting, but be skeptical if the only place they used those skills was on class assignments.
So that’s it. Data people should be curious, bilingual, humble, practical, skeptical, lifelong learners who aren’t afraid to get their hands dirty.
I realize that’s a pretty different description from the more typical “math major who is experienced with many types of machine learning.” But if you use this framework to build your data team--or to become a member of a data team--I think you’ll be amazed at what your team can accomplish.