Forrester Research
for Information Management Blogs
NOV 16, 2011 4:18pm ET

Data Scientist: What Skills Does It Require?

(Editor’s note: For James’ introductory blog on this topic, click here. For his follow-ups, read "Data Scientist: Do You Truly Need Big Data?", "Data Scientist: Is This Really Science or Just Pretension?" and "Data Scientist: Which Adjacent Roles are Central?")

Data scientists are a curious breed. The term encompasses a wide range of specialties, all of which rely on statistical algorithms and interactive exploration tools to uncover non-obvious patterns in observational data.

Who belongs in this category? Clearly, the “quants” are fundamental. Anybody who builds multivariate statistical models, regardless of the tool they use, might call themselves a data scientist. Likewise, data mining specialists who look for hidden patterns in historical data sets — structured, unstructured, or some blend of diverse data types — may certainly use the term. Furthermore, a predictive modeler or any analyst who builds fact-based what-if simulations is a data scientist par excellence. We should also include anybody who specializes in constraint-based optimization, natural language processing, behavioral analytics, operations research, semantic analysis, sentiment analysis, and social network analysis.

But these jobs are only one-half of the data-science equation. The “suits” are also fundamental. Any business domain specialist who works with any of the tools and approaches listed above may consider him- or herself a data scientist. In fact, if one and the same person is a black belt in SAS, SPSS, R, or other statistical tools, and also an expert in marketing, customer service, finance, supply chain, or other business specialties, they are a data scientist par excellence.

Both of these skill sets are fundamental to high-quality data science. Lacking statistical expertise, you can’t tell which algorithms and approaches are the most appropriate foundation for your statistical models. Lacking business domain expertise, you can’t identify the most valid variables and appropriate data sets to build your models around.

In establishing a data science center of excellence in your organization, you must institute forums, processes, training, tools, and other initiatives that bring people with these diverse skills together to collaborate on common projects. You must also encourage people from each camp to cross-train in the other’s area. Business analysts must learn more sophisticated statistical techniques than their schooling instilled in them and more sophisticated tools than their spreadsheets. Statistical analysts must attach themselves to business groups or functions and learn how to apply their quantitative smarts to real operational problems.

Is the garden-variety spreadsheet jockey a data scientist? Yes, to the extent that they build statistical models and use the tool to find non-obvious patterns in structured data, they are engaging in a form of data science. But if this exploration is not their primary job function, they are merely dabbling, not specializing.

Is BI report-building or OLAP cube-development data science? No. Those endeavors, although important, revolve around obvious data patterns — obvious in the sense that an organization has chosen to embed them in repeatable views and access patterns.

Data science is all about asking questions. You engage in it whenever you interactively and iteratively search for deep, hidden patterns.

This blog originally appeared on Forrester Research.

Comments (7)
Jim, I think everyone using the term "data scientist" is doing those in the data management industry a disservice. The term "data scientist" sounds like another elitist title that separates IT from the business, or as you so eloquently put it, the "suits". As a "suit" I can tell you the last thing I need is someone trying to convince me the world of data is more complicated - what I need is answers.

As for quants, they are mathematicians, as are most statisticians, as you rightly point out. Using the term scientist is an improper, and potentially misleading, definition of the value they bring to the business.

Bill

Posted by Bill H | Thursday, November 17 2011 at 5:06PM ET
I'm inclined to agree with the previous comment. I think renaming those who create models in industry as 'data scientists' is a misnomer. They may be highly skilled data analysts, but in my mind a data scientist (who may exist in either industry or academia) may be involved in things such as new data structures beyond RDBMS, or in developing mechanisms such as MapReduce or other new, cutting-edge data types, data structures, etc.

While I have a high degree of respect for both the "quants" and the "suits", they are both working with the existing data and would not fall into my understanding of a "data scientist."

Posted by Gary B | Friday, November 18 2011 at 11:58AM ET