Many have claimed recently that multifaceted data scientists are mythical beings, as impossible to find as unicorns. This itself is a myth, and a dangerous one at that.
Hype is cyclic. A new idea excites people, exaggerated claims are made (and often believed), and the idea takes on bigger-than-life proportions. Eventually, however, reality sets in, and a critical backlash begins. There is a strong tendency to take the criticism seriously, much more than the initial hype. However, just as the goodness of the original idea ballooned wildly during the boom, the critique gets overstated significantly during the bust. The clear-eyed observer will avoid getting taken in by either of these sides of the hype cycle and move quickly to the “plateau of productivity,” but it is hard to see through the smoke and mirrors.
So it is in data science.
The well-documented need for data scientists, the “sexiest” new profession, led initially to a flurry of discussion of what constitutes a good data scientist, with many descriptions portraying lofty and powerful multidisciplinary superheroes. The pushback soon arrived, with experts and pundits declaring that data scientists are mythical “unicorns,” and that companies should forget about them and focus instead on building teams of traditional experts (statisticians, operations researchers, software engineers and such) to do data science. Or as the parody Twitter account Big Data Borat had it:
Unicorn, Data Scientist and Santa Claus walk in bar. Bartender turn at Santa Claus and say "I must be hallucinate!"
— Big Data Borat (@BigDataBorat) April 17, 2013
Unquestionably, strong and diverse teams of specialists in relevant areas have a key place in any organization’s data science strategy. But the idea that data scientists are as mythical as unicorns is simply false.
Let us stipulate that “rock star” data scientists, who truly excel in all the important areas of data engineering, statistical modeling of complex data, building scalable software systems, and business analysis, are in fact extremely rare. But it is not an all-or-nothing game. It is eminently possible to take people with expertise in one or more of these areas and teach them the fundamentals and key skills in the others to turn them into data scientists. This is in fact what we at Illinois Institute of Technology are now doing in our Master of Data Science program, as are many other universities in similar programs.
The point is that “data scientist” is not merely a new name for “people with deep analytical skills,” as some would have it, but rather a designation for people who have, in addition to analytical skills, broad knowledge of the diverse areas relevant to dealing with complex real-world data science problems. Dealing with the size, speed, diversity and complexity of modern data sets requires much more than statistical analytical expertise and off-the-shelf software tools. A data scientist must know about data representation, software engineering, visual design, and business processes, and must be able to tell the “story of the data” based on rigorously quantitative analysis results.
This ideal “well-rounded” data scientist is not an omnipotent “unicorn” whose existence is mere fable, but rather a serious professional who understands the full lifecycle of a data science problem and also has deeper skill and expertise in one or several specific aspects of data science.
That is, there are many kinds of data scientists. One may be best at structuring and cleaning ill-formed data, putting it into a uniform structure that can be effectively analyzed; another may be expert at building statistical models; and a third may have strong skills in developing high-performance software systems for data analysis. But all of them will understand the need for data normalization, the assumptions made by different statistical methods, and the tradeoffs inherent in developing scalable data processing systems, as well as how to learn quickly about a business problem, formulate it as a data science task, and communicate results effectively to decision-makers.
These data scientists may not have the superhuman productivity or impact of a “rock star,” if you can even find one of those, but they are indispensable to developing strong data science efforts, even those based on multidisciplinary teams. As anyone who has tried to build a team out of such disparate professionals as statisticians, software engineers, business analysts and communications specialists knows, it is incredibly difficult to bridge the cultural and linguistic gaps between them and fashion a cohesive group that works effectively together. The data scientist knows how to think and speak as a statistician, as a software engineer, as a business analyst, and as a communications specialist, and so can bridge these gaps and ensure a smooth data science process.
In fact, such a team is likely to perform best when most of its members are data scientists, with just a few deep specialists involved. Recall that every data scientist will have some area of particular focus, and so several can together form a diversified team to attack complex data science problems. But since they all understand the rest of the team’s areas of focus from the inside, they will be better able to work together, critique their shared work, and produce a coherent analysis, the last of which is critical to data science success.
Additionally, many smaller organizations lack the substantial resources (easily $1M per year) to hire and support an interdisciplinary data science team. Such organizations might outsource their data analytics to one of a growing number of data science consulting firms, but would still significantly benefit from an internal data science generalist to coordinate the data science work and advise management on its proper application. The risk is that the loss of this individual might derail your data science efforts, but it can be mitigated by both ensuring proper documentation to aid continuity and designing a career development path for the position to increase its attractiveness.
You should not believe the fantasy that data scientists are mythical, and should definitely not neglect developing data science capabilities. It is hard to imagine any organization today that would not benefit from a serious data science program. And the easiest and best way to field such an effort is to find real data scientists to create it. Not unicorns, not superheroes, but data scientists. They are no myth: they walk the earth today, their numbers are growing, and we all need them.