I just finished Tom Davenport's latest book, “Big Data @ Work”, and, not surprisingly, liked it a lot. Indeed, I've enjoyed pretty much everything I've read by the author. A Harvard-trained sociologist, Davenport is a methodologically sound researcher. His in-depth interviews and surveys of executives and data scientists set a standard for excellence in an industry where marketing bravado generally supersedes scientific rigor. And though Davenport's writing is often not as provocative as, say, that of Viktor Mayer-Schönberger and Kenneth Cukier, authors of “Big Data: A Revolution That Will Transform How We Live, Work, and Think”, I almost always find him on target from the practitioner's perspective. Steady, methodical, unspectacular and spot on.
Davenport the scientist serves himself and readers well when he lets the results of his 2011 research inquiries overcome a personal bias that big data is little different from the analytics he's already investigated so thoroughly. “I eventually concluded, as a result of this research, that there are real differences between conventional analytics and big data, though you wouldn't always know that from reading other books and articles about the topic.” (“Data Science for Business”, for example.)
“Big Data @ Work” had me nodding in approval when it contrasted big data (data science) with traditional analytics from the BI world. Yes, there's the size difference and BD's preoccupation with unstructured formats that we all know about. But there's also big data's focus on products, in comparison to analytics' orientation toward performance management. And BD supports the more challenging streaming data feeds in addition to batch. Finally, big data revolves more around bottom-up machine learning algorithms, as opposed to analytics' top-down hypothesis testing.
“Big Data @ Work” (BDW) posits that data science is not really new in the business world: “today's situation (is) a dramatic acceleration in demand for data scientists, rather than the pure invention of the role.” I agree with the author's assessment that the data scientist role is a combination of 1) hacker, 2) scientist, 3) quant, and 4) business expert/trusted advisor.
While the ideal might be multi-role, “horizontal” practitioners with all DS skills, the reality is much more collaborative. “Some organizations try to find people with 1.5 or 2.5 of the necessary skills, and through training or experience try to build some of the rest.” Academia is increasingly a contributor to the supply of data scientists with its 12-18 month master's programs, now progressing from an early computation-light predictive analytics focus to more versatile, computation-intensive data science curricula.
The final two chapters, on learning from start-ups and established organizations, really resonated with me, since most of OpenBI's big data work has been with companies that weren't in business 10 years ago. We can certainly confirm Davenport's stark contrast of start-up, data-only organizations with large companies that have an analytics legacy.
The business models of nascent big data companies often revolve around either data and analytics as products themselves or the addition of analytics to existing transaction-based applications. For these organizations, big data is bet-the-business, so the action's fast and furious (“agile's too slow”), with no tolerance for the slothful bureaucracy sometimes seen in the large-organization DW/BI world. One data start-up we worked with used free open source software until their daily processing choked. Then, in less than three months, they made the transition to Hadoop and an analytic database for all their analytics processing and saw a performance boost of an order and a half of magnitude (roughly 30x).
Since employees of such start-ups tend to be younger than their larger-organization counterparts, and since the technology stacks of start-ups are often primarily free open source, in contrast to the combination of open source and proprietary commercial software at older companies, a generational analytics divide is often a consequence. The freely available R statistical computing platform is the predictive modeling lingua franca of data start-ups, while antediluvian SAS still prevails in large companies. Just don't ask a 25-year-old Silicon Valley data scientist to write DATA step programs.
There's a lot to be learned from big data in large companies as well. One immediate challenge for mature organizations, in contrast with BD start-ups where big data is the business, is that big data must be integrated with traditional analytics. Davenport's interviews confirm that larger organizations are attentive to this need, managing big data and “older” analytics as one. The prudent companies combine the “new” data science skills with the “old” data integration, governance and stewardship wisdom honed over twenty years or more and generally under-appreciated by start-ups.
In the end, the Analytics 1.0 of mature organizations + the Analytics 2.0 of data start-ups = Analytics 3.0, “the data economy”, that supports both performance management and data products, that deploys a portfolio of open source and proprietary software, and that combines the can-do urgency of the young with the accumulated wisdom of the experienced.
“Big Data @ Work” is an easy, two-sitting read. I enthusiastically recommend it to both business executives and data science practitioners.