Unlike other “Getting Control of Big Data” authors, Barton and Court bundle big data and analytics. The sub-heading of their “How to Benefit from Big Data” graphic is “To improve performance with advanced analytics, companies need to develop strengths in three areas.” The three areas: 1) an IT infrastructure for managing big data, 2) analytic models that balance sophistication and simplicity, and 3) a transformed organization that has processes and capabilities in place to deploy the analytical tools. The authors note “Two important features underpin those activities: a clear strategy for how to use data and analytics to compete, and deployment of the right technology architecture and capabilities.” And for Barton and Court, delivering complex models through a simple-to-use tool is key.
My company, OpenBI, offers a short Big Data/Analytics Architecture Planning and Roadmap engagement to our customers that looks suspiciously like B&C’s Infrastructure/Models/Organization framework, even though it was developed independently. Project deliverables include an inventory of requirements for data and analytics, a technical architecture to satisfy those requirements, and a scoped, phased implementation plan.
Davenport and Patil’s depiction of data science is perhaps the most comprehensive and lucid I’ve encountered. Their starting point is that data science is still in its infancy, with “no university programs offering degrees in data science. There is also little consensus on where the role fits in an organization, how scientists can add the most value, and how their performance can be measured.”
The authors see the ideal data scientist as a combination data munger, quant, communicator and advisor. At this point the most important DS skill remains programming, though they expect communication and verbal/visual story-telling skills to be prominent five years out.
The science in data science revolves around formulating and testing hypotheses. That’s why “companies looking for people who can work with complex data have had good luck recruiting among those with educational and work backgrounds in the physical and social sciences.”
Like social or physical scientists, data scientists are energized by the abundance of data in their midst, at the same time realizing “they face technical limitations, but they don’t allow that to bog down their search for novel solutions … They identify rich data sources, join them with other, potentially incomplete data sources, and clean the resulting data set.” They also wish to be more than consultants – to be “on the bridge” – “in the thick of a developing situation, with real-time awareness … creating solutions that work … and leave their mark as pioneers of their profession.”
Importantly, the authors contrast data scientists with data management and quantitative BI practitioners of the past 15 years, noting, “a quantitative analyst can be great at analyzing data but not at subduing a mass of unstructured data … (while) a data management expert might be great at generating and organizing data in structured form … (but) not at actually analyzing the data.” The data scientist must be equally facile with both data and analytics. That’s why the top MS in Analytics program at North Carolina State is “busy adding big data exercises and coursework.”
Davenport and Patil see the trajectory of data science as similar to that of the Wall Street “quant” movement of the last 25 years, in which physics and math Ph.D.s “streamed to investment banks and hedge funds, where they could devise entirely new algorithms and data strategies.” The quant explosion in turn gave rise to new master’s programs in financial math and engineering. Perhaps analytics programs like the one at NC State, with an expanded emphasis on big data, as well as new initiatives at elite schools like Columbia, will do the same for data science.
In the meantime, companies wishing to develop a data science capability should search for candidates on LinkedIn, recruit from top universities, befriend R and Hadoop user groups, participate in conferences like Strata + Hadoop World, and seek go-getters who’ve successfully completed Coursera curricula in programming, data and machine learning. The abilities to program and to learn new technologies are key, as are a business focus and the facility to tell stories from data.
At the end of this article, and of the entire big data spotlight, Google chief economist and former Berkeley professor Hal Varian’s observation seems more pertinent than ever: “The sexy job in the next 10 years will be statisticians. People think I’m joking, but who would’ve guessed that computer engineers would’ve been the sexy job of the 1990s?”