Freaky Data Science
I've received quite a few comments and emails about my most recent blog, "What is Data Science Again?" Much of the reaction has been positive, agreeing with the take that the discipline revolves around the combination of computation and statistical science. Some is negative, decrying “data science” as nothing more than a trumped-up neologism for “programmer”.
The perspective that most gets me thinking, though, is the one that seeks to define data science not only by the tasks of practitioners, as I've done, but also by its higher purpose: something akin to "the scientific study of the creation, validation and transformation of data to create meaning".
I'm certainly in agreement with this need to view data science through the lenses of both “what” and “how”. The what provides the conceptual foundation; the how articulates the tools and skills needed to progress toward that foundation to “create meaning”.
Alas, bridging the what-how gap is not so straightforward. I do believe, though, that the scientific method, which logically sits between data and science, plays a key role. Methodology connects the how of DS with the what, providing linkages to both generate and test hypotheses. Experimental and quasi-experimental designs, missing data assessment, treatment effects, causal analysis, and the like are central to translating the what into the how.
Last fall I had the opportunity to interview two students completing their doctoral dissertations in biological fields for positions with Inquidia. Both did a great job connecting the dots between what they'd been doing in grad school and the work Inquidia does for our customers. One confided that he'd spent two of his six PhD years learning enough of the chosen discipline to pass qualifying exams (the domain expertise) and the other four learning the methodology, data analysis, statistical methods, and programming to conduct the research and complete his thesis. He reckoned he'd fit right in with Inquidia via his ability to think like a scientist. I agreed, feeling that methodological savvy gives advanced-degree science students an initial leg up in the DS world, provided they're computationally sound. Fortunately, there are programs today to quickly bridge the computation gap between academia and data science.
Methodologies driving the ability to think like data scientists also came to mind last weekend as I read "Think Like a Freak," by "Freakonomics" and "SuperFreakonomics" authors Steven Levitt and Stephen Dubner. The engaging and wildly successful trilogy “succeeds at analyzing sociological developments in a way that is entertaining because Steven Levitt, an economist who strays from convention, has a knack for unpeeling layers and layers of assumptions and myth and showing the real causes behind trends.” Author Malcolm Gladwell is a particular fan of the latest tome, opining “Think Like a Freak is about the attitude we need to take towards the tricks and the problems that the world throws at us. Dubner and Levitt have a set of prescriptions about what that attitude comes down to.” In short, the authors are scientific methodologists par excellence.
The point of departure of Levitt and Dubner's freak approach is an obsession with “data, rather than hunch or ideology, to understand how the world works”. Incentives derived from classical economics and the subverting human biases and heuristics drawn from behavioral economics are used both to divine and test theories and to put findings to work. Other methodological cornerstones include knowing what to measure and how to measure it, understanding that conventional wisdom is often wrong, and recognizing that correlation does not imply causality.
Practical freak advice pertinent to data science includes focusing on solving small problems, being wary of “moral” solutions to practical problems, acknowledging what you don't know (“Everyone's entitled to their own opinion but not to their own facts”), and being cognizant of when it's time to quit. And of course the gold standard of determining truth for freaks is the randomized experiment: “The impulse to investigate can only be set free if you stop pretending to know answers that you don't.”
Perhaps no chapter has more to offer the budding data scientist than “How to Persuade People Who Don't Want to Be Persuaded”. This is where the awareness of human biases and the art of storytelling can work together for good. The pioneers of the “Nudge” movement well understand that “Rather than try to persuade people of the worthiness of a goal...it's more productive to trick people with subtle cues or new default settings.” Packaging freak findings as a compelling story is often the difference between success and failure. “One reason is that a story exerts a power beyond the obvious. The whole is so much greater than the sum of the parts (the facts, the events, the context) that a story creates a deep resonance.”
When all is said and done, thinking like a freak means becoming more methodologically sophisticated, enhancing one's ability to generate and test hypotheses, to unassailably link cause and effect in the social and scientific worlds, and to persuade the public to accept conclusions they might well be inclined to reject. This same methodological rigor bridges the what-how gap of DS and answers the critics who believe data science is little more than the latest moniker for overpaid programmers.