For me though, none of the new thinking has supplanted the seminal “What is Data Science?” article by Mike Loukides as the DS manifesto. The first and most important differentiator of data science for Loukides: enabling the “creation of data products.”
I have more than a few disagreements with what's written in the report, “Data Science Revealed: A Data-Driven Glimpse into the Burgeoning New Field,” accompanying the EMC Data Science Community Survey, but distinguishing BI as obsessed with performance management while data science focuses on products isn't one of them. “It may be helpful to think of data science and business intelligence as being on two ends of the same spectrum, with business intelligence focused on managing and reporting existing business data in order to monitor or manage various concerns within the enterprise. In contrast, data science applies advanced analytical tools and algorithms to generate predictive insights and new product innovations that are a direct result of the data.”
That BI's primarily concerned with PM while data science fixates on sexier data products, however, in no way implies that BI has a less lofty calling than DS, as some “experts” suggest. I reject the contention, for example, that BI is consigned to answering mundane questions of “what and where,” while data science is concerned with the more strategic “how and why.” Many evidence-based BI'ers are just as focused on how and why as what and where.
I also reject the oft-cited distinction that data scientists deploy the latest statistical algorithms while BI digs no deeper than guided reports and OLAP. Many of the large BI shops I've worked with over the years employ groups of PhD statisticians to mine the data warehouse. Indeed, I've used plenty of ARIMA and statistical learning models in my BI career to support demand and propensity forecasting. And some of the best illustrations of data science I've seen are statistically unsophisticated mashups that tell compelling stories visually – and simply. It's not high-powered statistics that separates BI and DS.
Others contend that data science distinguishes itself from traditional BI by its strict adherence to the scientific method, which:
- Observes a phenomenon or group of phenomena;
- Formulates a hypothesis to explain the phenomena. The hypothesis takes the form of a causal relationship or a mathematical function (the more of X, the less of Y);
- Uses the hypothesis to predict the results of new observations; and
- Performs tests (experiments) with the predictions.
Certainly, the scientific method is core to the work of data scientists. But it's also central to BI practitioners who focus on managing business performance through a “science of business” or “evidence-based management” lens. The OpenBI BI requirements-gathering methodology revolves around the science of business, delineating hypotheses and causal linkages among leading and lagging indicators, and deploying randomized experiments where feasible. Is that BI or DS?
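For readers who want the four steps in concrete form, here is a minimal sketch of a randomized experiment analyzed with a permutation test. It is an illustration only, not the OpenBI methodology: the `permutation_test` function, the metric (weekly orders per customer), and the numbers are all hypothetical. The hypothesis is that a treatment raises the metric, the prediction is a positive difference in group means, and the test asks how often random relabeling of customers produces a difference at least that large.

```python
import random
from statistics import mean


def permutation_test(control, treatment, n_permutations=5000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Returns (observed_difference, p_value).
    """
    rng = random.Random(seed)
    observed = mean(treatment) - mean(control)

    pooled = list(control) + list(treatment)
    n_control = len(control)

    extreme = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the group labels are interchangeable,
        # so shuffle the pooled values and re-split them.
        rng.shuffle(pooled)
        diff = mean(pooled[n_control:]) - mean(pooled[:n_control])
        if abs(diff) >= abs(observed):
            extreme += 1

    # Add-one correction keeps the estimated p-value strictly above zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical data: weekly orders per customer, randomly assigned to groups.
control = [3, 4, 2, 5, 4, 3, 4, 2, 3, 4]
treatment = [5, 6, 4, 7, 5, 6, 5, 4, 6, 5]

observed, p_value = permutation_test(control, treatment)
print(f"observed lift: {observed:.2f}, p-value: {p_value:.4f}")
```

A small p-value says the observed lift would be unlikely if the treatment did nothing. Nothing in the sketch cares whether the person running it carries a BI or a DS title.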
Those who think BI maintains a “lesser” analytical positioning than DS should check out the web site of statistics leader SAS. SAS moved “downstream” to traditional BI and business analytics from core statistical analysis many years ago to broaden its market appeal. And though SAS still makes most of its money from legacy statistical modules, the site landing page greets prospects primarily with BI topics such as Analytics, Business Analytics, Customer Intelligence, Business Intelligence, Data Management, Performance Management, etc. Apparently, SAS now positions itself as the industry leader in all areas of BI and analytics, able to provide one-stop shopping for “competing on analytics” buyers. Is SAS a BI or DS platform – or both? Maybe they're just plain analytics.
Forbes.com writer Dan Woods has conducted a terrific series of interviews on data science with leading industry practitioners and vendors. Tableau founder Pat Hanrahan has a BI view of DS: “At the most basic level, you are a data scientist if you have the analytical skills and the tools to ‘get’ data, manipulate it and make decisions with it,” he says. Hanrahan's more data populist than high priest, noting “Realistically, most people in an organization who will need to work with data are not going to be PhDs in statistics. The data enthusiasts also need to be enabled.”
He cites eBay as the archetypal enthusiast culture. “At eBay, the company pushes ‘self-service analytics,’ whereby IT creates the plumbing for queries and users can ask questions and get answers on their own. eBay employs a ‘virtual data mart,’ wherein any user or group in the company can set up a data mart, request access to the data, and within a day can begin diving in.” Tableau fans shouldn't be surprised at this positioning. Launch Tableau Desktop and you're greeted with “Fast analytics and rapid-fire business intelligence.” Is Tableau a BI or DS tool?
Woods' conversation with LinkedIn's data scientist Monica Rogati, on the other hand, corroborates Loukides' depiction of data science in the service of products. She notes that “On one side, I’ve been working on building products, like the recommender system, Talent Match, modeling and finding ways to empower users to use LinkedIn through their products. Groups You May Like was another product I started. The other side is finding interesting stories in the data. It’s exciting to be able to tell stories collected from the careers of 120 million professionals, and trying to learn what that data can tell us about the world at large.”
Rogati's LinkedIn colleague Daniel Tunkelang notes that data scientists create value in three ways:
- The first is by performing offline analysis that informs mission-critical business decisions, e.g., identifying key user segments or activities.
- The second is by improving products such as search and recommendations that rely on the quality of data and derived data.
- The third is by creating data products: for example, LinkedIn Skills shows you the top locations, related companies, relevant jobs, and groups where you can interact with like-minded professionals.
Seems like DS with a smattering of BI to me.
Rogati sees the data science discipline as “half hacker, half analyst, they use data to build products and find insights. It’s Columbus meet Columbo – starry eyed explorers and skeptical detectives.” She adds “some (data scientists) are more technical, others are more creative and communicative. A data scientist has to have both. ... Part of the duty of a Data Scientist is to promote the data-driven culture. The way to do that is by exposing the data and making it relevant to everyone in the company, and showing them what you can do with it.” This sounds to me suspiciously like the business/IT labor dualism in BI.
Regardless of where you sit on the specific roles of BI and data science, I think it's safe to say there are elements of business, technology and statistical science, along with an encompassing scientific inquisitiveness, in each. Next time I'll explore my evolving perspective on business intelligence and data science.
(Editor's Note: Click on the title to read “Data Science or BI? – Part 1.”)