Forrester Research
for Information Management Blogs
NOV 21, 2011 9:22am ET

Data Scientist: Do You Truly Need Big Data?

(Editor’s note: This post is one in a series by Kobielus on the role of the data scientist. His previous posts are “Data Scientist: Important New Role or Trendy Job-Title Inflation?”, “Data Scientist: What Skills Does It Require?”, “Data Scientist: Is This Really Science or Just Pretension?” and “Data Scientist: Which Adjacent Roles are Central?”)

Data science has historically had to content itself with mere samples. Few data scientists have had the luxury of being able to amass petabytes of data on every relevant variable of every entity in the population under study.

The Big Data revolution is making that constraint a thing of the past. Think of this new paradigm as “whole-population analytics,” rather than simply the ability to pivot, drill, and crunch into larger data sets. Over time, as the world evolves toward massively parallel approaches such as Hadoop, we will be able to do true 360-degree analysis. For example, as more of the world’s population takes to social networking and conducts more of its life in public online forums, we will all have comprehensive, current, and detailed market intelligence on every demographic available as if it were a public resource. As the prices of storage, processing, and bandwidth continue their inexorable decline, data scientists will be able to keep the entire population of all relevant polystructured information under their algorithmic microscopes, rather than having to rely on minimal samples, subsets, or other slivers.

Clearly, the Big Data revolution is fostering a powerful new type of data science. Having more comprehensive data sets at our disposal will enable more fine-grained “Long Tail” analysis, micro-segmentation, next best action, customer experience optimization, and digital marketing applications. It is speeding answers to any business question that requires detailed, interactive, multidimensional statistical analysis; aggregation, correlation, and analysis of historical and current data; modeling & simulation, what-if analysis, and forecasting of alternative future states; and semantic exploration of unstructured data, streaming information, and multimedia.

But let’s not get carried away. Don’t succumb to the temptation to throw more data at every analytic challenge. Quite often, data scientists only need tiny, albeit representative, samples to find the most relevant patterns. Sometimes, a single crucial observation or data point is sufficient to deliver the key insight. And—more often than you may be willing to admit—all you may need is gut feel, instinct, or intuition to crack the code of some intractable problem. New data may be redundant at best, or a distraction at worst, when you’re trying to collect your thoughts.
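To make the sampling point concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the "population" is synthetic, made-up data drawn from a normal distribution, and the function name and sizes are hypothetical. It compares the mean of a small simple random sample with the mean of the whole population and reports the sample's standard error.

```python
import random
import statistics


def compare_sample_to_population(population_size=200_000, sample_size=2_000, seed=42):
    """Return (population_mean, sample_mean, standard_error) for synthetic data."""
    rng = random.Random(seed)

    # Synthetic, made-up values standing in for some per-customer measure.
    # This is NOT real data; the distribution is an assumption for illustration.
    population = [rng.gauss(100.0, 20.0) for _ in range(population_size)]

    # Simple random sample without replacement: 1% of the population by default.
    sample = rng.sample(population, sample_size)

    population_mean = statistics.fmean(population)
    sample_mean = statistics.fmean(sample)

    # Standard error of the sample mean: sample standard deviation / sqrt(n).
    standard_error = statistics.stdev(sample) / sample_size ** 0.5
    return population_mean, sample_mean, standard_error


if __name__ == "__main__":
    pop_mean, samp_mean, se = compare_sample_to_population()
    print(f"population mean: {pop_mean:.2f}")
    print(f"sample mean:     {samp_mean:.2f} (standard error {se:.2f})")
```

The standard error shrinks with the square root of the sample size, so the estimate depends far more on how many representative records you draw than on how large the underlying pile is. That only holds when the sample really is representative, which is the caveat the paragraph above already makes.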

Science is, after all, a creative process where practical imagination can make all the difference. As data scientists push deeper into Big Data territory, they need to keep from drowning in too much useless intelligence. As this dude said recently, keep your Big Data pile compact and consumable, to facilitate more agile exploration of this never-ending, ever-growing gusher.

This post originally appeared at Forrester Research.

Comments (1)
Before launching into "big data" we should determine if there are other ways to gather the knowledge to make informed decisions. For example, "customer experience optimization" (interesting acronym: "ceo") may be better determined by asking your customers about their experience. Why collect a bunch of data and spend lots of money and resources analyzing it, only to determine that your customers don't like your products because of something as basic as quality? Perhaps the CEO should walk around talking to customers to improve "ceo".

Data has become the only touch point we have with customers, and this dissociation has already left customers feeling like companies don't care. We've evolved from customer service to customer relationship management to customer experience optimization. Each step has moved us further away from the personal touch. We've commoditized customers. Customers are only worth the data they provide to us.

We may have lots of data but derive little knowledge from it.

Posted by Richard O | Tuesday, November 22 2011 at 1:17PM ET