Several recent MIT Sloan Management Review Data & Analytics Blog articles call for restraint in anointing big data the cure-all for business ills, citing substantial biases that must first be overcome. In a delightful and highly informative O’Reilly Strata Conference presentation, MIT Media Lab’s Kate Crawford warns of “algorithmic illusions” with big data: “…Biases in data collection, both in how it’s prepared and cognitively; exclusions, or gaps, in data signals whereby some people are not represented by data; and the constant need for context in conclusions, whereby small data — asking people how and why, and not just how many — tells a better story than big data.” Her antidote: combine big “data together with small data — computational, social science along with traditional qualitative methods.” That sounds a lot like Gary King’s proposed merger of qualitative and quantitative research in social science.

UNC professor Zeynep Tufekci jumps on the big data bias bandwagon in her article “Big Data: Pitfalls, Methods and Concepts for an Emergent Field.” She cites biasing concerns with using Twitter or Facebook for analytics similar to those biologists find with Drosophila flies: what’s most accessible is not necessarily what’s most representative. “Twitter is used by about 10% of the U.S. population, which is certainly far, far from a representative sample. While Facebook has a wider diffusion rate, its rates of use are structured by race, gender, class and other factors and are not representative. Using these sources as ‘big data’ model organisms raises important questions of representation and visibility as demographic or social groups may have different behavior — online and offline — and may not be fully represented or even sampled via current methods.” The non-representative bias is further exacerbated by “one-shot, single-method” data collection designs that provide “no way to assess, interpret or contextualize the findings.” In short, with big data, sample bias + design bias = big trouble.
