Of all the flavors and types of big data, the most complex, and arguably the most valuable for most businesses, is customer data. Customers generate huge volumes of highly complex, fragmented and only marginally consistent data: literally trillions of data points every year from operational systems, web traffic, mobile applications, CRM systems and user devices.
This rise of big data, especially customer big data, has changed our language and perspective, creating a new vernacular where data scientists and data users are talking about volume, velocity, variety, Hadoop, data lakes … enough new jargon to make heads spin. As users, we are also looking for new technologies to support the operational complexity and content richness of this new data paradigm. Traditional relational databases and supporting tools, while necessary, are no longer sufficient.
In addition to the new terminologies and technologies, yet more jargon is arising to instruct businesses on how best to make use of these data. Conversations on topics such as machine learning and AI-driven analytics, real-time decisioning, longitudinal analysis, model factories, automated business intelligence, just-in-time data visualization, and so forth have moved out of the AI lab and become the new norm in our customer-driven workplace.
Despite all the new terminology, and the marketing collateral that says big data is “new” and “different” and “game-changing,” the reality is that those of us with long experience in the information sciences have seen much of this before and already have a conceptual framework to understand it. The DIKW pyramid is a simple yet powerful concept that is still relevant today, despite all the new buzzwords, jargon and technologies that have arisen.
The DIKW Pyramid: A Big Data Framework
The term “DIKW pyramid” refers loosely to a class of models meant to represent structural or functional relationships between Data, Information, Knowledge and Wisdom.
Labeling this idea as a pyramid is slightly disingenuous because it implies artificially hard barriers between the concepts when the lines are fuzzy at best. Data often bleeds into Information, which bleeds into Knowledge, which bleeds into Wisdom. If anything, it helps to think of DIKW as a heuristic, with multiple steps along the way and each layer blending into the others.
In this context, the “layers” are defined as:
- Data – Raw, unconnected facts, such as “rain,” “green,” and “obvious.”
- Information – Data that is given context and begins to become useful, such as “it’s raining outside,” “the grass is green,” and “this is obvious.”
- Knowledge – Information that’s used to take a specific action, such as “if it’s raining outside, I can offer a special on umbrellas.”
- Wisdom – The notion that there exists a process to apply all of this to a desired goal, typically over time, such as “It rains a lot here during May, so I should make sure I have extra umbrellas in stock around this time and plan for umbrella-focused customer messages.”
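To make the four layers concrete, here is a minimal Python sketch of the umbrella example. It is an illustration only: the sample observations, the function names and the 50 percent threshold are all hypothetical choices made for this sketch, not part of any formal DIKW definition.

```python
from collections import Counter

# Data: raw, unconnected facts (hypothetical daily weather readings).
observations = [
    ("2023-05-01", "rain"), ("2023-05-02", "rain"), ("2023-05-03", "sun"),
    ("2023-06-01", "sun"), ("2023-06-02", "sun"), ("2023-06-03", "rain"),
]


def to_information(obs):
    """Information: add context by computing the share of rainy days per month."""
    rainy, total = Counter(), Counter()
    for date, weather in obs:
        month = date[5:7]  # "YYYY-MM-DD" -> "MM"
        total[month] += 1
        if weather == "rain":
            rainy[month] += 1
    return {month: rainy[month] / total[month] for month in total}


def to_knowledge(is_raining):
    """Knowledge: a rule that turns information into a specific action."""
    return "offer umbrella special" if is_raining else "no action"


def to_wisdom(rain_share, threshold=0.5):
    """Wisdom: plan over time by listing months that warrant extra stock.

    The threshold is an assumed cutoff chosen only for illustration.
    """
    return sorted(m for m, share in rain_share.items() if share > threshold)


rain_share = to_information(observations)
action_today = to_knowledge(observations[-1][1] == "rain")
stock_up_months = to_wisdom(rain_share)
```

Each function consumes the output of the layer beneath it, which mirrors the fuzzy but directional flow described above: the same raw readings support both an immediate action and a longer-term stocking plan.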
In a business context, the flow goes from:
- Collecting data about your customers from across the modern, chaotic IT landscape;
- Building information by adding context to disconnected data points through metadata management and analytics;
- Using the knowledge created to take action (such as running a campaign) based on those analytics, or mapping out a customer journey and applying those learnings over time; and finally,
- Developing the wisdom to close the data/action loop – gathering process outcome data and measures as inputs to refining actions based upon evolving conditions.
The DIKW pyramid is an old concept. Many sources peg 1987 as its foundational year, while others point to economist Kenneth Boulding as defining a variation on the hierarchy as early as 1955.
Many thinkers have defined similar schemas at different times, which muddies the waters even further. Although its origin is hazy, the hierarchy is still a useful way to envision the processes of capturing and utilizing information in our big data age, with its multiple varieties, volumes and velocities of data capture and business consumption.
Big data is a variation on the same theme that those of us who have worked with data for years already understand. While the technology and tools are vastly more powerful, in the end, data are still data. Volume, velocity and variety create technical challenges, the solutions to which give us more power and precision than we could have imagined only a few years ago.
But making those data useful and, more to the point, actionable has always been the core focus of data managers and other professionals. The underlying metaphor for understanding how data become useful and create value hasn’t shifted either. While the technologies, methodologies and language have evolved, as they must with every innovation, the DIKW pyramid is still a useful model for the journey from raw facts to business value over time.
The words may have changed, but the song most certainly hasn’t.