The importance of data curation in information management

(Editor’s note: Jake A. Dolezal will speak on the topic of “Tying business processes into your MDM strategy” at the upcoming MDM & Data Governance Summit in Chicago, July 11-13.)

As I excitedly anticipate the approach of Information Management’s MDM & Data Governance Chicago event, my mind is on our work as data leaders, thinkers and practitioners. I look forward to seeing you all there!

The current state of modern information architecture is more complex than ever. It’s tempting to trivialize it by boiling it down to two conditions: we have data and there are people and processes that need that data. Certainly, information management is essentially the practice of meeting data needs by providing useable information.

In this practice we typically characterize the needs and uses of information by:

  • The business’s priorities.
  • Awareness about the data and what can be done with it.
  • The demand for information and the systems that contain it.
  • The profile of our user community.
  • The activities in which the data will be put to use.
  • The interests or end-game of the user.
  • The preferences of users and how they like and don’t like to receive the data.

However, just like our modern data architectures, things in the data world are not that simple anymore.

First, there is the objective versus subjective paradigm. Typically, in a business setting we focus on objective uses of data, under the belief that there is absolute correspondence between the data and reality. However, subjective data use matters just as much: even in the age of machine learning and autonomous decision making, the human “gut” still weighs heavily in decisions that ultimately prove costly or beneficial, and the role that data plays in those subjective decisions should be considered.

Second, we have seen a shift from mechanistic, passive users to constructive, active users of information. Certainly, this is true in the business intelligence world as users demand more than dashboards and reports. Business users are more agile than ever, and they demand the same agility when they are “in the data.”

Third, there is the trans-situationary usage pattern where data transcends context. This is true in our master data management and data governance circles, as more connected data gets used and reused across a variety of scopes and departments.

This also leads to more holistic views of experience through data rather than simply atomic views. Data truly is end-to-end in a business user’s journey, and not simply an intersection with a system or database.

I could go on, but the point is data usage today is more than collection and consumption. There’s a third C—data curation.

Wikipedia defines data curation as “a broad term used to indicate processes and activities related to the organization and integration of data.” Some might say, “We do that already. We have a data governance program, MDM, and a data dictionary.” While I applaud those efforts, I want to think of data curation from a broader perspective.

To me, data curation is more than activities, technologies and architectures. For example, data governance is an organization of people who come together to make data decisions, define data and solve data issues, but it’s not data curation. A BI semantic layer provides a layer of abstraction, but, again, it’s not curation.

I could go on with a long list of activities and technologies (taxonomies, data dictionaries, retention policies, chief data officers, stewardship, master data management and so forth), all of which fall to some degree under the umbrella of data curation.

Instead, I think that data curation is a strategy and a practice as it relates to data, applications, people, architecture and sources. We envision a data world where data from all sources is defined, organized, de-siloed, timely, consistent, reliable and available to all authorized users, applications, analytics and any imaginable use case, and where it brings answers to business questions across the organization. I propose we, as an information management community of practitioners, develop the practice and strategies to bring it to fruition.

Data curation is nothing new. The practice to accomplish this has been under development for thousands of years: it’s called a library. Librarianship and information science have too often been forgotten as the data race has reached petabyte scale and technology has come to perform at astronomical scale, and we are paying the price today with much data pain.