The proliferation of analytics has made data a valuable commodity. According to Gartner, organizations digging for value in data will spend an estimated $18.3 billion on BI and analytics tools in 2017. Yet recent studies show that on average, less than half of an organization’s structured data is actively used in making decisions—and less than 1 percent of unstructured data is analyzed or used at all.

Big Data Blind Spot: Metadata

Here’s a big part of the problem: despite the hefty spend, many organizations don't have effective strategies and tools for leveraging metadata—the invisible data that defines data. That's a mistake.

Metadata is not new. But it has become more important as data volumes and varieties have grown. Metadata forms the “keys to the kingdom” of data: it tells users what data exists, what it means, where it comes from and whether it’s accurate. And metadata defines the “lingua franca” for interoperation, enabling the increasingly disparate world of data sources and tools to understand how and where to access data and work together.

In recent years, changes in best practices and in the software platforms that support them have raised both the risks and the rewards. With the emergence of data lakes, self-service analytics, open source and cloud platforms, metadata has become even more necessary for handling the growing variety and volume of data sets. And as the value of metadata grows, there’s a sensible worry in the air that proprietary metadata solutions could lock up the keys to the data that is so important to a well-governed analytics organization.

Full-Contact Metadata: Offense and Defense

Since the beginning, metadata has been good for “defense”: ensuring that data usage stays within guidelines and protects bottom-line business processes. For an overall data strategy, that means incorporating governance into policy-driven IT practices, engineered in advance of data loading and usage.

Enterprises need to know: Is critical data secure and trustworthy? Is there an “audit trail” of data lineage that shows when and how data is generated, used and changed? Is the data correct—standardized in its representation and content?

Defense is about prevention. As such, a defensive metadata approach needs to be designed and deployed in advance, providing a constrained framework for top-down control of data representations.

What’s been changing in recent years is the growing value of playing “offense” with data, bringing new perspectives to metadata strategy. The modern data-driven enterprise empowers creative analysts to generate top-line value by aggressively harnessing any data they can get. Business-driven value generation is the signature feature behind the “big data” and “data science” movements, representing a key competitive advantage (or threat!) across nearly every business sector.

Offense is about creating opportunities to score. It requires agility and flexibility on the field: each line of business in an organization should be promiscuous in finding useful data, quick to innovate with process, tools and techniques, and free to change behavior as they learn from experience and continuously improve.

Balanced Strategy Principles for Metadata Solutions

A Freakonomics study from a few years back argues that, in football at least, you need a balanced strategy of offense and defense. I agree in this context too: successful organizations need to allow both defensive metadata engineering in advance, and offensive metadata capture and curation on use.

Embracing new data practices goes hand-in-hand with embracing new platforms and tools, and the advent of open source and the cloud has accelerated those transitions. As data migrates to these new platforms and practices, there are open questions about the nature of the next generation of metadata solutions. On that front, here are a few simple principles to keep in mind when selecting metadata solutions and rolling them out.

The first principle: require openness. Organizations are aggressively moving data out of platforms from legacy software vendors with lock-in strategies. It is critical that metadata follows suit. You simply can't afford to let one software vendor or service provider lock up your metadata or dictate your metadata model. Favor metadata software that adheres to vendor-neutral APIs and representations, and maintains an open-source compatibility strategy so you can always walk away with your metadata and control your own fate. This requires close integration with ecosystem standards, and partnerships with best-of-breed technology vendors.

The second principle: remove friction. If you make it complicated to start using metadata, you simply won’t get much; you’ll find yourself unable to play defense, with a ragged and poorly organized offense. To foster comprehensive defense, lay the foundations to make it easy to capture metadata that occurs in the wild: security principals, timestamps, text annotations, and user identities. Making these simple standard attributes easy to capture can bring business users out of the shadows, and provide a modicum of defensibility across all activities in the organization.

With that in place, get the “game films” for the offense: record any and all metadata from analytics efforts in whatever form you can get it. Don’t add requirements or expectations; simply capture the “metadata exhaust” of analysis. Some of it will turn out to be useful later, and that utility can emerge over time. Make the process uniform by gathering metadata for settings and actions in projects that succeed as well as projects that fail.
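To make the low-friction idea concrete, here is a minimal sketch of what capturing that exhaust might look like. It assumes nothing beyond the Python standard library; the function names, field names and the JSON-lines log are hypothetical choices for illustration, not a reference to any particular metadata product.

```python
import getpass
import json
from datetime import datetime, timezone


def capture_metadata(dataset, action, user=None, note="", **extra):
    """Build one metadata record with the simple standard attributes.

    All names here are illustrative assumptions, not a standard schema.
    """
    return {
        "dataset": dataset,
        "action": action,
        # Fall back to the OS login name when no identity is supplied.
        "user": user or getpass.getuser(),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "note": note,
        # Anything else is kept as-is: no schema is required up front.
        "extra": extra,
    }


def append_record(log, record):
    """Append the record to an in-memory log as one JSON line."""
    log.append(json.dumps(record, sort_keys=True))
    return log


if __name__ == "__main__":
    log = []
    # Failed experiments are recorded the same way as successful ones.
    append_record(log, capture_metadata(
        "sales.csv", "join", user="ana", note="abandoned", tool="notebook"))
    print(log[0])
```

The point of the sketch is what it leaves out: there is no required vocabulary and no validation step, so recording costs the analyst almost nothing, and structure can be added later by curators.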

The third principle: encourage a curation culture. I frequently hear stories about business users who are “surprisingly” eager to put in time curating reference data and metadata descriptions. There’s really no surprise there: people want to organize their material because it helps them do their job better.

Encourage that impulse! Empower your embedded neat-freaks and enthusiasts to chime in and work together to improve shared metadata assets, Wikipedia-style. Your metadata strategy needs to be able to scale out, not only in volume and variety but in the number of hands sharing the work, incrementally, over time.

The New Metadata Imperative

The new offensive approaches to metadata apply to scenarios of all shapes and sizes. These issues are by no means restricted to “big data,” in the sense of enormous data sets. Environments with lots of “small-and-medium-sized data” are especially in need of an open, low-friction metadata strategy to encompass the breadth of data and usage. Open, creative metadata is a universal imperative.

A successful metadata strategy in today’s complex world is all about embracing imperfection, while enabling and incentivizing continued progress in metadata and data quality. This requires thinking through both organizational issues and software solutions. For organizations with strong traditions in defensive metadata, embracing the imperfections of offensive metadata can be a culture shift. Organizational success will depend on recognizing that tension and accommodating it.

In some organizations, the right short-term solution may be to isolate the defensive team and their tools from the offense and its tools, until the day when both are ready to merge. That in itself is a form of organizational agility. In other organizations, a full-field approach is the best decision right away.

Keep in mind that the difference between a good offense and a great one is teamwork. To that end, think about processes and tools that amplify individual efforts to provide collective benefit. This works at multiple scales. Small community contributions should be made simple—for example, allowing wiki-style editing of shared reference data, with change tracking for governance. Big efforts should foster broad rewards: if one team invests deeply in extracting value from a dataset, other teams should learn about it and reap the benefits.
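As one way to picture wiki-style editing with change tracking, the sketch below keeps every revision of a shared reference-data entry instead of overwriting it. The class, its fields and the sample values are hypothetical and chosen only for illustration; a real deployment would add persistence and access control.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class SharedEntry:
    """A shared reference-data entry that keeps every revision (illustrative)."""
    key: str
    history: list = field(default_factory=list)

    def edit(self, value, editor, comment=""):
        # Append rather than overwrite, so governance can see who changed what.
        self.history.append({
            "value": value,
            "editor": editor,
            "comment": comment,
            "at": datetime.now(timezone.utc).isoformat(),
        })

    @property
    def current(self):
        """Latest value, or None if nobody has edited the entry yet."""
        return self.history[-1]["value"] if self.history else None

    def revert(self, editor):
        """Restore the previous value as a new, tracked revision."""
        if len(self.history) < 2:
            raise ValueError("nothing to revert to")
        self.edit(self.history[-2]["value"], editor, comment="revert")


if __name__ == "__main__":
    entry = SharedEntry("region_code:EMEA")
    entry.edit("Europe, Middle East, Africa", "ana")
    entry.edit("Europe & Middle East", "bo", comment="shortened")
    entry.revert("cy")
    print(entry.current, len(entry.history))
```

Because a revert is itself recorded as an edit, the history stays append-only: the offense can change things freely, and the defense still has a complete audit trail.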

In any complex business, metadata strategy is not going to be a one-shot decision—it will need to evolve over time. You’ll know you have it right when it is evolving at the pace that your business needs to innovate.

Joe Hellerstein

Joe Hellerstein is the co-founder and chief strategy officer at Trifacta, and the Jim Gray chair of computer science at the University of California at Berkeley.