4 key elements to successful data governance

As the data sphere expands, Information Governance will enable digital-savvy organizations to gain productivity and minimize data retention costs.

Data fuels many of the 21st century’s fastest-growing businesses. When used intelligently, it allows organizations to make informed decisions, address customer requirements, discover efficiencies, capitalize on new opportunities, and rethink business models to achieve better outcomes.

Nearly 90 percent of respondents in a survey conducted by law firm Freshfields Bruckhaus Deringer indicated that “access to data is critical to being competitive in their industry.” Quite similarly, 58 percent in a Capgemini study “expect to face increased competition from start-ups enabled by data.” Another 61 percent say that “big data is now a driver of revenues in its own right and is becoming as valuable to their businesses as their existing products and services.”

The key challenge, however, is the explosion of data, which is growing at nearly 40 percent per annum and doubling in size almost every two years. IDC estimates that by 2020, the digital universe will reach 44 zettabytes (or 44 trillion gigabytes), a tenfold increase over 2013. This data is being generated by close to 21 billion connected devices around the globe, according to projections from Gartner. In other words, by 2020 there will be roughly three times as many smart devices as humans on the planet.
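The growth figures quoted above are consistent with one another, which a quick compound-growth check makes visible. The sketch below assumes a constant 40 percent annual rate; the 4.4-zettabyte baseline for 2013 is not quoted in this article and is simply inferred from the "tenfold" claim (44 divided by 10).

```python
# Compound-growth check for the figures quoted above.
# Assumption: a constant 40 percent annual growth rate.
ANNUAL_GROWTH = 0.40


def growth_factor(years: int, rate: float = ANNUAL_GROWTH) -> float:
    """Return the multiple by which data volume grows over `years`."""
    return (1 + rate) ** years


two_year_factor = growth_factor(2)    # about 1.96, i.e. close to doubling
seven_year_factor = growth_factor(7)  # 2013 to 2020, about 10.5

# Inferred baseline (not stated in the article): 44 ZB / 10 = 4.4 ZB in 2013.
baseline_2013_zb = 44 / 10
projected_2020_zb = baseline_2013_zb * seven_year_factor

print(f"2-year growth factor: {two_year_factor:.2f}")
print(f"7-year growth factor: {seven_year_factor:.2f}")
print(f"Projected 2020 volume: {projected_2020_zb:.1f} ZB")
```

At 40 percent a year, data volume comes close to doubling in two years and grows a little more than tenfold in seven, which is in line with the IDC estimate.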

Yet having all the data in the world is of little help without the ability to swiftly access the right piece of information when needed. To unlock the value of data, it has to be quickly and easily accessible. Working from the misconception that bigger data silos automatically lead to greater returns, some companies tend to store all of it.

The reality, however, is that most of their data is dark and holds no value, which creates immense administrative costs going forward.

More data often equals less productivity

As highlighted in Veritas’ Fighting Back Against the Exponential Data Curve report (registration required), an Adobe online survey discovered that office workers spend 6.3 hours a day checking e-mails. Assuming the average person receives around 50 e-mails a day, this corresponds to a staggering 18,200 e-mails in a single year.

Moving up the ladder, the problem only seems to worsen. Executives, on average, receive 30,000 external communications per annum and spend about a day each week managing their data flood, as reported by Bain & Company. Vast amounts of this data are unstructured, so it doesn’t follow a pre-defined data model, which means it can’t easily be organized or searched.

From an organizational perspective, a lot of time and money is lost by keeping people busy searching for information, screening attachments and administering the data tsunami. In short, the larger the organization and the more interactions it has, the bigger the productivity impact.

Four core capabilities to reduce data discovery and retention efforts

To gain productivity, digital-savvy enterprises implement an information governance framework for managing information throughout its lifecycle in support of the organization’s strategy and operations as well as its regulatory, legal, risk, and environmental requirements. This not only boosts the productivity of individual users by making information easier to find, but also lowers retention costs by filtering out data sets that are no longer needed.

Information governance comprises a combination of people, policies, processes, metrics and tools that help extract value from information and mitigate risks. The four key capabilities typically include:

Classification and Tagging

Classification and tagging are pivotal to determining the data’s value, optimizing search results, and accelerating data gathering. Furthermore, scrutinizing metadata for additional context can improve the ability to discover data quickly and easily. Effective classification tools combine content and context in an effort to make data sets more tangible. Organizations can incorporate classification tags into policies to fully automate their processes.

An archiving policy, for instance, can automatically assign a retention period to meet regulatory requirements for data containing personally identifiable information (PII) such as addresses, health records, and credit card, passport or phone numbers. Since policies and regulations may change over time, it is essential that data can easily be reclassified should the need arise.
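To make the idea concrete, here is a minimal sketch of tag-driven retention, written under several assumptions. The regular expressions, the `Document` class, the `classify` helper and the retention periods are all hypothetical and chosen purely for illustration; production PII detection requires far more robust rules, and actual retention periods depend on the regulations that apply.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical detection patterns, for illustration only.
PII_PATTERNS = {
    "credit_card": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
    "phone": re.compile(r"\+\d{1,3}[ -]?\d{2,4}[ -]?\d{3,4}[ -]?\d{3,4}\b"),
}

# Hypothetical retention periods in years; real values depend on regulation.
RETENTION_YEARS = {"pii": 7, "general": 2}


@dataclass
class Document:
    name: str
    text: str
    tags: set = field(default_factory=set)
    retention_years: Optional[int] = None


def classify(doc: Document) -> Document:
    """Tag a document with the PII types found and assign a retention period."""
    doc.tags = {
        label for label, pattern in PII_PATTERNS.items() if pattern.search(doc.text)
    }
    category = "pii" if doc.tags else "general"
    doc.retention_years = RETENTION_YEARS[category]
    return doc


invoice = classify(Document("invoice.txt", "Card 4111 1111 1111 1111 charged."))
contact = classify(Document("contact.txt", "Call me on +44 20 7946 0958."))
memo = classify(Document("memo.txt", "Lunch is at noon."))

for doc in (invoice, contact, memo):
    print(doc.name, sorted(doc.tags), doc.retention_years)
```

Because the tags and the retention period are derived from the content each time `classify` runs, reclassification after a policy change amounts to updating the tables and running the function again.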

Policy Enforcement

Given the unrelenting expansion of the data sphere, organizations need a more systematic approach for automating the collection, retention and expiration of content. Companies can create a “single point of truth” by deploying a policy framework that drives consistent policy definition and enforcement across the organization. An archiving process that indexes all enterprise data and processes it into a single repository helps accomplish that. Policies also help ensure that data is stored, processed and archived securely and confidentially, and recorded accurately, reliably and lawfully, taking into account ethical standards as well as any regulatory frameworks that may apply.
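As a rough illustration of automated expiration, the sketch below applies one central retention table to every archived item. The table, the category names, the helper functions and the sample archive are all hypothetical; a real policy engine would also handle legal holds, exceptions and audit logging.

```python
from datetime import date

# Hypothetical central policy table: defined once, applied everywhere.
RETENTION_DAYS = {"pii": 7 * 365, "general": 2 * 365}


def is_expired(category: str, archived_on: date, today: date) -> bool:
    """Return True when an item has outlived its category's retention period."""
    age_days = (today - archived_on).days
    return age_days > RETENTION_DAYS[category]


def expired_items(items, today: date) -> list:
    """List the names of archived items that are due for expiration."""
    return [
        name
        for name, category, archived_on in items
        if is_expired(category, archived_on, today)
    ]


# Hypothetical archive entries: (name, category, date archived).
archive = [
    ("contract.pdf", "pii", date(2015, 1, 10)),
    ("newsletter.eml", "general", date(2015, 1, 10)),
]

due = expired_items(archive, today=date(2018, 1, 10))
print(due)
```

Both items were archived on the same day, but only the general-purpose one has passed its two-year retention period after three years, so it alone is flagged.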


Analytics

Both the sheer volume of data and the number of repositories where that data resides can be overwhelming. Without a coherent approach to building a data map and tracking data access, people can spend endless effort looking for a needle in a haystack. Analytics makes it possible to trace which employees have created, accessed, and modified data, which helps eliminate non-relevant targets from discovery efforts.

For example, in Europe, data protection regulations enable individuals to request that organizations provide, and potentially erase, all the information held about them; in the United States, the Freedom of Information Act allows individuals to submit a request for data concerning government agency investigations. Analytics makes it possible to track the scope of activity around an individual user or piece of data. It also provides assurance that regulated data has stayed within a certain jurisdiction to meet compliance requirements.
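The kind of tracking described here can be sketched with a simple access log. Everything in the example is an assumption made for illustration: the `AccessEvent` record, its field names, the sample log, and the use of the access region as a stand-in for where the data resides.

```python
from collections import defaultdict
from typing import NamedTuple


class AccessEvent(NamedTuple):
    user: str
    item: str
    action: str  # "create", "read", or "modify"
    region: str  # hypothetical proxy for where the access took place


def items_touched_by(events, user: str) -> dict:
    """Map each item a user touched to the set of actions they performed."""
    touched = defaultdict(set)
    for event in events:
        if event.user == user:
            touched[event.item].add(event.action)
    return dict(touched)


def items_outside(events, allowed_regions: set) -> list:
    """List items that were accessed from outside the allowed regions."""
    return sorted({e.item for e in events if e.region not in allowed_regions})


# Hypothetical audit log.
log = [
    AccessEvent("alice", "hr/payroll.xlsx", "create", "EU"),
    AccessEvent("bob", "hr/payroll.xlsx", "read", "EU"),
    AccessEvent("alice", "sales/leads.csv", "modify", "US"),
    AccessEvent("carol", "ops/runbook.md", "read", "EU"),
]

alice_scope = items_touched_by(log, "alice")
outside_eu = items_outside(log, {"EU"})
print(alice_scope)
print(outside_eu)
```

A request concerning one person can then be limited to the items in that person's scope instead of searching every repository, and the second query flags anything touched outside the permitted jurisdiction.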

Contextual Search and Machine-Learning

Manually reviewing shiploads of potentially relevant content can be enormously time-consuming, painful and expensive. With the advent of contextual search and machine-learning algorithms, uncovering e-mail conversations and secret code names used to hide questionable or even illicit activities has become much easier.

After search criteria have been set, auto-filtering can automatically eliminate irrelevant content and speed up discovery efforts. This not only supports forensic investigations but also facilitates data cleanup.

Ultimately, the ability to quickly thread discussions and conduct advanced keyword searches makes it possible to zoom in on relevant content. This not only improves productivity but also helps reduce costs. It can also be used to identify abnormal activity in order to safeguard sensitive and regulated data.
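Discussion threading and keyword search can be illustrated with a small sketch. The messages, the field names and the code name "Bluebird" are hypothetical; `parent_id` plays the role that a reply header such as In-Reply-To plays in real e-mail, and the sketch assumes the reply chain contains no cycles.

```python
from collections import defaultdict

# Hypothetical messages; parent_id stands in for an In-Reply-To header.
messages = [
    {"id": "m1", "parent_id": None, "subject": "Project Bluebird budget",
     "body": "Draft attached."},
    {"id": "m2", "parent_id": "m1", "subject": "Re: Project Bluebird budget",
     "body": "Keep this off the books."},
    {"id": "m3", "parent_id": None, "subject": "Lunch", "body": "Noon?"},
    {"id": "m4", "parent_id": "m2", "subject": "Re: Project Bluebird budget",
     "body": "Agreed."},
]


def thread_root(message_id, parents):
    """Follow parent links until reaching a message with no known parent."""
    while parents.get(message_id) is not None:
        message_id = parents[message_id]
    return message_id


def build_threads(msgs) -> dict:
    """Group messages into threads keyed by the id of the root message."""
    parents = {m["id"]: m["parent_id"] for m in msgs}
    threads = defaultdict(list)
    for m in msgs:
        threads[thread_root(m["id"], parents)].append(m)
    return dict(threads)


def search_threads(threads: dict, keyword: str) -> list:
    """Return root ids of threads in which any message mentions the keyword."""
    needle = keyword.lower()
    return [
        root
        for root, msgs in threads.items()
        if any(needle in (m["subject"] + " " + m["body"]).lower() for m in msgs)
    ]


threads = build_threads(messages)
hits = search_threads(threads, "off the books")
print({root: [m["id"] for m in msgs] for root, msgs in threads.items()})
print(hits)
```

Searching at the thread level returns the whole conversation around a suspicious phrase, including replies that never repeat the phrase themselves, while unrelated threads are filtered out.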
