It’s hard to tune into any technologically oriented media outlet these days without hearing about big data. The general premise is that there’s “gold in them thar hills,” the hills being mountains of unstructured, human-generated data.
While it’s true that big data projects have the potential to harness exabytes of information, all that glitters is not gold. There are certainly nuggets of helpful information to be mined within enterprises, but that same mountain of data likely contains just as much risky information as it does valuable content.
It is this risk element that is frequently getting short shrift, due to the notion that “more is better.” But with recent surveys pointing to as little as one percent of all corporate information being mined for big data purposes, the vast majority of unstructured data may have very little analytical utility at all.
This stark reality wouldn’t be so concerning for organizations if it weren’t for the following factors:
Information is exploding at an unprecedented rate. As an example, between the dawn of civilization and 2003, humanity created a total of 5 exabytes of information. As a society, we now create 5 exabytes of data every two days.
Conversely, and perversely, the value of information is decreasing in relation to how quickly new data is generated. Said another way, the business utility of information used to be measured in months, but now, with so much new data being produced every second, the useful half-life of information is typically measured in days and may soon be measured in hours.
Despite the pervasive notion that “storage is cheap,” enterprise storage is far from inexpensive. Yes, you can buy a 2 terabyte USB drive for less than $100, but that’s not enterprise-class storage, which has to be provisioned, backed up, secured and monitored. All these activities come at a significant cost, which is particularly hard to rationalize if the storage is utilized for irrelevant or obsolete data with no business value.
Simply having corporate data lying around poses significant risks due to regulatory investigations, litigation and data breaches, among other reasons. In 2012, for example, there were a record number of breaches (1,611), a massive 48 percent increase from 2011. And the collateral damage is staggering; a report from Javelin Strategy & Research concludes that a single massive data breach can result in “billions of dollars” in consumer fraud losses.
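The half-life framing in the factors above can be made concrete with a short sketch. It assumes, purely for illustration, that the usefulness of a piece of information decays exponentially. The function name and the sample half-lives are hypothetical and are not drawn from any survey.

```python
def remaining_value(age_days: float, half_life_days: float) -> float:
    """Fraction of original business value left after age_days,
    assuming simple exponential decay (an illustrative assumption)."""
    if half_life_days <= 0:
        raise ValueError("half_life_days must be positive")
    return 0.5 ** (age_days / half_life_days)


# Hypothetical half-lives: roughly three months versus three days.
for label, half_life in [("months-scale", 90.0), ("days-scale", 3.0)]:
    fraction = remaining_value(30, half_life)
    print(f"{label} half-life: {fraction:.4f} of value left after 30 days")
```

Under these assumptions, data with a days-scale half-life is nearly worthless after a month, while the cost and risk of keeping it stay the same.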
If an organization’s data governance is fragmented, the above risks are multiplied. Corporate legal, IT, risk, compliance and information security functions often operate as silos, precluding holistic policies, procedures or enabling technologies. Each group views what is essentially the same problem through a different lens, so that when there is a security breach, data loss or a regulatory inquiry, everyone has to scramble to address the situation separately. This perpetually keeps companies in reactive, fire-fighting mode.
Important trends like cloud storage and the bring-your-own-device movement are increasing the complexity of this fractured reality. If an organization decides to put data in the cloud for a subset of users, for example, what happens if an institutional legal hold must be placed on that cloud data? Such a problem bleeds out of one group’s silo into all of the others.
What’s more, even the most rigorous information security protocols can’t prevent an individual within the organization from breaching them. A health care company may use data archiving and other enabling technologies to achieve strong on-premises security, only to run the risk of HIPAA violations, among others, when one of its practitioners creates a Dropbox account and places personally identifiable health records into the cloud.
In terms of information governance, these scenarios demonstrate the consequences of the 3 V’s: As data grows in volume, increases in variety and moves with greater velocity, the capabilities required to govern it must also increase exponentially.
Information governance is defined as “a cross-departmental framework consisting of the policies, procedures and technologies designed to optimize the value of information while simultaneously managing the risks and controlling the associated costs, which requires the coordination of eDiscovery, records management, and privacy/security disciplines.” By leveraging this type of umbrella framework, organizations are able to gain efficiencies and facilitate the coordination of information between the various departments. This reduces risks and costs. But to achieve better information governance, businesses should start with smaller initiatives and modular projects (for example, migrating data off a legacy system) that they can use as discrete, controlled test cases. Such a unified information governance effort can also serve as the impetus for different departments to collaborate substantively, which has its own benefits for an enterprise.
Managing the risks associated with information is not new. Historically, companies have asked custodians to preserve business information by using records management systems to store important content. As logical as that approach was, it just did not work in practice.
Even when individuals have the time to sort data, humans are simply not very proficient at classifying information, and corporate data custodians are not good at governing their own information. This is precisely where machine learning technology can step in and help systems function with minimal human supervision.
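As a rough illustration of what such automated classification might look like, the sketch below trains a tiny naive Bayes text classifier to sort documents into “retain” and “dispose” categories. The technique, the labels, the function names and the sample documents are all assumptions chosen for illustration. A production system would rely on far more training data and a vetted machine learning library.

```python
import math
from collections import Counter, defaultdict


def train(labeled_docs):
    """Count words per label from (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in labeled_docs:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, label_counts


def classify(text, word_counts, label_counts):
    """Return the most probable label using multinomial naive Bayes
    with add-one smoothing."""
    vocab = {word for counts in word_counts.values() for word in counts}
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        # Start from the log prior, then add one log likelihood per word.
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            smoothed = (word_counts[label][word] + 1) / (total_words + len(vocab))
            score += math.log(smoothed)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training examples, invented for illustration only.
SAMPLES = [
    ("signed contract renewal terms", "retain"),
    ("patient billing record invoice", "retain"),
    ("lunch menu for friday", "dispose"),
    ("office party photos friday", "dispose"),
]

model = train(SAMPLES)
print(classify("contract invoice terms", *model))
```

The point of the sketch is that custodians would only need to label a sample of documents, after which the system could apply the same judgment consistently to the rest.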
Novelist William Gibson once said, “The future is already here; it’s just not evenly distributed.” This is certainly true with information governance. Multinational organizations and heavily regulated entities are already embracing this discipline because the risks and costs are simply too enormous to ignore. While other enterprises may not see the future so clearly, the 3 V’s of big data make it a fait accompli that more effective information governance is the order of the day.
Now is the time to act, before information overload truly sets in. The good news is that the initial framework for an information governance program can be put in place simply by identifying low-hanging fruit. Once they have a few early successes in hand, businesses can begin attacking the information deluge in earnest. As the saying goes, “sooner started, sooner done.”