Understanding the potential commercial value of unstructured data


What's the commercial value of your enterprise data? If you can answer that question accurately, then you're among the few organizations that have a real grip on their data governance.

For the rest of us, the best we can do is an estimate that, given the nature of unstructured and dark data in particular, will be inaccurate.

Behind closed doors, discussions between enterprise stakeholders about how to value data – and therefore manage financial risk – are centering on how best to discover and track it as data continues its explosive growth, pushing manual data auditing processes into retirement.

I believe that artificial intelligence (AI) has a starring role to play in this new world. Given the magnitude of the task of identifying and inventorying data, machine learning will increasingly take care of the data estate, lowering financial risk and ultimately improving the value of the data.

Discovering hidden data

The CDO of 2019 now strategically guides the business on maximizing the commercial value of data to enhance critical decision making – a radical change from what was, until recently, a tactical role limited to data asset gathering and governance.

A primary demand on this new role is the ability to talk to the CFO in a common language: one that accurately identifies the value of data, ensures that value is understood, that the data governance process is watertight, and that today's blanket spending on data security can be reined in and matched accurately to the risk.

Discovering unstructured and dark data presents the biggest challenge to this objective, because data is increasingly dispersed throughout and beyond the organization on a worldwide scale. Manually discovering where the data is, what it is and who has access to it becomes more untenable every day, as unstructured and dark data remains uninventoried, hidden in the network.

I think it's a vast problem because unstructured data represents about 80 percent of information across the enterprise, comprising web logs, multimedia content, email, customer service interactions, sales automation data – the list seems endless – and representing financial information, customer data and intellectual property.

Because it's unstructured, IT lacks the precise knowledge of whether that data is valuable or not, and therefore where and how it should be stored, what level of protection it deserves, and how much should be spent in the process.

Budget versus commercial value

It's a big challenge to allocate data management and security spending appropriately. Most of the budget for managing and controlling data goes towards the well-understood 20 percent held in central database technology, rather than towards understanding any of the outlying 80 percent.

Currently, enterprise documents tend to be managed by keyword, not value. So to use a retail metaphor, the labeling on the package doesn't necessarily represent the value of the product inside, and in effect, the warehouse and several other distribution locations are full of storage containers without any labels at all.

How long would it take to take stock of our imaginary retail operation? We undertook an exercise recently to see if manually auditing all of this data was a realistic proposition and found that it would take the average medium-to-large organization 400 man years to complete the task – allowing for 5 minutes per document to find it, open it, scan the contents and then inventory it properly while cross-referencing its business use in terms of strategic value.
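The scale of that estimate is easy to sanity-check. A rough sketch, assuming about 2,000 working hours per man-year (a figure not stated in the article, introduced here purely for illustration):

```python
# Back-of-the-envelope check of the 400-man-year manual audit estimate.
# Assumption (not from the article): ~2,000 working hours per man-year.
MINUTES_PER_DOCUMENT = 5
HOURS_PER_MAN_YEAR = 2_000
MAN_YEARS = 400

total_minutes = MAN_YEARS * HOURS_PER_MAN_YEAR * 60   # 48,000,000 minutes of effort
documents = total_minutes // MINUTES_PER_DOCUMENT     # documents that effort covers

print(f"Implied document count: {documents:,}")       # → 9,600,000
```

In other words, 400 man-years at 5 minutes per document implies an inventory on the order of ten million documents – a workload no manual process can keep pace with.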

Competing perspectives

Not only is today's manual process of data inventory untenable, but the lack of visibility also creates discrepancies in how IT values data compared with other enterprise groups.

This worries me a lot, because gaining a common understanding of the value of data per department – and not just the IT department's valuation of it - is fundamental to a cohesive strategy to take care of it and protect it.

At the start of the year we published the results of research we undertook with the Ponemon Institute to take the pulse of about 2,500 global enterprises and measure the extent of that disparity.

In a very typical example from the report, the IT Security department at one firm significantly underestimated the cost of financial report leakage compared with the commercial value that Accounting and Finance put on the same information asset: $131,570 versus $303,182 – a valuation less than half of Finance's.
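The size of that gap follows directly from the two figures in the report:

```python
# Gap between IT Security's and Finance's valuation of the same asset,
# using the two figures quoted in the Ponemon report example.
it_valuation = 131_570
finance_valuation = 303_182

gap = finance_valuation - it_valuation        # absolute shortfall in dollars
ratio = it_valuation / finance_valuation      # IT's figure as a share of Finance's

print(f"Shortfall: ${gap:,}")                 # → Shortfall: $171,612
print(f"IT's valuation is {ratio:.0%} of Finance's")
```

IT's figure comes out to roughly 43 percent of Finance's – an undervaluation of more than half.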

In fact, this pattern held across the board: IT Security departments typically undervalue business data – not because they're unable to value it, but because there's often no clear way of understanding what's in the data and therefore why it's valuable.

At the end of the day, this dynamic presents a challenge to clear decision-making on how much to invest in protecting the data that has the highest commercial business value. Then there's the question of spending money on protecting old, stale data when that budget could instead be used to secure the most sensitive data assets and prevent their leakage.

Introducing the intelligence

All of this is exactly the sort of problem that artificial intelligence is cut out to resolve. Enterprises are all at different stages in their AI journey, but I think there's an extremely strong case to be made for introducing it into the early data discovery and governance phases, not least because it acts as a force multiplier for activities that are a natural next phase of that process, like cybersecurity and data compliance.

If we're able to introduce AI early enough into our data lifecycle – at the point where we're discovering what it is, where it is, and who has access to it – then we're laying the groundwork to ensure the rest of the lifecycle derives as much value as it can from the visibility into that data.

For example, there's little point in introducing a new cybersecurity strategy when it's not even clear where the data we're trying to protect is, or what it's worth. Using AI to quickly and efficiently build a precise inventory of all business data would allow enterprises to accurately identify both the risk and value of that data, and therefore automate its ongoing classification, protection and retention while improving accessibility and quality.

At a time when the industry has been accused of over-hyping AI, these preparatory steps toward real data governance are a particularly good example of something with real, tangible commercial value attached.
