Over the years, organizations have devoted their resources to building monolithic applications for CRM, supply chain, accounting, web sites, and HR/workforce. More recently, the trend has shifted toward lighter, fit-for-purpose applications such as mobile apps, toward SaaS offerings, and toward data picked up in the course of a company acquisition.
Examples abound. One is the manufacturing conglomerate with corporate headquarters around the globe that shifted from disparate CRM systems to a single hosted offering. Another is the large pharma that spends millions every year buying data from syndicated data providers. Or a global financial company that has purchased five new companies in the past two years and is working furiously to integrate that customer data internally.
Applications, small or large, homegrown or acquired, on-premise or hosted, by their nature create data. We know that the rate of data growth is increasing; what we don’t always stop to consider is that over time this data can grow to be an asset, or a liability, or both.
Data as a Pure Asset
Sure -- Hadoop has solved the “where do I put it all?” problem to some degree, and careful attention to cataloging and governance ensures that we might actually be able to find a piece of data if we need it. However, as companies are shifting their gaze away from the data warehouse as the only source of truth, the data continues to grow as more ancillary sources of data are introduced. This comes to us in the form of web / server logs, REST APIs, and all of the analytics we place on top of the data we’ve already acquired.
The business, compliance, and auditing functions have long held that data is an asset, and that it should be held on to in its purest form just in case it’s needed. It is an asset in the sense that it might be usable somehow, some day.
Data as a Threat
Treating data as if it always has potential for good makes sense: today’s coal might be tomorrow’s diamonds. The security, privacy, and financial interests see it differently. Data in a vacuum is a threat. It is full of information that might be personal and sensitive.
Someone really smart, with a lot of time on their hands, and the means to sneak past firewalls to see this data could savage a company’s reputation overnight. Worse yet, all this data has to be stored and archived somewhere and that carries with it intrinsic costs. Data is not your friend, they say – it is a liability simply because it exists and nefarious agents are always scanning ports and IP addresses looking to steal it.
Data Crossroads
We’ve come to the crossroads of the data-driven and the data-concerned information cultures. What evolved from the original business intelligence systems that helped democratize data has run into the epoch of data breaches and privacy concerns. Where data-driven cultures see their data as assets that can be molded into new analytics and insights, other forces are pushing back, tempering this enthusiasm by being vigilant and risk averse. These functions view corporate data as a potential liability.
The “data is always good” and “the data might destroy us” forces share a common need for data governance, and when viewed through the lens of this function, whether data is a liability or an asset gets blurry. The reason is that, like Schrödinger's cat, data can be both alive and dead at the same time. It can be both an asset and a liability at once, and data governance holds that a dedicated function must exist to assign ownership and responsibility to the data at all times.
The Good, The Bad and The Smart
Organizations that can embrace the paradox that their data can be used for both bad and good have certain characteristics. First, they know when to keep, distill, or drop data. In the masses of streaming logs, keeping the data points that matter and chucking the noise can turn terabytes of messy, hard-to-read data into megabytes of pure insight.
For example, a web log entry might be cut from 1000 characters to a fraction of that just by parsing out the important attributes: URI, referrer, IP address, and so forth. Once the key data points are isolated, the rest of the web log’s information can eventually be discarded.
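As a rough illustration of that kind of distillation, the sketch below pulls a handful of attributes out of one access-log line and drops the rest. It assumes the logs follow the common "combined" access-log layout; the pattern, the `distill` function, the list of kept fields, and the sample line are all made up for the example and would need adjusting to a real log format.

```python
import re

# Assumed layout: the common "combined" access-log format.
# Adjust the pattern if the real logs are laid out differently.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<uri>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "[^"]*"'
)

# Hypothetical choice of attributes worth keeping.
KEEP = ("ip", "timestamp", "uri", "status", "referrer")


def distill(line):
    """Return only the kept attributes, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    return {field: match.group(field) for field in KEEP}


# Invented sample entry using documentation-reserved addresses and domains.
sample = (
    '203.0.113.7 - - [10/Oct/2016:13:55:36 -0700] '
    '"GET /products/42?ref=mail HTTP/1.1" 200 2326 '
    '"http://example.com/start" "Mozilla/5.0 (X11; Linux x86_64)"'
)
record = distill(sample)
```

Here the user-agent string, response size, and identity fields are simply never copied into `record`, which is where most of the size reduction would come from. Whether those fields are truly noise is a business decision, not something the parser can decide.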
Second, they assume that all data is a liability first, then they work to mask and isolate it so that its value can be unlocked only by the good guys. Once the data is understood, owned, and fenced off, analysts are given the freedom and flexibility to work with it and wring value from it.
Sometimes this involves completely altering the data from its original form, hashing values, or using dictionary replacements. The means exist to mask and obfuscate data to the point that no amount of reverse engineering can reveal a piece of personally identifiable information.
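A minimal sketch of the two techniques named above, hashing and dictionary replacement, might look like the following. It makes some assumptions that are not in the article: the hash is keyed (HMAC with SHA-256) because a plain hash of a guessable value such as an email address can be matched by hashing candidate values, and the `SECRET_KEY`, `pseudonymize`, and `DictionaryMasker` names are hypothetical. Note that the dictionary's lookup table becomes sensitive data in its own right.

```python
import hashlib
import hmac

# Hypothetical secret; in practice it would come from a secrets manager,
# never from source code.
SECRET_KEY = b"replace-with-a-managed-secret"


def pseudonymize(value, key=SECRET_KEY):
    """Replace a value with a hex-encoded keyed SHA-256 digest (HMAC)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


class DictionaryMasker:
    """Swap each distinct value for an opaque token such as 'CUST-000001'."""

    def __init__(self, prefix):
        self.prefix = prefix
        self._tokens = {}  # original value -> token; itself sensitive

    def mask(self, value):
        if value not in self._tokens:
            self._tokens[value] = f"{self.prefix}-{len(self._tokens) + 1:06d}"
        return self._tokens[value]


customers = DictionaryMasker("CUST")

# Invented record for illustration.
row = {"email": "jane@example.com", "name": "Jane Doe", "order_total": 118.40}
masked = {
    "email": pseudonymize(row["email"]),   # same input always yields same digest
    "name": customers.mask(row["name"]),   # stable token per distinct name
    "order_total": row["order_total"],     # non-identifying value kept as-is
}
```

Both approaches are deterministic, so analysts can still join and count on the masked columns without seeing the underlying values.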
Finally, good organizational housekeeping means more than just cleaning up – it means separating the linens and keeping an eye on combinable data sets. Analysts’ sandboxes can collect clutter over time, but they can also be a place where information stores are combined, resulting in derived PII. Keeping track of potentially revealing data points, combined with knowing who has access to these data sets, is a first step in making sure that relatively inert tables don’t combust when combined with other tables from a different source.
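One way to start keeping an eye on combinable data sets is a simple check over a catalog of table columns. The sketch below flags pairs of tables that share a column and, only when put together, expose a set of attributes considered identifying. Everything in it is an assumption made for illustration: the `RISKY_COMBINATION` rule, the `catalog` contents, and the `risky_pairs` helper. Sharing a column name is only a crude stand-in for "these tables can be joined".

```python
from itertools import combinations

# Hypothetical rule: these attributes together are treated as identifying.
RISKY_COMBINATION = {"zip_code", "birth_date", "gender"}

# Hypothetical catalog of sandbox tables and their columns.
catalog = {
    "web_orders": {"customer_key", "zip_code", "order_total"},
    "loyalty_profiles": {"customer_key", "birth_date", "gender"},
    "store_hours": {"store_id", "open_time"},
}


def risky_pairs(tables, risky=RISKY_COMBINATION):
    """Yield pairs of tables that share a column and jointly expose the risky set."""
    ordered = sorted(tables.items(), key=lambda item: item[0])
    for (name_a, cols_a), (name_b, cols_b) in combinations(ordered, 2):
        shares_column = bool(cols_a & cols_b)           # crude proxy for joinable
        alone = risky <= cols_a or risky <= cols_b      # already risky on its own
        together = risky <= (cols_a | cols_b)           # risky only when combined
        if shares_column and together and not alone:
            yield name_a, name_b


flagged = list(risky_pairs(catalog))
```

With this catalog, the only pair flagged is `loyalty_profiles` with `web_orders`: neither table holds the full combination alone, but a join on `customer_key` would. Pairing the flagged list with a record of who can read both tables gives the "who has access" half of the check.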
Outside the Vacuum
Data in a vacuum isn’t a liability or an asset. Whether it’s good or bad is determined by the people who have access to it. This is where governance comes in: understanding the data and making sure that it stays a living, breathing thing that we control. Its impact will only increase as data continues to grow at a phenomenal rate and we search for ways to make it work for us. Tamed by data governance done right, data can then serve many masters in an organization, shrinking risk and evolving as an asset and a solution to the challenges we face now and those yet undiscovered.
(About the author: Chael Christopher is senior principal and practice lead, business intelligence for NewVantage Partners, a provider of data management and analytics-driven strategic consulting services to Fortune 1000 firms, and an industry leader in big data strategy consulting, thought-leadership, execution, and business value realization.)