Opinion: Why so many data lakes turn into swamps

Published June 7, 2017, 6:30 a.m. EDT

With data becoming the new currency of the digital era, many organizations are tempted to hoard it as a future pot of gold. Yet while plenty lean toward the notion of “storing first and asking questions later,” few have so far succeeded in monetizing their efforts.

With storage prices in steep decline, and thanks to emerging digital technology, building large-scale data silos has become cheaper and easier than ever before. PwC recently offered a catchy, eye-opening illustration of the trend: in 1964, a terabyte of storage would have cost US $3.5 billion, compared to a mere US $27 in 2016.

That makes storage roughly 130 million times cheaper than 52 years ago, which equals a compound decline of about 30 percent year-on-year. The phenomenon is often filed under Moore’s law, though the storage-specific analogue is Kryder’s law.
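Both figures are easy to verify with a back-of-the-envelope calculation. The sketch below uses only PwC’s two price points and is purely illustrative:

    # Back-of-the-envelope check of PwC's storage-price figures.
    cost_1964 = 3.5e9    # US$ per terabyte in 1964 (PwC)
    cost_2016 = 27.0     # US$ per terabyte in 2016 (PwC)
    years = 2016 - 1964  # 52 years

    fold_drop = cost_1964 / cost_2016
    annual_decline = 1 - (cost_2016 / cost_1964) ** (1 / years)

    print(f"{fold_drop:,.0f}x cheaper")                       # ~129,629,630x, i.e. ~130 million
    print(f"{annual_decline:.1%} compound decline per year")  # ~30.2%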

Data Strategies Are Focused On Volume Rather Than Value

Unfortunately, few are aware of the true composition of their digital assets. At the average enterprise, 41 percent of the data is stale, meaning it has been untouched for more than three years. Another 12 percent is ancient, unmodified for more than seven years. A further 5 percent is orphaned, meaning its owner is unknown. Still, data strategies too often focus on data volume rather than data value.
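As a rough illustration of how such an age audit might work in practice, here is a minimal sketch that buckets files by last-modified time. The three- and seven-year thresholds mirror the stale and ancient definitions above; treating filesystem mtime as the “untouched” signal, and the /data root path, are assumptions made for the example:

    import os
    import time

    # Minimal sketch: bucket files under a root path by last-modified age.
    # Assumption: filesystem mtime is a usable proxy for "untouched".
    THREE_YEARS = 3 * 365 * 24 * 3600  # "stale" threshold, in seconds
    SEVEN_YEARS = 7 * 365 * 24 * 3600  # "ancient" threshold, in seconds

    def classify_by_age(root):
        now = time.time()
        buckets = {"active": 0, "stale": 0, "ancient": 0}
        for dirpath, _, filenames in os.walk(root):
            for name in filenames:
                try:
                    age = now - os.path.getmtime(os.path.join(dirpath, name))
                except OSError:
                    continue  # unreadable or vanished entry; skip it
                if age > SEVEN_YEARS:
                    buckets["ancient"] += 1
                elif age > THREE_YEARS:
                    buckets["stale"] += 1
                else:
                    buckets["active"] += 1
        return buckets

    print(classify_by_age("/data"))  # hypothetical root path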

However, the sheer size of a data silo is not a reliable indicator of the achievable economic outcome. The only thing that is fairly certain is the cost of managing the swamp.

Moreover, the perception of “free” cloud storage seduces businesses and individuals alike into offloading data rapidly into the cloud. Aside from the fact that there is no such thing as a free lunch, data exerts enormous gravity, making it painful and complex to move back, to say the very least. The bigger the estate, the bigger the challenge.

The Digital Universe Grows, and So Does The Digital Cemetery

The digital cemetery is expanding quickly, and organizations keep spending a fortune tending its graves. The typical enterprise spends US $20.5 million storing stale data, US $6 million on ancient data, and US $2.6 million on orphaned data. On average, a mere 15 percent of all stored data is actually considered business critical.

However, there are significant regional differences. While the share of dark data is remarkably high in countries such as Germany (66 percent), Canada (64 percent) and Australia (62 percent), the United States sits in the middle of the field at 52 percent. Other countries are more advanced in managing their digital assets: the highest proportions of clean, identified business-critical data were found in China (25 percent), Israel (24 percent) and Brazil (22 percent).

End Users Are Non-Compliant

Due to the influence of shadow IT and practices such as bring-your-own-device (BYOD), the demarcation lines between corporate data and personal data are blurring.

Veritas discovered that 65 percent of employees use non-sanctioned sync-and-share services, that 27 percent of the workforce store personal data on corporate devices, and that another 20 percent use personal devices to carry business information.

This not only has numerous security implications; enterprises also end up tying up ample resources storing petabytes that are useless from a company standpoint.

Summary

Organizations have a big appetite for launching data-centric business models at some point in the future, but only a few have so far been able to turn that vision into reality. In other words, there is a lot of mindshare, but little share of wallet.

Organizations relying entirely on the mantra of “storing first” risk hoarding useless data on autopilot without even recognizing it, wasting large sums of money they are never able to capitalize on. It is important to understand that, apart from selected use cases, the value of historical data tends to fade.

Despite the steady price decline, organizations that fail to put a solid data governance model in place risk bearing significant costs for managing large amounts of highly questionable data, because their estates expand faster (+39 percent p.a.) than unit storage prices fall (-30 percent p.a.).
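The compounding effect is easy to see with the article’s two rates (a sketch; the five-year horizon is arbitrary): 1.39 × 0.70 ≈ 0.97, so total storage spend stays roughly flat each year rather than shrinking, while the volume of data that must be governed keeps ballooning.

    # Sketch: compound +39% p.a. data growth against a -30% p.a. unit price,
    # using the article's two rates over an arbitrary five-year horizon.
    volume, unit_cost = 1.0, 1.0  # normalized starting point
    for year in range(1, 6):
        volume *= 1.39            # estate grows 39% per year
        unit_cost *= 0.70         # storage price falls 30% per year
        print(f"year {year}: volume x{volume:.2f}, spend x{volume * unit_cost:.3f}")
    # By year 5: volume ~x5.19, spend ~x0.87; the bill barely moves while
    # the amount of data that has to be governed more than quintuples.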

Governance models are a paramount success factor, and organizations that have implemented them successfully tend to achieve far better results. Veritas concludes that, on average, those that erase useless data are 39 percent more effective than the competition.

Actively controlling the growth of enterprise data makes a company 36 percent more effective, while companies that make conscious decisions about the information lifecycle are 34 percent more effective than their rivals.
