The proverbial cobbler feverishly churns out shoes for his customers while his own children roam the streets barefoot. Similarly, IT departments throughout the world are exerting enormous effort implementing big data projects for their line of business (LOB) partners, even as a killer use case for the technology hides right under their noses.

IT infrastructures generate massive amounts of data. For example, tracking the storage access pattern of a single virtual machine can require hundreds of megabytes per day, so adding this level of visibility across a typical virtualized data center produces terabytes of data each week. Sorting through this mound of information in a timely manner to draw relevant conclusions requires an analytics engine that can churn through hundreds of terabytes of data with ease, plus novel algorithms to make sense of it all – the very definition of big data!
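The scale claimed above checks out with simple back-of-the-envelope arithmetic. A quick sketch, using illustrative figures (300 MB per VM per day and a 1,000-VM data center are assumptions, not numbers from the article):

```python
# Back-of-the-envelope volume of per-VM storage-access telemetry.
# Both figures below are assumed for illustration; real rates vary by workload.
mb_per_vm_per_day = 300   # "hundreds of megabytes per day" per VM
vm_count = 1000           # a typical virtualized data center
days = 7

total_mb = mb_per_vm_per_day * vm_count * days
total_tb = total_mb / 1_000_000  # decimal terabytes

print(f"{total_tb:.1f} TB per week")  # → 2.1 TB per week
```

Even with conservative inputs, a week of fine-grained telemetry lands squarely in big data territory.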

The primary function of big data analytics is to enable informed decisions by turning raw data into useful knowledge. However, IT has been blocked from applying these principles to architecting and managing the data center, because the tools available today have significant functionality gaps:

  1. Many rely solely on third-party data. It’s the classic “garbage in, garbage out” conundrum. With the utility of the output tied directly to the quality of the input, the value of the tool is marginalized if the appropriate data isn’t collected and made available.
  2. Most are reporting-centric. Notifications are presented when problems are encountered but prescriptive guidance on how to resolve the issue is rarely provided.
  3. The vast majority are reactive. Recognizing statistical anomalies based on past data is the norm whereas the preference would be to take actionable steps to avoid complications in the first place.

Take, for instance, the tire pressure monitoring system (TPMS) in an automobile dashboard. An older model car might illuminate a generic indicator when it detects low tire pressure. That’s helpful to know, but useful context such as which tire is affected and what should be done to get it back into working order is missing.


A more helpful display would show that 8 pounds per square inch (PSI) needs to be added to the rear driver’s-side tire.

Photos courtesy of http://www.safercar.gov

Imagine a world where, instead of just blinking lights vaguely intimating that there’s a problem somewhere in the infrastructure, we could precisely identify the virtual machines experiencing higher-than-desirable latency. Rather than poring through machine-level metrics like CPU and memory utilization in an attempt to root-cause an issue, picture a couple of clicks leading you to intuitive recommendations. And instead of taking a stab in the dark, what if you could measure the working set of each application and its associated VMs to size and allocate resources precisely?
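To make the working-set idea concrete: one common definition treats a VM’s working set over a time window as the unique storage blocks it touched in that window. A minimal sketch, where the trace format and 4 KB block size are assumptions for illustration (the article does not describe any vendor’s actual method):

```python
from collections import defaultdict

BLOCK_SIZE_KB = 4  # assumed block granularity

def working_set_mb(accesses, window_start, window_end):
    """Estimate each VM's working set: the unique blocks it touched
    in [window_start, window_end), converted to megabytes.
    `accesses` is an iterable of (timestamp, vm_id, block_id) tuples
    -- a hypothetical trace format, not a real product's schema."""
    unique_blocks = defaultdict(set)
    for ts, vm, block in accesses:
        if window_start <= ts < window_end:
            unique_blocks[vm].add(block)
    return {vm: len(blocks) * BLOCK_SIZE_KB / 1024
            for vm, blocks in unique_blocks.items()}

# Example trace: vm-a re-reads the same block, so only 2 blocks count;
# vm-b touches 3 distinct blocks.
trace = [(1, "vm-a", 10), (2, "vm-a", 11), (3, "vm-a", 10),
         (2, "vm-b", 20), (4, "vm-b", 21), (5, "vm-b", 22)]
print(working_set_mb(trace, 0, 10))
```

Sized this way, per-VM resource allocation can be driven by measured demand rather than guesswork, which is the point the paragraph above is making.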

We are on the cusp of the next evolution in the enterprise, with innovation accelerators like cognitive systems and the “Internet of Things” beginning to hit the mainstream. IT is a foundational element of this movement. Implementing big data methodologies in a practical manner to empower data center operators with greater visibility and insight is the best way to hasten, not inhibit, positive change. Done well, this could be the perfect union of a burning problem (large-scale IT operations management) with an elegant solution (descriptive, predictive and prescriptive analytics), addressing an inefficiency plaguing every data center today.

Nick Suh is head of product marketing at PernixData, which focuses on server-side storage intelligence.
