Monitoring is the stepchild of IT: the hardest job to do well, and the one that receives the least attention. That's because monitoring used to be much simpler. A company could implement a monitoring tool that was likely free, if cumbersome to maintain. Before clouds, containers and microservices, architectures were simple and companies weren't as dependent on technology, so downtime wasn't as costly.
Now, however, applications are complicated, and performance and availability are business-essential. But, while applications have changed, perceptions of the value of monitoring haven’t. It falls to the next generation of systems management tools to combine the low-touch approach from the past with the ability to monitor complex systems at scale to best meet the needs of today’s technology-dependent organizations.
Monitoring—the stepchild of IT
So many of IT’s responsibilities are higher-profile than monitoring—for example, customer support and the provisioning and management of key services that IT provides, such as email and Wi-Fi. IT is expected to be as reliable as household utilities—you simply flip a switch or turn on the faucet, and you have light and water, just like that.
The problem is that delivering those services assumes IT knows about problems and can correct them when they occur. Too often the value and complexity of monitoring are taken for granted. For services to perform as anticipated, which is to say flawlessly, the monitoring of services must be treated as being as important as the delivery of services.
The current monitoring mindset in many enterprises is deeply flawed: problems are too often identified first by customers, the equivalent of turning on the tap and having nothing come out. The Holy Grail is to prevent issues from occurring at all, of course. But since that is an unrealistic expectation for now, organizations must monitor proactively so that issues can be rectified quickly.
But from the CIO on down, monitoring is seen as being easier than it is, and therefore not as valuable. Yet it's becoming increasingly clear that monitoring is far more important than the modest resources committed to it would suggest.
The disconnect between app complexity and monitoring’s perceived value
With the complexity of today's apps, it is becoming harder and harder to maintain the infrastructure required to deliver them. The combination of these realities, the increasing importance and complexity of apps and the resulting need for greater focus on monitoring strategies, is creating a "perfect storm" in IT.
What has changed is that every company, regardless of the industry in which it competes, is now a tech company.
Put another way, technology is a strategic imperative in all types of companies, from auto manufacturing to healthcare. Many believe that cars are just computers on wheels, and no physician or pharmacy can function without tablets and online insurance authorizations.
Technology can provide a competitive advantage and is orders of magnitude more crucial to operations than in the past. Combined with a huge increase in velocity, as end users demand the delivery of more features, faster, this paradigm means that IT must automate monitoring as much as possible and catch issues before they affect customers.
The reality is that the increase in velocity required by business cannot be accommodated without automating monitoring.
The significance and the solution
The job of monitoring is becoming harder than ever. However, no one is lining up to add significant headcount to monitoring operations. The only smart course for IT departments is to monitor more intelligently.
Technology (ironically) to the rescue! Just as technology raised the stakes for monitoring teams, now technology must solve the problem it helped create—by automating the process of isolating the signal from the noise that makes monitoring so challenging.
For example, when a new feature fails, thousands of alerts can be generated, and most relate to the same underlying issue that caused the failure. While manual triage worked in the past because the volume of features, failures and alerts was low, now machine automation must be used to eliminate noisy alerts so that humans can once again manage IT services.
The best solution is to “fingerprint” each alert uniquely with tags like “host,” “check” and “service” and then group similar ones together. With this approach, service disruptions that previously might generate 50,000 alerts can be reduced to a small number of actionable incidents—a scale that human beings can effectively manage. That is much more effective than adding thousands of NOC engineers to triage the ever-increasing volume of machine alerts.
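The grouping approach described above can be illustrated with a short sketch. The `Alert` structure, the tag names and the grouping function below are illustrative assumptions for the sake of the example, not BigPanda's actual API: the point is simply that alerts sharing a fingerprint of identifying tags collapse into one incident.

```python
# A minimal sketch of alert "fingerprinting": each alert carries tags such
# as host, check and service; alerts that share the same fingerprint are
# grouped into a single actionable incident for a human to triage.
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Alert:
    host: str
    check: str
    service: str
    message: str


def fingerprint(alert: Alert) -> tuple:
    # Alerts with the same host, check and service are assumed to
    # describe the same underlying issue.
    return (alert.host, alert.check, alert.service)


def group_into_incidents(alerts):
    # Map each fingerprint to the list of alerts it absorbed.
    incidents = defaultdict(list)
    for alert in alerts:
        incidents[fingerprint(alert)].append(alert)
    return incidents


alerts = [
    Alert("web-01", "http_5xx", "checkout", "error rate high"),
    Alert("web-01", "http_5xx", "checkout", "error rate still high"),
    Alert("db-02", "disk_full", "orders", "disk at 98%"),
]
incidents = group_into_incidents(alerts)
print(len(alerts), "alerts ->", len(incidents), "incidents")  # 3 alerts -> 2 incidents
```

The same idea scales linearly: whether the flood is three alerts or 50,000, the number of incidents a human sees is bounded by the number of distinct fingerprints, not the raw alert volume.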
It’s all about applying a business rationale to common issues
Again, this is significant because reliable technical performance, as reflected in perceptions of quality and ease of use, provides a competitive advantage. An organization's very reputation is tied to the successful use of technology.
As dependence on technology increases, the monitoring of technology must evolve at least as fast as the technology itself, and that means applying next-generation systems management tools, including machine automation, to monitoring.
Monitoring thus becomes the difference between success and failure for many organizations. Without effective monitoring, companies risk permanently damaging both reputation and customer relationships.
The stakes couldn’t be higher. Corporate brand, profitability, customer loyalty and shareholder value are all directly related to a strategic commitment to monitoring.
(About the author: Dan Turchin is vice president of growth strategy at Alert Correlation Platform BigPanda)