Avoiding the Butterfly Effect in Your Data Center

Published November 01 2003, 1:00am EST

In 1963, meteorologist Edward Lorenz published research showing that tiny differences in the starting conditions of a simple model of the atmosphere could produce wildly different outcomes. The idea became famous as the Butterfly Effect after a 1972 talk in which Lorenz asked: Does the flap of a butterfly's wings in Brazil set off a tornado in Texas? Just as Lorenz's Brazilian butterfly could set off a tornado thousands of miles away, any change to an IT environment can have severe unintended consequences, because today's systems, networks, applications and databases form interconnected and complex webs.

Data center managers must reconcile conflicting demands: providing continuous availability of critical applications while reducing risk and cost. Achieving continuous availability in complex environments is difficult in its own right. An analysis by AFCOM's Data Center Institute identified three main keys to cost-effective IT infrastructure management: achieving service excellence and continuous availability, reducing the risk of change, and using IT infrastructure management to achieve the organization's objectives.

Over the past 10 years, data center managers have had to cope with sweeping changes in the market forces driving technology. The rapid rollout of Web-based applications and the rise of mobile professionals have made applications available from any Web-enabled computer. An organization's IT infrastructure is now on display for employees, customers, partners and competitors alike.

With this shift, user and customer expectations have risen dramatically. One direct outgrowth of this business environment is the growing use of formal service level agreements (SLAs) between service providers and customers. SLAs are increasingly used to quantify the business loss directly attributable to unavailable contracted services and to help ensure that end users receive the level of care their business deserves. SLAs matter because downtime can be immensely costly.

META Group estimates that the cost of downtime ranges from a "low" of $340,000 in lost revenue per hour for the media industry to a staggering $2.8 million in lost revenue per hour for the energy industry. At those rates, avoiding downtime is a business imperative.

Data centers in organizations that have not yet implemented SLAs, or are just beginning to, face the challenge of meeting undefined user and customer expectations. As enterprises try to provide higher levels of availability, costs can rise dramatically. According to Gartner, Inc., moving from 99 percent to 99.9 percent availability can result in an exponential increase in cost for a relatively small gain in availability.
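
To put the Gartner availability levels next to the META Group cost figures, the short Python sketch below converts an availability percentage into allowable downtime per year and multiplies that by an hourly cost. It assumes a service that must run 24x7 over a 365-day year and that every allowable downtime hour costs the same; the function names are illustrative, and the sketch says nothing about what it costs to reach each level.

```python
# Minimal sketch: allowable downtime per year at a given availability level.
# Assumption: the service must run 24x7 over a 365-day year.
HOURS_PER_YEAR = 24 * 365  # 8,760 hours

# Per-hour lost-revenue estimates attributed to META Group in the text.
COST_PER_HOUR = {"media": 340_000, "energy": 2_800_000}


def annual_downtime_hours(availability_pct: float) -> float:
    """Return the hours per year a service may be unavailable."""
    return HOURS_PER_YEAR * (1 - availability_pct / 100)


def annual_downtime_cost(availability_pct: float, cost_per_hour: float) -> float:
    """Return yearly lost revenue if every allowable downtime hour is used.

    Assumption: each hour of downtime costs the same, regardless of timing.
    """
    return annual_downtime_hours(availability_pct) * cost_per_hour


if __name__ == "__main__":
    for pct in (99.0, 99.9):
        hours = annual_downtime_hours(pct)
        print(f"{pct}% availability: about {hours:.2f} hours of downtime per year")
        for industry, rate in COST_PER_HOUR.items():
            cost = annual_downtime_cost(pct, rate)
            print(f"  {industry}: ${cost:,.0f} in lost revenue")
```

At 99 percent availability the allowance is about 87.6 hours a year; at 99.9 percent it is about 8.76 hours. Real outages vary in timing and impact, so flat per-hour multiplication is only a rough bound.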

The push for continuous availability is also putting extreme pressure on data center crews to shrink maintenance windows, despite a steady stream of software updates and patches, database maintenance, hardware configuration changes and network changes. To reduce normal maintenance windows, many vendors now offer online disk backup as an integral part of their file systems, and online database and storage maintenance products are also available. In addition, more and more manufacturers offer hot-swappable components that can be removed from or added to a system while it remains fully powered and operational.

Despite these new approaches to minimizing maintenance windows, data center managers are finding that when systems are down for maintenance, every minute must count. To address this, organizations are placing significant new emphasis on well-defined, carefully planned change management policies and procedures. Properly implemented change management minimizes errors and ensures that scheduled work is completed within the maintenance window, avoiding SLA penalties and damage to the end-user experience.

Striving for high availability has led many organizations to put rigorous change management policies in place. However, even a well-planned, disciplined change management process can't shield you from the risks of the Butterfly Effect. Change management processes assume that you know exactly where devices are located, exactly what software runs on those devices and exactly which applications, networks, databases and organizations a change will affect. When the butterfly flaps its wings in your IT organization, do you know exactly what the effect will be?

The ability to answer this question accurately can be decisive in meeting the formal terms of an SLA, managing customer perceptions of service and, ultimately, reducing the risk of change.

Organizations can mitigate these effects by automating their IT infrastructure management (ITIM), a systematic process that enables organizations to identify, synthesize, manipulate and manage information about physical and logical devices, network infrastructure, software and all the supporting electrical, cooling and other infrastructure systems. Through ITIM, everyone involved with the IT infrastructure can quickly understand the impact of IT changes on the business, as well as the impact of business changes on the IT infrastructure.
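
One common way to answer "what will this change affect?", assumed here for illustration rather than taken from the article, is to record dependencies as a directed graph and walk it breadth-first from the component being changed. In this minimal Python sketch, the DEPENDENTS map, its component names and the impact_of_change function are all hypothetical.

```python
from collections import deque

# Hypothetical dependency map for illustration only: each key is a component,
# and its list names the components that depend on it directly.
DEPENDENTS: dict[str, list[str]] = {
    "pdu-a": ["rack-12"],                            # power feed -> rack
    "rack-12": ["db-server-3", "web-server-7"],      # rack -> hosted servers
    "db-server-3": ["orders-db"],                    # server -> database
    "web-server-7": ["storefront-app"],              # server -> application
    "orders-db": ["storefront-app", "billing-app"],  # database -> applications
    "storefront-app": [],
    "billing-app": [],
}


def impact_of_change(component: str, dependents: dict[str, list[str]]) -> set[str]:
    """Return every component downstream of `component`, found breadth-first.

    The `affected` set doubles as the visited set, so shared dependents are
    reported once and a cycle in the map cannot cause an infinite loop.
    """
    affected: set[str] = set()
    queue = deque([component])
    while queue:
        current = queue.popleft()
        for child in dependents.get(current, []):
            if child not in affected:
                affected.add(child)
                queue.append(child)
    return affected


if __name__ == "__main__":
    # Which systems would maintenance on power feed "pdu-a" touch?
    print(sorted(impact_of_change("pdu-a", DEPENDENTS)))
    # ['billing-app', 'db-server-3', 'orders-db', 'rack-12', 'storefront-app', 'web-server-7']
```

The example shows the Butterfly Effect in miniature: work on a single power feed reaches two business applications through the rack, servers and database between them. The answer is only as good as the dependency data, which is why the aggregation step matters.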

There are three key requirements for effective ITIM. The first is to aggregate and integrate highly decentralized data, both physical and logical, about your IT infrastructure. The second is to automate and digitize standardized, repeatable processes and methodologies that suit the organization and its culture. The third is to deliver this information to customers in a personalized, actionable format.

Aggregating and integrating data about the physical and logical infrastructure from multiple sources is a crucial first step. Ideally, the resulting repository should store information in visual form, containing both scaled drawings of the environment and detailed information on every device, including power and network requirements, maintenance contracts, repair history and a full accounting of its change history. Unfortunately, in many data centers this knowledge is not institutionalized; it typically leaves the building every night when key employees go home. Without a common repository, the likelihood that a small change will trigger the Butterfly Effect increases greatly.
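
As an illustration of the aggregation step, here is a minimal Python sketch, assuming each source system can export rows keyed by a shared asset tag. The fields mirror the device details listed above (scaled drawings are not modeled); DeviceRecord, merge_sources and the sample data are hypothetical, and the "later source wins" rule for conflicting values is an assumption that a real repository would replace with its own reconciliation policy.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DeviceRecord:
    """One device in a hypothetical common ITIM repository."""
    asset_tag: str
    location: str = ""
    power_watts: Optional[int] = None
    network_ports: List[str] = field(default_factory=list)
    maintenance_contract: str = ""
    repair_history: List[str] = field(default_factory=list)
    change_history: List[str] = field(default_factory=list)


def merge_sources(*sources: List[dict]) -> Dict[str, DeviceRecord]:
    """Merge rows from several inventories into one record per asset tag.

    List fields (ports, histories) are concatenated. For scalar fields, a
    non-None value from a later source overwrites an earlier one (assumed rule).
    """
    repo: Dict[str, DeviceRecord] = {}
    for source in sources:
        for row in source:
            tag = row["asset_tag"]
            record = repo.setdefault(tag, DeviceRecord(asset_tag=tag))
            for key, value in row.items():
                if key == "asset_tag":
                    continue
                current = getattr(record, key)  # AttributeError on unknown fields
                if isinstance(current, list):
                    current.extend(value)
                elif value is not None:
                    setattr(record, key, value)
    return repo


if __name__ == "__main__":
    # Hypothetical extracts from three separate systems.
    facilities = [{"asset_tag": "SRV-0042", "location": "Row C, Rack 12", "power_watts": 450}]
    network = [{"asset_tag": "SRV-0042", "network_ports": ["switch-3 port 14"]}]
    maintenance = [{"asset_tag": "SRV-0042", "maintenance_contract": "24x7 on-site",
                    "repair_history": ["replaced power supply"]}]
    repo = merge_sources(facilities, network, maintenance)
    print(repo["SRV-0042"])
```

Keying every source on one identifier is the design choice that makes the merge possible; without a shared asset tag, the same device ends up as several partial records, which is the decentralization problem the article describes.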

The second requirement is automating and digitizing standardized ITIM processes. Within IT organizations, multiple groups, including facilities, network services, platform engineering, configuration services and change management, need to communicate and share information effectively to deliver the required service levels and availability. Automated process management enables these groups to streamline and better manage service delivery.

Finally, effective ITIM means that detailed information can be shared across the enterprise in an easy-to-use, intuitive visual form. In other words, it must be "actionable information": information that is in context and presented in a format fit for immediate use.

Organizations worldwide are now embracing ITIM as a way to achieve large gains in availability and reliability for relatively modest investments in services and tools. Organizations implementing ITIM have reached these goals by reducing the time required to resolve problems, improving capacity planning, streamlining change management, increasing business resiliency and better managing their data center assets. Enterprises that have made the leap to effective ITIM no longer fear the winds of change, or Lorenz's butterfly.

All Information Management content is archived after seven days.