As cloud operations have become the norm, it’s easy to forget that today’s digital world is based on physical infrastructure. Some businesses assume cloud operations cannot fail and thus find themselves more vulnerable to disruption caused by hardware failures. This summer’s massive system outages at Delta and Southwest are just two examples.
We work in the cloud, but in reality we need to be more grounded than ever. The cloud doesn’t let us off the hook for data protection or make us immune to operational failure.
While we all know the importance of backup, anyone who has experienced unexpected downtime knows that backup alone isn’t enough to keep business operations in flight during an outage. But you don’t have to suffer a crash landing: a well-thought-out continuity strategy can keep your business up and running.
State of Emergency
Delta’s outage began with an overnight failure in switchgear that supplied electricity to its Atlanta, GA, headquarters. The result? Massive flight delays on a busy Monday, more than 650 canceled flights, and a loss of tens of millions in profit.
Southwest’s outage in July, its second system failure in less than a year, had a similarly huge impact on flights and brought down the company’s website, resulting in millions in lost ticket sales. The cause was identified as a simple router failure. Worse, both airlines are still dealing with the fallout months later.
In fact, Southwest’s CEO recently said, "We have significant redundancies built into our mission-critical systems, and those redundancies did not work. We need to understand why, and make sure that that doesn't happen again."
As a CEO myself, I never want to be caught unprepared like this, and no IT admin wants to explain to executives how something small like a router failure immobilized business operations. Let’s consider how disaster recovery (DR) planning, old-fashioned teamwork, and continuity in the cloud could have mitigated these crises.
Disaster Recovery (DR) Planning
When you’re a passenger awaiting takeoff, it’s easy to ignore the flight attendant’s instructions for emergency procedures. But when you need an oxygen mask, you really need it, and the same goes for a disaster recovery plan for your business.
Like a flight map, an effective DR plan tells you where you want to end up and how to get there. In technology terms, this means understanding your Recovery Point Objectives (RPOs) and Recovery Time Objectives (RTOs). An RPO quantifies how much data you can afford to lose (measured as the point in time to which you must be able to restore), and an RTO quantifies how quickly you need to recover (measured as tolerable downtime).
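To make those two objectives concrete, here is a minimal Python sketch that checks a single incident against them. Everything in it is an assumption for illustration: the `RecoveryObjectives` class, the `assess_incident` function, the one-hour RPO, the four-hour RTO, and the timestamps are all hypothetical, not figures from any real outage.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RecoveryObjectives:
    rpo: timedelta  # maximum tolerable data-loss window
    rto: timedelta  # maximum tolerable downtime


def assess_incident(objectives, last_backup, outage_start, service_restored):
    """Compare one incident's data loss and downtime with the objectives."""
    data_loss = outage_start - last_backup      # work done since the last backup
    downtime = service_restored - outage_start  # how long the service was out
    return {
        "data_loss": data_loss,
        "downtime": downtime,
        "rpo_met": data_loss <= objectives.rpo,
        "rto_met": downtime <= objectives.rto,
    }


# Hypothetical targets and timestamps, for illustration only.
objectives = RecoveryObjectives(rpo=timedelta(hours=1), rto=timedelta(hours=4))
report = assess_incident(
    objectives,
    last_backup=datetime(2024, 1, 15, 1, 30),
    outage_start=datetime(2024, 1, 15, 2, 15),
    service_restored=datetime(2024, 1, 15, 8, 15),
)
print(report["rpo_met"], report["rto_met"])
```

In this made-up incident, the last backup is 45 minutes old when the outage starts, so the RPO holds, but six hours of downtime misses the four-hour RTO. Running past incidents or test drills through a check like this is one simple way to see whether your plan matches your stated objectives.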
If you don’t yet have a plan documented and your budget is limited, a free business continuity and disaster recovery (BC/DR) tool can offer a great starting point. Thinking through the scenarios that could affect your business ahead of time will reduce the impact of turbulence and result in a faster recovery.
Flying a plane takes more than a pilot, and as CEO I know that my entire team is key to our success. When a disaster or outage strikes the business, the plan is only as strong as the team running it.
Your continuity strategy should give your crew alternative procedures if a crucial system (like e-ticketing) goes down, and help your service representatives communicate with customers. On the IT side, be pragmatic about the tools you choose to invest in.
For example, do your IT admins have confidence running the backup system, and can they access the interface remotely? Have recovery processes been thoroughly tested so the team knows they can restore data and operations? To make recovery smoother, look for a backup solution with an intuitive user interface as well as automated testing. A DR process that prioritizes the needs of your team will speed up recovery and minimize losses to both the business and the brand.
Continuity in the Cloud
When there’s a mechanical failure, the best news a passenger can receive is that another plane is available. The same is true when your site goes down. Often the best disaster recovery is failover.
In the past, this strategy was only available to businesses that could afford to build and maintain a secondary site, but now cloud-based DR services are available at a range of price points. Disaster Recovery as a Service (DRaaS) offers the distinct advantage of redundancy across multiple data centers.
But as Southwest learned, redundancy isn’t enough. Recovery operations must also be tested. When shopping for DR services, look for a provider that offers spin-up of your critical virtual machines as well as automated testing and recovery assurance in the cloud. Before committing, make sure that the provider’s service level agreement (SLA) can meet your RPOs and RTOs.
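As a rough illustration of that last check, the Python sketch below compares what a provider’s SLA guarantees with your own targets. The provider names, the SLA figures, the 15-minute RPO, the one-hour RTO, and the `sla_gaps` helper are all hypothetical. Real figures come from each provider’s contract.

```python
from datetime import timedelta

# Hypothetical targets; replace with your own RPO and RTO.
TARGET_RPO = timedelta(minutes=15)
TARGET_RTO = timedelta(hours=1)

# Hypothetical SLA figures, not taken from any real provider.
provider_slas = {
    "provider_a": {"rpo": timedelta(minutes=5), "rto": timedelta(minutes=30)},
    "provider_b": {"rpo": timedelta(hours=1), "rto": timedelta(minutes=30)},
    "provider_c": {"rpo": timedelta(minutes=15), "rto": timedelta(hours=4)},
}


def sla_gaps(sla, target_rpo, target_rto):
    """Return the objectives the SLA does not guarantee (empty list = fits)."""
    gaps = []
    if sla["rpo"] > target_rpo:
        gaps.append("rpo")
    if sla["rto"] > target_rto:
        gaps.append("rto")
    return gaps


results = {
    name: sla_gaps(sla, TARGET_RPO, TARGET_RTO)
    for name, sla in provider_slas.items()
}
for name, gaps in results.items():
    status = "meets targets" if not gaps else "falls short on: " + ", ".join(gaps)
    print(name, status)
```

A paper comparison like this only narrows the field. The guarantees still need to be proven through the tested recoveries described above.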
How to Stay in Flight
Every pilot needs to know how to fly during a storm. The same is true for business leaders. When it comes to downtime, the question is not “if” but “when.” Strategic businesses will continue to take advantage of the cloud, but they also need to recognize their vulnerability and prepare accordingly.
Planning, teamwork, and cloud continuity can help you avoid a crash landing, prevent undue turbulence with employees and customers, and keep your business in flight no matter the weather.
(About the author: Paul Brady is chief executive officer at Unitrends, a leader in enterprise-level cloud-empowered business continuity solutions. For more information, contact the author at firstname.lastname@example.org or visit www.unitrends.com.)