7 tips for stress testing a disaster recovery plan
A disaster recovery plan is a bit like an insurance policy: we all agree we need it and we all hope we’ll never use it. And as with insurance, nobody wants to discover their DR plan doesn’t actually protect them when a disaster hits.
Similarly, nobody wants to find out that their DR plan is overdone – meaning they’ve been spending too much time, money and energy maintaining it. But if you don’t regularly stress test your DR plan, you could find yourself in one of these situations.
I’ve worked with a lot of businesses, and I’ve noticed that few conduct regular stress tests of their DR plans. That’s a problem: no disaster recovery plan is good enough to magically transform as a business changes – and realistically, no business remains static. At a previous firm, we tested quarterly, and every single test turned up something that needed changing or updating!
So how can you verify that your DR plan fits your current needs? Follow these seven steps.
1. Plan for failure
Things will fail during stress testing. That’s the point of doing it.
Prepare for this reality. If you don’t, people may feel incentivized to make everything look good rather than making sure everything actually works. So be the champion of failure: when a DR test doesn’t go as you hoped, remember that you’re actually a hero.
You found a misalignment between your needs and your capabilities. Now, armed with that knowledge, you are empowered to get them back into alignment.
2. Put someone in charge
I’m sure no one in your company would disagree that it’s important to stress test your DR plan. And I’m equally sure that no one will do it if you don’t explicitly assign it.
People are busy. And while DR is important, it’s rarely urgent – until it is.
So put someone in charge of leading DR stress tests. This person will have to…
- Keep track of changing business needs as they relate to DR: are the recovery time objective (RTO) and recovery point objective (RPO) still valid? Do you need to update those and retool? Did you acquire a new business that now has to be accounted for?
- Schedule regular tests.
- Oversee the tests.
- Update your DR solution as needed – maybe a key database is missing from it, or maybe it still covers systems you can retire.
- Update your DR plan to accommodate the new solution.
That last item is crucial: the whole point of testing is to identify parts of the plan that no longer fit your business needs. Be sure your new Head of Testing understands this; it’s easy for a “failure” during a test to feel like a personal failure. In reality, though, testing failures are wins because they let you prevent real-world failures.
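To make “are the RTO and RPO still valid?” concrete, here is a minimal sketch of how a testing lead might compare what a test measured against the agreed targets. The class names, field names, and numbers are all hypothetical, and it assumes you can measure recovery time and the data-loss window during a test.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class Objectives:
    """Targets agreed with the business (example values are made up)."""
    rto: timedelta  # longest tolerable time to restore service
    rpo: timedelta  # longest tolerable window of lost data


@dataclass
class DrillResult:
    """What one stress test actually measured."""
    recovery_time: timedelta     # outage start until service was usable again
    data_loss_window: timedelta  # age of the newest recoverable data


def find_gaps(objectives: Objectives, result: DrillResult) -> list[str]:
    """Return a plain-language list of the objectives the test missed."""
    gaps = []
    if result.recovery_time > objectives.rto:
        gaps.append(f"RTO missed by {result.recovery_time - objectives.rto}")
    if result.data_loss_window > objectives.rpo:
        gaps.append(f"RPO missed by {result.data_loss_window - objectives.rpo}")
    return gaps


# Hypothetical targets and a hypothetical test outcome
objectives = Objectives(rto=timedelta(hours=4), rpo=timedelta(minutes=15))
result = DrillResult(recovery_time=timedelta(hours=5, minutes=30),
                     data_loss_window=timedelta(minutes=10))

gaps = find_gaps(objectives, result)
print(gaps or "All objectives met")
```

A non-empty list is exactly the kind of “failure” worth celebrating: it tells you which target to revisit or which part of the solution to retool.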
3. Look at your current DR plan
Before letting your eager new testing lead launch their first stress test, take some time to review the DR plan you have in place, especially if it’s more than a year old. If the plan doesn’t align with your current business needs (e.g., if the latest app you’ve rolled out isn’t even mentioned), make updates.
The goal is to create a DR plan that will meet the needs of the entire business as it exists today. You’ll want to collaborate with leaders outside IT to determine…
- Which apps and functionalities need to run.
- Which hardware must be online.
- What dependencies flow from the above.
Ideally, you’ll walk away from these conversations with RTOs and RPOs that, in the event of a disaster, accommodate everyone’s needs.
Remember: complexity leads to fragility. Aim for a plan that offers adequate coverage while staying as lean as possible.
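One way to capture the dependencies mentioned above is as a simple map from each system to whatever must be running before it. The sketch below uses Python’s standard-library `graphlib` module (Python 3.9+) to turn such a map into a workable recovery order. The system names are invented for illustration; your own inventory would replace them.

```python
from graphlib import TopologicalSorter

# Hypothetical inventory: each system maps to the systems it depends on.
dependencies = {
    "order-portal": {"orders-db", "auth-service"},
    "auth-service": {"directory"},
    "orders-db": {"storage-array"},
    "directory": set(),
    "storage-array": set(),
}


def recovery_order(deps: dict[str, set[str]]) -> list[str]:
    """List systems so that each one appears after everything it needs."""
    # static_order() raises graphlib.CycleError if the map is circular,
    # which is itself a useful finding during a plan review.
    return list(TopologicalSorter(deps).static_order())


order = recovery_order(dependencies)
print(order)
```

A short list here is a good sign. If the order runs to dozens of entries for a single business function, that may be the fragility worth trimming.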
4. Get the C-suite on board
Remember when I suggested collaborating with non-IT leaders? The best way to get them to take your request seriously is to get the C-suite on board first. Without executive buy-in, you’ll likely struggle to convince other leaders in the company to prioritize your important (but again, non-urgent) request.
Tip: instead of scare tactics, focus on the positives. Instead of saying, “We won’t be able to service orders,” try, “We’ll sign more customers because they’ll be confident we can support them.”
If your C-suite already understands the importance of having a disaster recovery plan, make sure they’re on board with the importance of testing. The best way to do that is to…
5. Stick a dollar sign on DR and stress testing
Everyone thinks poetry is “important.” But do people throw money at poetry projects? They do not. That’s because it’s very hard to show the financial imperative of maintaining a vibrant poetry community. Don’t let your DR plan suffer the same fate.
When you talk to the C-suite about DR stress testing, frame it in terms of dollars and cents: how long can your business afford to be down?
Guide your leadership here, helping them calculate potential revenue losses from each minute of downtime for various business functionalities. And don’t forget to mention the potential reputational hit that even relatively minor downtime can cause if it’s handled badly.
You’ll also need to provide insight into the many types of DR solutions and what each one costs. Weighing those costs against your downtime figures will help ensure that you craft a recovery plan that fits just right.
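The per-minute downtime calculation described above can be sketched in a few lines. The functionalities and hourly revenue figures below are made-up placeholders, and the sketch assumes revenue is lost evenly across the outage, which is a simplification. It also ignores the reputational cost, which is much harder to put a number on.

```python
def downtime_cost(revenue_per_hour: float, minutes_down: float) -> float:
    """Estimate lost revenue, assuming revenue is spread evenly over time."""
    return revenue_per_hour / 60 * minutes_down


# Hypothetical hourly revenue per business functionality
revenue_per_hour = {
    "online checkout": 12_000,
    "invoicing": 1_800,
}

outage_minutes = 90  # example outage length to discuss with leadership

losses = {
    name: downtime_cost(hourly, outage_minutes)
    for name, hourly in revenue_per_hour.items()
}
total_loss = sum(losses.values())

for name, loss in losses.items():
    print(f"{name}: ${loss:,.0f} lost in {outage_minutes} minutes")
print(f"Total: ${total_loss:,.0f}")
```

Running the same numbers for a few outage lengths gives leadership a simple table to compare against the price of each DR option.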
6. Establish guidelines for real-life DR
Any strong DR plan must include conditions that trigger it.
Common triggers include…
- Time: E.g., you’ll give your team five hours to attempt to fix a problem before you fail over.
- Functionality: Can certain parts of an app go down without triggering your DR plan? In other words, are there non-essential functions that your business can temporarily live without?
- Seasonality: For both of the above, are there times of the day, week, month, or year when the trigger changes? Do you have busy seasons overall, or busy seasons for certain functionalities?
There’s no need to reinvent the wheel. These guidelines should be based on the larger business needs you’ve been considering all along. That means you’ll want to verify your triggers with other stakeholders in the company.
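Writing the triggers down as explicit rules makes them easier to verify with stakeholders. The sketch below combines the three triggers (time, functionality, seasonality) into one decision. Every name and threshold in it is a hypothetical placeholder except the five-hour limit, which is the example used above.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class Incident:
    outage_duration: timedelta  # how long the team has been trying to fix it
    failed_functions: set[str]  # functionalities that are currently down
    month: int                  # calendar month, 1-12


# Hypothetical values; agree on the real ones with the business
ESSENTIAL_FUNCTIONS = {"checkout", "payments"}
BUSY_MONTHS = {11, 12}
NORMAL_LIMIT = timedelta(hours=5)  # the five-hour example from the text
BUSY_LIMIT = timedelta(hours=1)    # tighter limit in the busy season


def should_fail_over(incident: Incident) -> bool:
    """Decide whether an incident meets the conditions to trigger the DR plan."""
    # Functionality: outages of non-essential functions never trigger the plan
    if not (incident.failed_functions & ESSENTIAL_FUNCTIONS):
        return False
    # Seasonality: the time limit tightens during busy months
    limit = BUSY_LIMIT if incident.month in BUSY_MONTHS else NORMAL_LIMIT
    # Time: fail over once the fix attempt has used up its allowance
    return incident.outage_duration >= limit


june = Incident(timedelta(hours=2), {"checkout"}, month=6)
december = Incident(timedelta(hours=2), {"checkout"}, month=12)
print(should_fail_over(june), should_fail_over(december))
```

The same two-hour checkout outage is tolerated in June but triggers failover in December, which is the kind of distinction worth confirming with the people who own those functions.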
You should also enlist non-IT employees to help out during stress tests. Someone from each department should be in charge of testing the functionalities they depend on – and someone should be in charge of testing from a user’s perspective. Let people know what you’ll need from them.
If you want to really turbo-charge your testing, do it without one or two key people (selected at random). That replicates real-world conditions, in which someone is always unavailable.
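Picking the absentees at random can be as simple as the sketch below, which uses Python’s standard `random` module. The roster is invented, and the optional `seed` parameter is only there so a selection can be reproduced later if someone asks how it was made.

```python
import random
from typing import Optional


def pick_absentees(key_people: list[str], count: int = 2,
                   seed: Optional[int] = None) -> list[str]:
    """Randomly choose who sits out the next stress test."""
    rng = random.Random(seed)
    # Never try to pick more people than are on the roster
    return rng.sample(key_people, k=min(count, len(key_people)))


# Hypothetical roster of people the plan currently relies on
team = ["DBA", "network lead", "storage admin", "app owner", "service desk lead"]

absentees = pick_absentees(team, count=2)
print("Sitting out this test:", absentees)
```

If the test stalls because one of the chosen people is missing, you’ve found an undocumented step that belongs in the plan.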
7. Plan your stress tests strategically
Scheduled downtime is a normal part of doing business. It’s also an ideal time to conduct DR stress tests: if something goes wrong, you can easily roll back to production, then reevaluate, recalibrate and update your plan.
Stress test for the whole business
This is worth restating: disaster recovery is a whole-business effort. That means your DR plan must work for the whole business, the whole business must participate in stress testing, and the plan and testing schedule you develop should depend on the needs, budget, and risk tolerance of your business as a whole.
Because of that, there’s no single way to do it. The key is just to do it and keep doing it so that you’re always ready for something to go wrong.