In the minds of many, a disaster means a hurricane, earthquake, flood, fire or other natural calamity or, possibly, a terrorist attack. These types of disasters are uncommon, but they do happen. For the purposes of this article, however, “disaster” has a broader meaning, and disasters in this broader sense are much more common. In this context, a disaster is any event that causes either of the following:
- The destruction of all online operational copies of an organization’s data and/or applications. “Online operational copies” include both the production copies and any ready-to-run backup copies that can be placed in the production role immediately and, preferably, seamlessly.
- The loss of access to all online operational copies of the organization’s data and/or applications for a sufficiently long period such that a recovery operation will be faster and more cost-effective than waiting for the online operational copies to come back online.
In the event of a natural disaster or terrorist attack, the organization’s first objective should be, clearly, to protect and maintain the safety and security of its employees and other people on its premises. Once this objective has been achieved, or if people have not been placed at risk by the situation, the highest-priority task of the IT department is to get the business-critical systems running again as quickly as possible.
Failure to resume operations swiftly can compound the effects of the disaster and threaten the survival of the organization. According to one often-cited statistic from the U.S. Bureau of Labor Statistics, 40 percent of all companies that experience a disaster never reopen, and more than 25 percent of the companies that are able to reopen close within two years. Thus, disaster recovery is especially vital.
Avoiding Murphy’s Law
In the midst of the excessive stress that is inevitable in any disaster recovery process, if something can go wrong, it most likely will. In a complex IT environment especially, many unforeseen things can go wrong. Fortunately, there are a number of ways to blunt Murphy’s Law and reduce the impact of a disaster on IT assets. Here are 10 tips that can help.
- Inventory all IT assets. The first prerequisite to disaster recovery is to know what needs to be recovered. If no detailed inventory of IT assets – both tangible and intangible – is available, make one now. What hardware, software and data will have to be recovered? Which skills will be required to perform the recovery operations and then run the systems at a backup location if necessary? The IT asset inventory list should be included in the company’s disaster recovery plan.
- Maintain offsite data backups. A comprehensive tape archive strategy is crucial. Backup data has to be available on locally stored tapes to minimize recovery times when the physical assets of the primary data center are still operational, and at a secondary location in case they are not.
- Prioritize the data and applications according to their criticality. Not all are created equal. Some will be indispensable in reestablishing the business and need to be restored first. Recovery of secondary applications and data can be deferred until the critical applications and data are restored. The data recovery plan should explicitly state the recovery order of data and applications to reflect these priorities.
- Don’t omit standalone data from the recovery plan. Increasingly, business-critical data and documents are stored on laptop and desktop computer disk drives. The data recovery plan should include details on how this data will be backed up and recovered if lost.
- Formally document the plan. A disaster recovery plan that exists only in someone’s head is no plan at all. While we’d rather not consider the prospect of serious injury or death, it’s possible that some key employees will not be available after the disaster. Or they may simply be on vacation or otherwise unreachable during a recovery operation. If the recovery plan exists only in those people’s heads, the remaining staff won’t be able to execute it. Although it may be possible to automate the initiation of some recovery processes and use the system to enforce the completion of checklists, it’s important to keep printed copies of the recovery plan in multiple secure locations, including at the recovery site. A plan for restarting the organization’s systems that is locked inside an unavailable application will be useless when the time comes to initiate the recovery operations.
- Test the solution. In any complex system or process, what works in theory often fails in practice. Regular testing not only ensures that the recovery plan is viable, but also acts as a training tool. People who have already performed the recovery procedures a number of times during regular testing will be familiar with the plan and confident in their abilities to perform the required actions.
- Maintain multiple communication channels. When staff has to be notified of a DR event, normal communication channels, such as email and phone, may be disrupted. Consider text messaging, personal email addresses and alternate phone numbers as alternative communication vehicles. In addition, there are third-party companies that can handle disaster communications.
- Automate as much as possible. Human error is possible under any circumstances, but during particularly stressful situations, it is almost inevitable. The more of the recovery process that is automated, the less room there is for human error. However, keep in mind that the systems responsible for automating the recovery operations may themselves be unavailable after a disaster. Thus, just as business applications and data need backups, every automated recovery process needs a documented manual fallback.
- Don’t neglect security. When recovering from a disaster, it can be tempting to bypass normal security protocols and policies in order to simplify and speed the recovery. In general, this is a bad idea. Security policies were established for a reason, and bypassing them may create risks that are as disruptive as, or more disruptive than, the disaster itself. Also, remember to store passwords in multiple locations. They will be useless if they are available only at a site that is inaccessible.
- Ask for help. Creating an effective DR plan can be challenging. DR experts and consultants with extensive knowledge and experience in the field can help leverage the best practices of many companies. They can more effectively craft a plan that meets all business requirements at a cost that fits the budget and is justified by the benefits.
Beyond keeping local tapes for fast restores, it’s critical to protect business operations from the risk of the destruction of the data center itself. That means backup tapes have to be available at a secondary location. Maintaining an up-to-date copy of backup data at an offsite location is worth almost any price. A local fireproof vault is not an adequate alternative because, depending on the circumstances, the vault may not offer sufficient protection or may not be accessible quickly after a disaster.
And remember, a laptop or desktop computer may be destroyed in the same disaster that strikes a data center. Therefore, it is not enough to back up PC-based data onto a network drive in the primary data center. Critical PC-based data must also be included in the offsite backup data sets.
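The inventory, prioritization and offsite-backup tips above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the asset names, the numeric tier scheme and the `offsite_backup` flag are all hypothetical, and a real inventory would carry far more detail (owners, locations, required skills and so on).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    name: str
    kind: str             # e.g. "application" or "data"
    tier: int             # 1 = restore first; higher numbers can wait
    offsite_backup: bool  # True if an up-to-date copy exists offsite


# Hypothetical inventory, invented purely for illustration.
INVENTORY = [
    Asset("order-entry", "application", 1, True),
    Asset("customer-db", "data", 1, True),
    Asset("intranet-wiki", "application", 3, True),
    Asset("sales-laptops", "data", 2, False),
]


def recovery_order(assets):
    """Return asset names sorted by tier, then by name within a tier."""
    return [a.name for a in sorted(assets, key=lambda a: (a.tier, a.name))]


def missing_offsite(assets):
    """Return names of assets that have no offsite backup copy."""
    return [a.name for a in assets if not a.offsite_backup]


if __name__ == "__main__":
    print("Restore in this order:", recovery_order(INVENTORY))
    print("No offsite copy:", missing_offsite(INVENTORY))
```

Keeping the recovery order and the backup-coverage check in the same list as the inventory means a newly added asset cannot be prioritized without someone also stating whether it is backed up offsite. The printed result still belongs in the hard-copy plan.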
Test the recovery processes at least three or four times per year. Tests will often reveal flaws in the plan. When this happens, be sure to update the plan to fix the flaws. Avoid an off-the-cuff approach to DR testing. Maintain a test script that follows the DR plan as closely as possible and tests as much of it as possible. For operational reasons, it may not be possible to test all aspects of a recovery operation during every test, but every effort should be made to leave as little as possible out of the DR tests.
A Fresh Set of Eyes
It is human nature to overlook what should be most obvious. Even a comprehensive, well-tested plan may omit a data store, application, process or piece of hardware because its use has become second nature to employees. Thus it may be a worthwhile investment to bring in an objective DR consultant who can spot such oversights. Remember the old adage: it’s better to be safe than sorry.