The primary focus of a Disaster Recovery Plan (DRP) and related efforts is to minimize revenue loss and the impact on business operations and customers. This is achieved by recovering disrupted IT systems and services at the recovery site as quickly as possible. However, when building the DRP, planners may put the bottom line at risk by overlooking the transition back to the primary site.
Without advance planning for return to the primary site, costs related to maintaining operations at the recovery site will continue to accrue, driving up the total cost of the disruption. Making primary site restoration part of the DRP ensures that the rebuilding and repatriating activities are initiated without undue delay, and the cost associated with operating under emergency conditions will be minimized.
The cost of continuing operations at the disaster recovery (DR) site is driven by:
- Employing inefficient manual or non-standard interim processes at the recovery site.
- Replacing critical staff lost due to the disaster.
- Cost of maintaining service-level availability and performance.
- Cost of housing employees in DR site locations.
- Loss of customers or market share due to inability to fully service.
- Additional vendor licensing and contract costs.
- Inability to deliver on strategic initiatives resulting in lost business opportunities.
- Loss of reputation due to inability to fulfill contracts.
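To make the accrual concrete, the following minimal sketch adds up daily costs at the recovery site and compares a planned return with a delayed one. The cost categories loosely group the factors above, and every name and figure in it is a hypothetical placeholder rather than data from this note. It also assumes costs accrue at a constant daily rate, which is a simplification.

```python
from dataclasses import dataclass


@dataclass
class DailyRecoveryCosts:
    """Hypothetical daily cost categories while operating at the recovery site."""
    interim_processes: float     # inefficient manual or non-standard processes
    staffing_and_housing: float  # replacement staff and employee housing
    vendor_and_licensing: float  # additional licensing and contract costs
    lost_business: float         # estimated lost customers and opportunities

    def per_day(self) -> float:
        return (self.interim_processes + self.staffing_and_housing
                + self.vendor_and_licensing + self.lost_business)


def total_recovery_site_cost(costs: DailyRecoveryCosts, days: int) -> float:
    """Assumes a constant daily rate, so cost grows linearly with days away."""
    if days < 0:
        raise ValueError("days must be non-negative")
    return costs.per_day() * days


# Illustrative figures only.
example = DailyRecoveryCosts(2_000.0, 3_500.0, 1_000.0, 5_000.0)
planned = total_recovery_site_cost(example, 30)    # return planned in advance
unplanned = total_recovery_site_cost(example, 45)  # return delayed by 15 days
print(f"Extra cost of a 15-day delay: {unplanned - planned:,.0f}")
```

Even a rough model like this can help justify the effort of planning the return in advance.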
Running operations in emergency mode is fraught with distractions, issues, and practices that may not measure up to the organization's normal IT best practices. The result can be added complexity when planning the repatriation. DR planners can avoid major pitfalls by following these guidelines when planning the transition to restore normal IT operations at the primary site.
- Continue to do all system and data backups at the recovery site. Loss of data will make it much more difficult to re-establish systems and re-launch operations at the primary site.
- Ensure that all data and access controls are secure. The transition period for the move provides opportunities for unauthorized access to either the data or premises, which can be avoided by continuing to enforce IT and physical security policies.
- Keep documentation current and available. Ensure that all documentation is on hand or can be reproduced quickly, and replace vendor documentation where required. Operations are easier to resume when complete documentation is available.
- Test and certify the primary site. Build a solid test plan and conduct thorough user acceptance testing to certify that the primary site is ready to go live after transitioning data and services.
- Validate the status of contracts. Ensure that all contracts and licenses are current at the primary site. Seat and site licenses may need to be updated depending on the new configuration of the primary site. Renegotiate rates and terms where possible or necessary.
- Learn from the event. Conduct a post mortem review of the entire disaster event from detection to repatriation and document what worked, what did not work and lessons learned. Use these to update the DRP processes, including the repatriation plan. Test the updated plans as soon as it is feasible.
Improvement & Optimization
The Restoration Team is responsible for the rebuilding and repatriation activities. The team should operate from a command center accessible to the primary site in order to manage the rebuilding tasks and oversee the progress of the plan. There are three phases associated with transitioning operations back to the primary site:
Phase 1: Rebuild
Depending on the severity of the damage, repairing the original site can vary from replacing a few servers to rebuilding or even relocating the site or facility. The rebuilding activities could last for days, weeks, or months until the site or system is fit for restoring normal operations.
Goal: To restore the primary site facilities and infrastructure to full operational capability.
Conduct a primary site assessment
The primary site assessment determines the extent of repairs required to get back to full operations. Take an inventory of the damage to pre-existing facilities and infrastructure to determine the rebuild or replacement requirements.
Consider the opportunity to upgrade
Identify improvements, risks, or other issues that went unaddressed during prior operations. Examples include relocating the primary site farther from the recovery site, implementing server virtualization, improving fire detection capability, or strengthening access controls.
Update contracts and licenses
Renegotiate vendor contracts and licenses and plan upgrades to more current systems or hardware. Incorporate these plans in the repatriation plan.
Create a repatriation plan
Include an inventory of required IT assets, services, documentation, fixtures, furniture, and office supplies for all rebuilt, upgraded or new components and facilities. Also, include costs, timelines, resource allocations, and testing requirements. The Repatriation Plan should include a full user acceptance test plan to certify that the primary site is ready.
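One way to keep the inventory, costs, timelines, and testing requirements together is a simple structured record. The sketch below is only an illustration under assumed names: `PlanItem`, `RepatriationPlan`, and the sample entries are hypothetical, not a format prescribed by this note.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class PlanItem:
    name: str
    category: str            # e.g. "IT asset", "service", "documentation"
    cost: float
    due: date                # when the item must be in place
    needs_testing: bool = True


@dataclass
class RepatriationPlan:
    items: List[PlanItem] = field(default_factory=list)

    def total_cost(self) -> float:
        return sum(item.cost for item in self.items)

    def completion_date(self) -> date:
        # The plan is complete only when its latest item is in place.
        return max(item.due for item in self.items)

    def test_scope(self) -> List[str]:
        # Items that must appear in the user acceptance test plan.
        return [item.name for item in self.items if item.needs_testing]


# Illustrative entries only.
plan = RepatriationPlan([
    PlanItem("Database servers", "IT asset", 40_000.0, date(2024, 3, 1)),
    PlanItem("Network links", "service", 8_000.0, date(2024, 3, 10)),
    PlanItem("Office furniture", "furniture", 5_000.0, date(2024, 2, 20),
             needs_testing=False),
])
print(plan.total_cost(), plan.completion_date(), plan.test_scope())
```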
Outcome: The primary site has been rebuilt including infrastructure upgrades and modifications to allow for the resumption of normal IT operations.
Phase 2: Repatriate
In the Repatriate Phase, the repatriation plan is executed and all files, databases, applications, and services are transitioned from the recovery site back to the primary site. The plan implementation team and recovery site operations team remain intact and engaged until the primary site has been fully restored and the transition is complete.
Goal: Transition all IT operations and services back to the primary site.
Suspend business activity
Provide appropriate notifications to users and customers that business activity will be suspended for the duration of the transition.
Perform a current backup at the recovery site and deliver it to the new primary site. This will be the starting point for resumption of business at the primary site.
Restore data and systems to the new infrastructure at the primary site, and account for modifications in the new site.
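Because this backup is the starting point for the primary site, it may be worth confirming that it arrived unaltered before restoring from it. The sketch below compares SHA-256 checksums of the copy taken at the recovery site and the copy delivered to the primary site. The function names and file names are hypothetical, and the temporary files stand in for real backup archives.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large backups need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_arrived_intact(source: Path, delivered: Path) -> bool:
    """True when both copies have the same SHA-256 checksum."""
    return sha256_of(source) == sha256_of(delivered)


# Demonstration with stand-in files (hypothetical names).
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "recovery_site.bak"
    dst = Path(tmp) / "primary_site.bak"
    src.write_bytes(b"example backup payload")
    dst.write_bytes(b"example backup payload")
    intact = backup_arrived_intact(src, dst)

    dst.write_bytes(b"example backup payl0ad")  # simulate corruption in transit
    corruption_detected = not backup_arrived_intact(src, dst)

print(intact, corruption_detected)
```

In practice the source checksum would be computed at the recovery site and sent alongside the backup, so that the two files never need to be on the same machine.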
Test and certify
When systems have been restored, test and certify as documented in the Repatriation Plan. Certification validates that all the requirements of the primary site restore have been met and all systems and functions are fully operational.
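Certification can be treated as a gate: cutover proceeds only when every test in the plan has passed. The following sketch shows one possible form of that gate. The function name and the sample test names are assumptions for illustration, and the real acceptance criteria would come from the Repatriation Plan.

```python
from typing import Dict, List, Tuple


def ready_for_cutover(test_results: Dict[str, bool]) -> Tuple[bool, List[str]]:
    """Return (certified, failed_tests).

    The site is certified only if at least one test was run and none failed.
    """
    failed = sorted(name for name, passed in test_results.items() if not passed)
    certified = bool(test_results) and not failed
    return certified, failed


# Illustrative results only.
results = {
    "order entry": True,
    "payroll batch": False,
    "email delivery": True,
}
certified, failed = ready_for_cutover(results)
print(certified, failed)

results["payroll batch"] = True  # issue fixed and test re-run
certified_after_fix, _ = ready_for_cutover(results)
print(certified_after_fix)
```

An empty result set is deliberately treated as not certified, so that a skipped test cycle cannot pass the gate.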
Take more backups
After certification, take a backup at the primary site. This backup will include fixes or modifications made to resolve issues found in testing and certification. The backup will also reflect the current configuration of the new primary site build.
Resume normal operations
Cutover and resume normal operations and services at the primary site. Provide notification to users and customers that normal operations are restored.
Decommission the recovery data center, using secure procedures to dispose of all temporary hardware. Release the plan implementation and recovery site operations teams.
Outcome: Systems and services are restored to the fully operational primary site and normal operations are resumed.
Phase 3: Rewrite
Activities in the Rewrite Phase focus on documentation. The primary site has been rebuilt and operations at the site have resumed. Now the DR Committee conducts a post mortem review of the event, the recovery, and the primary site restoration.
Goal: To review the procedures, the event, and subsequent actions executed to recover and restore normal operations.
Conduct a post mortem
Working with all involved teams and stakeholders, conduct a post mortem review of the disaster event, the DR plan/implementation, and the Repatriation Plan/implementation to determine what worked, what did not work, and the lessons learned.
Develop recommendations from the post mortem consisting of:
Update the disaster recovery plan
Update documentation including:
Test the updated DRP
Test the updated DRP and documentation within 60 days to be prepared in the event of another disaster.
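The 60-day window is easy to track with simple date arithmetic. This sketch assumes the window starts on the date the DRP update is completed, which is one reading of the guidance above. The function names and the sample dates are hypothetical.

```python
from datetime import date, timedelta
from typing import Optional

RETEST_WINDOW_DAYS = 60  # window stated in the guidance above


def retest_deadline(plan_updated_on: date) -> date:
    """Assumes the window is counted from the day the update was completed."""
    return plan_updated_on + timedelta(days=RETEST_WINDOW_DAYS)


def retest_overdue(plan_updated_on: date, today: date,
                   tested_on: Optional[date] = None) -> bool:
    """Overdue if untested after the deadline, or tested later than the deadline."""
    deadline = retest_deadline(plan_updated_on)
    if tested_on is None:
        return today > deadline
    return tested_on > deadline


updated = date(2024, 1, 15)  # illustrative date
deadline = retest_deadline(updated)
print(deadline.isoformat())
print(retest_overdue(updated, today=date(2024, 4, 1)))
print(retest_overdue(updated, today=date(2024, 4, 1), tested_on=date(2024, 3, 1)))
```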
Outcome: Updated and improved DRP and implementation procedures.
Eventually, all disasters end and operations and services must be resumed at the primary site. How this is planned and executed will have a direct effect on the total cost of the disruption:
- Follow best practices. Be sure to comply with the organization’s IT best practices and policies while operating at the DR site. This may be difficult but will ensure that data and facilities remain secure.
- Consider upgrades when rebuilding. When planning to rebuild the primary site, consider taking the opportunity to implement upgrades to the facilities, infrastructure, systems, contracts, and licenses.
- Certify the primary site. Prior to cutting over production services from the DR site, conduct user acceptance testing and certify that the primary site is ready to go live.
- Review and update. Update the DR plans based on lessons learned from the post mortem review of the disaster event, the recovery and restoration procedures, and actions and modifications to the new primary site.
Establishing operations at the recovery site is just the first half of the work involved in disaster recovery. At some point, services and systems must be returned to primary site facilities to resume normal operations. Repatriation is more than disaster recovery in reverse, and care must be taken to ensure the enterprise learns and grows from the event.
© 1998-2010 Info-Tech Research Group. All rights reserved. Reprinted by permission.
Info-Tech's products and services combine actionable insight and relevant advice with ready-to-use tools and templates that cover the full spectrum of IT concerns. For more information, go to www.infotech.com.