Disaster recovery (DR) is the process of planning how you restore network and computer services, or continue operating, after a natural or human-caused disaster. The process includes identifying the hardware and software required to run business-critical applications and the associated processes that provide a smooth transition after the event.
A DR plan assesses the loss of time and loss of data that are acceptable to the business. Within those limits, DR moves processing to an alternate location after a catastrophic event.
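These two limits are commonly called the recovery time objective (RTO, acceptable loss of time) and the recovery point objective (RPO, acceptable loss of data). The following is a minimal sketch of how a team might check a proposed recovery design against them. The `RecoveryObjectives` class, the `meets_objectives` function, and all the numbers are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class RecoveryObjectives:
    """Business-approved limits for one application (hypothetical structure)."""
    rto: timedelta  # maximum acceptable downtime
    rpo: timedelta  # maximum acceptable window of lost data


def meets_objectives(objectives, backup_interval, estimated_restore_time):
    """Return (rpo_ok, rto_ok) for a proposed recovery design.

    Assumes worst-case data loss equals the interval between backups or
    replication cycles, and worst-case downtime equals the estimated time
    to restore service at the alternate location.
    """
    rpo_ok = backup_interval <= objectives.rpo
    rto_ok = estimated_restore_time <= objectives.rto
    return rpo_ok, rto_ok


# Illustrative numbers only: a 4-hour RTO and a 1-hour RPO.
billing = RecoveryObjectives(rto=timedelta(hours=4), rpo=timedelta(hours=1))
result = meets_objectives(
    billing,
    backup_interval=timedelta(hours=24),        # nightly backups
    estimated_restore_time=timedelta(hours=3),  # measured in a past drill
)
print(result)  # (False, True)
```

In this example the nightly backup misses the one-hour RPO even though the restore time fits the RTO, which suggests more frequent backups or replication rather than faster hardware.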
DR establishes an alternate processing location. The alternate location must have all the components of the production site already in place before the disaster. The move to an alternate location requires an understanding of five key components, as described in the following table.
Component | Description |
---|---|
Equipment | What equipment is affected? Which servers, disks, and networks? |
Data | Which databases and data are affected? (Data includes application code.) |
People | Who is responsible for recovery? |
Location | Where does the recovery take place? |
Network | How do we switch the network to the recovery location? |
Statistically, fire is the leading cause of disaster. Examples of other possible disasters include storms, floods, earthquakes, chemical accidents, nuclear accidents, wars, terrorist attacks, cold winter weather, extreme heat, airplane crashes (loss of key staff), and avalanches. The planning process should cover every locale where the business operates and should assess the political stability of the critical business locations.
For each possible disaster that could affect a site, the disaster team assesses the impact to the business in advance. The team addresses the following questions:
- How much of the organization's resources (including data, equipment, and staff) could be lost? What are the replacement costs?
- What efforts are required to rebuild?
- How long does it take to recover?
- What is the impact on the overall organization?
- Which customers are affected? What is the impact on them?
- How much does it affect the share price and market confidence?
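One common way to compare the answers across threats is annualized loss expectancy (ALE): the estimated cost of a single occurrence multiplied by the expected number of occurrences per year. This technique is not prescribed by the text above; it is one option, sketched here with hypothetical threat names and made-up figures.

```python
def annualized_loss_expectancy(single_loss_cost, occurrences_per_year):
    """ALE = cost of one occurrence x expected occurrences per year."""
    return single_loss_cost * occurrences_per_year


# Hypothetical estimates: threat -> (cost of one occurrence, occurrences per year).
threats = {
    "data-center fire": (2_000_000, 0.01),
    "regional flood": (500_000, 0.05),
    "extended power outage": (50_000, 0.2),
}

# Rank threats from highest to lowest expected yearly loss.
ranked = sorted(
    threats,
    key=lambda name: annualized_loss_expectancy(*threats[name]),
    reverse=True,
)

for name in ranked:
    print(f"{name}: {annualized_loss_expectancy(*threats[name]):,.0f} per year")
```

A single number like this does not capture effects such as market confidence or customer impact, so it works best as one input to the assessment rather than as a replacement for the questions above.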
After outlining possible threats, the DR team ranks the services and systems according to three categories: mission critical, important, and not so important. The ranking determines the depth of planning, funding, and resiliency. A DR team takes the following steps:
1. Form a planning group.
2. Perform risk assessments and audits.
3. Establish priorities for the network and applications.
4. Develop recovery strategies.
5. Prepare an up-to-date inventory and documentation of the plan.
6. Develop verification criteria and procedures.
7. Implement the plan.
The team identifies recovery actions for the applications staff, system administrators, database administrators, and network staff.
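The three-category ranking can be recorded in the inventory and used to derive the order in which services are restored. The sketch below assumes a simple mapping from service name to category; the `Tier` enum, the `recovery_order` function, and the service names are all hypothetical.

```python
from enum import IntEnum


class Tier(IntEnum):
    """The three ranking categories; a lower value is recovered first."""
    MISSION_CRITICAL = 1
    IMPORTANT = 2
    NOT_SO_IMPORTANT = 3


# Hypothetical inventory: service name -> assigned tier.
services = {
    "order-database": Tier.MISSION_CRITICAL,
    "internal-wiki": Tier.NOT_SO_IMPORTANT,
    "email": Tier.IMPORTANT,
    "payment-gateway": Tier.MISSION_CRITICAL,
}


def recovery_order(inventory):
    """Sort services by tier, then by name so the plan is stable and readable."""
    return sorted(inventory, key=lambda name: (inventory[name], name))


plan = recovery_order(services)
print(plan)
# ['order-database', 'payment-gateway', 'email', 'internal-wiki']
```

Keeping the tier next to each inventory entry means the recovery order is regenerated whenever the inventory changes, which helps keep the documented plan up to date.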
Business resiliency is the ability to recover from any network failure or issue, whether it is related to disaster, links, hardware, design, or network services. A highly available network (built for resiliency) is the bedrock of effective and timely disaster recovery.
Consider the following areas of the network for resiliency:
- Network links
  - Carrier diversity
  - Local loop diversity
  - Facilities resiliency
  - Building wiring resiliency
- Hardware resiliency
  - Power, security, and disaster
  - Redundant hardware and onsite spare equipment
  - Mean time to repair (MTTR)
  - Network path availability
- Network design
  - Layer 2 WAN designs
  - Layer 2 LAN design
  - Layer 3 IP design
- Network services resiliency
  - Domain Name System (DNS) resiliency
  - Dynamic Host Configuration Protocol (DHCP) resiliency
  - Other services resiliency
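MTTR, redundancy, and path availability are linked by standard reliability arithmetic. The sketch below uses the usual steady-state formula, availability = MTBF / (MTBF + MTTR), where MTBF is mean time between failures, and it assumes that component failures are independent. The function names and the figures are illustrative, not measurements.

```python
import math


def availability(mtbf_hours, mttr_hours):
    """Steady-state availability of one component."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def redundant(*availabilities):
    """Availability of parallel components where any one is sufficient.

    Assumes failures are independent of each other.
    """
    return 1 - math.prod(1 - a for a in availabilities)


def in_series(*availabilities):
    """Availability of a path that needs every component to be up."""
    return math.prod(availabilities)


# Illustrative figures: a WAN link that fails about every 1,000 hours
# and takes 4 hours to repair.
single_link = availability(mtbf_hours=1000, mttr_hours=4)
dual_link = redundant(single_link, single_link)

# A path through a router (assumed 99.9% available) and the dual links.
path = in_series(0.999, dual_link)

print(f"single link: {single_link:.5f}")
print(f"dual links:  {dual_link:.7f}")
print(f"full path:   {path:.5f}")
```

The independence assumption is the reason carrier diversity and local loop diversity matter: two links that share a conduit or a carrier can fail together, and the parallel formula then overstates the real availability.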
The final step in any DR plan is to test its processes and systems. Just as firefighters practice fighting different types of fires to hone their skills and reactions, the DR team should stage mock disasters to ensure that systems, network services, and data all transition as expected and that everyone involved understands their part in the transition.