Chapter . Disaster Recovery

What Happens When the Network Stops Working?

Disaster recovery (DR) is the process of planning how to restore network and computer services, or to continue operating, after a natural or human-caused disaster. The process includes identifying the hardware and software required to run business-critical applications, along with the associated processes that provide a smooth transition after the event.

A DR plan assesses the loss of time and loss of data that are acceptable to the business. Within those limits, DR moves processing to an alternate location after a catastrophic event.
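
The two limits described here are commonly called the recovery time objective (RTO, the acceptable loss of time) and the recovery point objective (RPO, the acceptable loss of data). The following Python sketch shows one way a recovery could be checked against those limits. The class name, function name, and sample timestamps are illustrative assumptions, not part of any standard tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RecoveryObjectives:
    """Limits the business accepts (hypothetical names)."""
    rto: timedelta  # maximum acceptable downtime
    rpo: timedelta  # maximum acceptable window of lost data


def meets_objectives(objectives, last_backup, failure, restored):
    """Return (time_ok, data_ok) for one recovery event."""
    downtime = restored - failure      # how long service was unavailable
    data_loss = failure - last_backup  # work done since the last good copy
    return downtime <= objectives.rto, data_loss <= objectives.rpo


# Assumed example values: a 4-hour RTO and a 1-hour RPO.
objectives = RecoveryObjectives(rto=timedelta(hours=4), rpo=timedelta(hours=1))
time_ok, data_ok = meets_objectives(
    objectives,
    last_backup=datetime(2024, 1, 1, 8, 30),
    failure=datetime(2024, 1, 1, 9, 0),
    restored=datetime(2024, 1, 1, 12, 0),
)
print(time_ok, data_ok)
```

Keeping the two checks separate is deliberate: a site can come back quickly and still have lost more data than the business accepts, or the reverse.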

DR establishes an alternate processing location. The alternate location must have all the components of the production site already in place before the disaster. The move to an alternate location requires an understanding of five key components, as described in the following table.

Component    Description

Equipment    What equipment is affected? Which servers, disk, and networks?

Data         Which databases and data are affected? (Data includes application code.)

People       Who is responsible for recovery?

Location     Where does the recovery take place?

Network      How do we switch the network to the recovery location?

Statistically, fire is the leading cause of disaster. Examples of other possible disasters include storms, floods, earthquakes, chemical accidents, nuclear accidents, wars, terrorist attacks, cold winter weather, extreme heat, airplane crashes (loss of key staff), and avalanches. The planning process covers every locale in which the business operates and assesses the political stability of each critical business location.

For each possible disaster that could affect a site, the disaster team assesses the impact to the business in advance. The team addresses the following questions:

  • How much of the organization’s resources (including data, equipment, and staff) could be lost? What are the replacement costs?

  • What efforts are required to rebuild?

  • How long does it take to recover?

  • What is the impact on the overall organization?

  • What customers are affected? What is the impact on them?

  • How much does it affect the share price and market confidence?

DR Planning

After outlining possible threats, the DR team ranks the services and systems according to three categories: mission critical, important, and not so important. The ranking determines the depth of planning, funding, and resiliency. A DR team takes the following steps:

  1. Form a planning group.

  2. Perform risk assessments and audits.

  3. Establish priorities for the network and applications.

  4. Develop recovery strategies.

  5. Prepare an up-to-date inventory and documentation of the plan.

  6. Develop verification criteria and procedures.

  7. Implement the plan.

The team identifies recovery actions for the applications staff, system administrators, database administrators, and network staff.
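
The three-category ranking described at the start of this section can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the `Service` class, the `rank_services` function, and the sample inventory are hypothetical, and a real plan would also record owners, dependencies, and funding for each service.

```python
from dataclasses import dataclass

# Tier order follows the three categories named in the text.
TIERS = ("mission critical", "important", "not so important")


@dataclass
class Service:
    name: str
    tier: str


def rank_services(services):
    """Sort services so the most critical tier comes first."""
    for s in services:
        if s.tier not in TIERS:
            raise ValueError(f"unknown tier for {s.name}: {s.tier!r}")
    # sorted() is stable, so services within a tier keep their input order.
    return sorted(services, key=lambda s: TIERS.index(s.tier))


# Hypothetical inventory, for illustration only.
inventory = [
    Service("intranet wiki", "not so important"),
    Service("order database", "mission critical"),
    Service("email", "important"),
    Service("core WAN routers", "mission critical"),
]
for service in rank_services(inventory):
    print(f"{service.tier:>16}  {service.name}")
```

Rejecting unknown tier names up front keeps a typo in the inventory from silently pushing a service to the wrong place in the recovery order.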

Resiliency and Backup Services

Business resiliency is the ability to recover from any network failure or issue, whether it is related to disaster, links, hardware, design, or network services. A highly available network (built for resiliency) is the bedrock of effective and timely disaster recovery.

Consider the following areas of the network for resiliency:

  • Network links

    • Carrier diversity

    • Local loop diversity

    • Facilities resiliency

    • Building wiring resiliency

  • Hardware resiliency

    • Power, security, and disaster

    • Redundant hardware and onsite spare equipment

    • Mean time to repair (MTTR)

    • Network path availability

  • Network design

    • Layer 2 WAN design

    • Layer 2 LAN design

    • Layer 3 IP design

  • Network services resiliency

    • Domain Name System (DNS) resiliency

    • Dynamic Host Configuration Protocol (DHCP) resiliency

    • Other services resiliency
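
To make the link between mean time to repair (MTTR), redundant hardware, and network path availability concrete, the sketch below uses the standard steady-state formula, availability = MTBF / (MTBF + MTTR), where MTBF is the mean time between failures. MTBF is not mentioned in the list above and is introduced here as an assumption, as are the function names and the sample figures. The parallel calculation also assumes that the redundant paths fail independently, which holds only when they share no common element such as a carrier, local loop, or building entrance.

```python
import math


def availability(mtbf_hours, mttr_hours):
    """Steady-state availability of one component: MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def series(*parts):
    """All parts must work (for example, every element on a single path)."""
    return math.prod(parts)


def parallel(*parts):
    """At least one part must work (for example, diverse carrier links)."""
    return 1 - math.prod(1 - a for a in parts)


# Assumed figures for illustration, not vendor data.
link = availability(mtbf_hours=2000, mttr_hours=4)
router = availability(mtbf_hours=50000, mttr_hours=8)

single_path = series(link, router)
dual_path = parallel(single_path, single_path)
print(f"single path: {single_path:.5f}")
print(f"dual path:   {dual_path:.7f}")
```

Under these assumptions, shortening MTTR (for example, with onsite spares) raises the availability of each component, while adding a diverse second path raises the availability of the whole route.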

Preparedness Testing

The final step in any DR plan is testing its processes and systems. Just as firefighters practice fighting different types of fires to hone their skills and reactions, the DR team should stage mock disasters to ensure that systems, network services, and data all transition as expected and that everyone involved understands their part in the transition.

Disaster Recovery: Catastrophic Fail-Over

Figure . Disaster Recovery: Catastrophic Fail-Over
