Chapter . Disaster Recovery

What Happens When the Network Stops Working?

Disaster recovery (DR) is the planning process for how you restore network and computer services or continue operating in the event of a natural or human disaster. The process includes identifying the hardware and software required to run business-critical applications and the associated processes that provide a smooth transition from the event.

A DR plan assesses the loss of time and loss of data that are acceptable to the business. Within those limits, DR moves processing to an alternate location after a catastrophic event.
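One way to make these limits concrete is to write them down and check each recovery option against them. The sketch below is only an illustration: the class names, the two options, and every number are assumptions, not figures from this chapter.

```python
from dataclasses import dataclass


@dataclass
class RecoveryLimits:
    """Limits the business accepts (illustrative values only)."""
    max_downtime_hours: float   # acceptable loss of time
    max_data_loss_hours: float  # acceptable loss of data, as time since the last copy


@dataclass
class RecoveryOption:
    name: str
    failover_hours: float         # estimated time to move processing to the alternate location
    backup_interval_hours: float  # time between data copies (0 for continuous mirroring)


def meets_limits(option: RecoveryOption, limits: RecoveryLimits) -> bool:
    # Worst-case data loss is the full interval between copies.
    return (option.failover_hours <= limits.max_downtime_hours
            and option.backup_interval_hours <= limits.max_data_loss_hours)


# Hypothetical limits and options.
limits = RecoveryLimits(max_downtime_hours=4, max_data_loss_hours=1)
options = [
    RecoveryOption("nightly backup, cold site", failover_hours=48, backup_interval_hours=24),
    RecoveryOption("mirrored standby site", failover_hours=1, backup_interval_hours=0),
]

acceptable = [o.name for o in options if meets_limits(o, limits)]
print(acceptable)
```

An option that fails the check is not necessarily wrong; it may simply mean the business has to accept looser limits or fund a more resilient design.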

DR establishes an alternate processing location. The alternate location must have all the components of the production site already in place before the disaster. The move to an alternate location requires an understanding of five key components, as described in the following table.

Component	Description
Equipment	What equipment is affected? Which servers, disk, and networks?
Data	Which databases and data are affected? (Data includes application code.)
People	Who is responsible for recovery?
Location	Where does the recovery take place?
Network	How do we switch the network to the recovery location?

Statistically, fire is the leading cause of disaster. Examples of other possible disasters include storms, floods, earthquakes, chemical accidents, nuclear accidents, wars, terrorist attacks, cold winter weather, extreme heat, airplane crashes (loss of key staff), and avalanches. The planning process covers every locale where the business operates and assesses the political stability of the critical business locations.

For each possible disaster that could affect a site, the disaster team assesses the impact to the business in advance. The team addresses the following questions:

How much of the organization’s resources (including data, equipment, and staff) could be lost? What are the replacement costs?
What efforts are required to rebuild?
How long does it take to recover?
What is the impact on the overall organization?
What customers are affected? What is the impact on them?
How much does it affect the share price and market confidence?

DR Planning

After outlining possible threats, the DR team ranks the services and systems according to three categories: mission critical, important, and not so important. The ranking determines the depth of planning, funding, and resiliency. A DR team takes the following steps:

Form a planning group.
Perform risk assessments and audits.
Establish priorities for the network and applications.
Develop recovery strategies.
Prepare an up-to-date inventory and documentation of the plan.
Develop verification criteria and procedures.
Implement the plan.

The team identifies recovery actions for the applications staff, system administrators, database administrators, and network staff.

Resiliency and Backup Services

Business resiliency is the ability to recover from any network failure or issue, whether it is related to disaster, links, hardware, design, or network services. A highly available network (built for resiliency) is the bedrock of effective and timely disaster recovery.

Consider the following areas of the network for resiliency:

Network links
- Carrier diversity
- Local loop diversity
- Facilities resiliency
- Building wiring resiliency
Hardware resiliency
- Power, security, and disaster
- Redundant hardware and onsite spare equipment
- Mean time to repair (MTTR)
- Network path availability
Network design
- Layer 2 WAN designs
- Layer 2 LAN design
- Layer 3 IP design
Network services resiliency
- Domain Name System (DNS) resiliency
- Dynamic Host Configuration Protocol (DHCP) resiliency
- Other services resiliency
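The link between redundant hardware, MTTR, and path availability in the list above can be sketched with the standard steady-state availability formula. This is a minimal sketch, not a method from this chapter: mean time between failures (MTBF) is an added assumption, the figures are hypothetical, and the parallel calculation assumes the redundant components fail independently, which is what carrier and local loop diversity aim for.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability of one component: uptime / (uptime + repair time)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def parallel(*availabilities: float) -> float:
    """Redundant components: the service is down only if every one is down."""
    unavailable = 1.0
    for a in availabilities:
        unavailable *= (1.0 - a)
    return 1.0 - unavailable


def series(*availabilities: float) -> float:
    """Chained components: the service is up only if every one is up."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result


# Hypothetical figures: a link that fails about every 2,000 hours
# and takes 4 hours to repair.
single_link = availability(mtbf_hours=2000, mttr_hours=4)
dual_links = parallel(single_link, single_link)  # for example, two diverse carriers

print(f"single link: {single_link:.5f}")
print(f"dual links:  {dual_links:.7f}")
```

Shortening MTTR (for example, with onsite spare equipment) raises the availability of each component, while redundancy raises the availability of the path as a whole.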

Preparedness Testing

The final step in any DR plan is testing the processes and systems. Just as firefighters practice fighting different types of fires to hone their skills and reactions, the DR team should plan mock disasters to ensure that systems, network services, and data all transition as expected and that all the people involved understand their parts in the transition.

At-A-Glance—Disaster Recovery

<division> <title>Why Should I Care About Disaster Recovery?</title>

System outages can be devastating to a business. Regardless of the cause, any outage can cost a company hundreds of thousands or even millions of dollars per hour of system downtime.

Disaster recovery is the planning and implementation of systems and practices to ensure that when disasters do occur, the core business functions continue to operate.

Many people prefer to use the term “business continuance” rather than disaster recovery, because the former term implies that you can actually avoid disaster (business stoppage) with the proper planning and implementation.

</division><division> <title>What Are Typical Causes of Disasters?</title>

Disasters come in all shapes and sizes. For simplicity, we organized “typical” causes of business disruptions into a few categories:

Natural disasters
- Earthquakes
- Flood
- Hurricane or typhoon
- Blizzard
Unintentional man-made disasters
- Backhoes
- Fire
- Illness (loss of staff)
- Power outages
Intentional man-made disasters
- Acts of war
- Hacking
- Work stoppages

</division><division> <title>What Are the Problems to Solve?</title>

A disaster-recovery plan has four phases: assessment, planning, testing, and implementation/recovery. You must put a plan in place for each risk assessed. Although disruptions can come in many forms, we concentrate on network services and critical applications and data.

Disruption	Solution
Phone service is interrupted.	Multichannel communications strategy.
Network service is disrupted.	Distributed, redundant network design.
Mission-critical application is down.	Business-continuity plan (standby data center, backup).
You can’t commute to the office.	Secure remote access and flexible communications (mobility, telecommuting).
Productivity is constrained.	Innovative Internet Protocol (IP) applications.

</division><division> <title>Before Disaster Strikes</title>

The first step in a business-continuance plan is to assess the business criticality and downtime impact of each business application. The risk assessment should consider how a temporary or extended loss of each application and function affects the business in the following areas:

Financial losses (lost revenue)
Operation disruption
Customer satisfaction and retention
Lost productivity
Brand dilution
Legal liability
Stock price
Credit rating

For each critical system, application, or function, you must implement a backup and recovery plan.
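A rough way to compare applications during this assessment is to estimate the direct cost of an outage for each one and sort by it. The sketch below covers only two of the factors listed above (lost revenue and lost productivity); the application names and dollar figures are hypothetical, and harder-to-quantify factors such as brand dilution or legal liability are left out.

```python
from dataclasses import dataclass


@dataclass
class Application:
    name: str
    revenue_loss_per_hour: float       # hypothetical estimate, in dollars
    productivity_loss_per_hour: float  # hypothetical estimate, in dollars


def downtime_cost(app: Application, outage_hours: float) -> float:
    """Direct cost only; brand, legal, and credit effects are not modeled here."""
    return (app.revenue_loss_per_hour + app.productivity_loss_per_hour) * outage_hours


apps = [
    Application("email", 0, 15_000),
    Application("order entry", 250_000, 40_000),
    Application("intranet wiki", 0, 2_000),
]

outage_hours = 8  # assumed length of the outage being assessed
ranked = sorted(apps, key=lambda a: downtime_cost(a, outage_hours), reverse=True)

for app in ranked:
    print(f"{app.name}: ${downtime_cost(app, outage_hours):,.0f}")
```

The resulting order is a starting point for deciding which systems get a backup and recovery plan first, not a substitute for the full assessment.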

</division><division> <title>Planning for Disasters</title>

After you identify and assess the critical systems, data, and applications, you must develop a plan. A business-continuance plan has two primary components: designing the network for high availability and backing up critical systems in geographically diverse buildings.

Networks designed for high availability are resilient to disruptions such as faulty hardware, disconnected or broken cables (“backhoe failures”), and power outages.

More severe disasters (such as a building fire or earthquake), however, can wipe out entire data centers and application-server farms. The only way to recover gracefully from such an event is to have a completely backed-up secondary data center, as shown in this figure.

</division><division> <title>Backing Up Systems</title>

You can back up data centers and application farms in many ways. Some companies back up systems each night after the close of business hours. When they do, the worst-case data loss is a single day. Another backup scheme is synchronous data mirroring, which copies data to the second site in real time, so that a company loses virtually no data in the event of a disaster. An added benefit is that both systems can be online at the same time, providing load and application sharing, which can increase overall productivity. The main challenge is that synchronous data mirroring can significantly decrease application performance. To achieve synchronous remote mirroring without affecting application performance, you need a high-speed, low-latency connection, such as dense wavelength division multiplexing (DWDM) over optical fiber.
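The tradeoff between the two schemes can be put into rough numbers. The sketch below is an estimate under stated assumptions: it uses the common approximation of about 5 microseconds of one-way propagation delay per kilometer of optical fiber, assumes each mirrored write waits for one round trip to the remote site, and ignores equipment and protocol overhead. The function names and distances are illustrative.

```python
FIBER_DELAY_US_PER_KM = 5.0  # approximate one-way propagation delay in optical fiber


def periodic_worst_case_loss_hours(backup_interval_hours: float = 24.0) -> float:
    """With periodic backups, everything written since the last backup can be lost."""
    return backup_interval_hours


def sync_write_penalty_ms(distance_km: float, round_trips_per_write: int = 1) -> float:
    """Extra latency per write when the write waits for the remote site to acknowledge."""
    round_trip_us = 2 * distance_km * FIBER_DELAY_US_PER_KM
    return round_trips_per_write * round_trip_us / 1000.0


print(f"nightly backup: up to {periodic_worst_case_loss_hours():.0f} hours of data lost")
for km in (10, 100, 1000):
    print(f"mirror at {km} km: about {sync_write_penalty_ms(km):.1f} ms added per write")
```

The added delay grows with distance, which is why the distance between mirrored sites has to be weighed against how geographically diverse the sites need to be.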

</division><division> <title>Practicing for Disaster</title>

One of the best ways to ensure a smooth recovery from a disaster is to provide staff with real-world training simulations. Allowing IT staff to practice different disaster scenarios greatly improves their ability to cope with actual disasters.

</division><division> <title>After a Disaster Occurs</title>

Practice and planning are put to the test if a disaster strikes. To avoid confusion or worse (such as causing more damage), develop a checklist as part of the planning effort and follow it when the time comes. The checklist varies from business to business and situation to situation, but most should closely resemble this example:

Make sure that your people are safe. Are all personnel accounted for? Consider sending noncritical personnel home to avoid confusion.
Make sure the backup systems are online.
Assess the likelihood of additional or secondary disasters. An earthquake, for example, could spark fires or burst gas or water mains.
Monitor the network to ensure business continuation.

</division><division> <title>Restoring Primary Systems</title>

Depending on the severity of damage and the duration of downtime for primary systems, restoring these systems might also disrupt business.

The plan must also account for backing up the data stored on the backup systems and for restoring the primary systems.

</division>

Figure . Disaster Recovery: Catastrophic Fail-Over
