Raymond Pompon, IT Security Risk Control Management, 10.1007/978-1-4842-2140-2_4

4. Risk Analysis: Natural Threats

Raymond Pompon¹

(1) Seattle, Washington, USA

Even with all our technology and the inventions that make modern life so much easier than it once was, it takes just one big natural disaster to wipe all that away and remind us that, here on Earth, we’re still at the mercy of nature.
—Neil deGrasse Tyson

There are many ways to look at risk. One way is to divide risk into natural, accidental events and man-made intentional acts of aggression. Both types of risk are important, but there is some insight to be gained in looking at them differently. This chapter explores risk arising from natural and accidental threats.

Disaster Strikes

What is a disaster? What can go wrong when a random occurrence has far-ranging consequences? Consider this:

A transformer in a substation explodes and an entire nearby business park goes dark. The medical service company there has its own generators, so its server room stays up. However, the business park is on a hill and the local water pumps are offline. The landlord closes all the offices because none of the bathrooms are functioning and the local board of health is citing a sanitation hazard.
A backhoe tears through an underground telecom conduit, breaking the primary T3 serving the nearby bank. Good news: the bank had a secondary Internet connection. Bad news: the second Internet connection ran through the same conduit as the primary. The bank is offline and the telecom company is digging furiously to expose the conduit to replace the broken lines. Then it starts raining. Hard. Now the hole is filling up with mud and water, slowing repair efforts.
Heavy snowfall blankets the hills and streets of a major metropolitan city. Then temperatures drop, freezing the snow into slick ice. No wheeled vehicle in the city can drive faster than ten miles per hour. Roadways clog up with stalled cars. At a large software development company, hundreds of employees jump onto the remote access VPN and promptly overload it. Only 50 can work at a time out of an office of 200. Work now breaks down into shifts with many personnel forced to work off their cell phones and locally stored files until the snowplows can clear the roads.
A thousand-year storm (now more common in the age of climate change) hits, and flooding takes out not only a company’s data center but its entire campus. It turns out that the campus was built on an ancient lakebed. Sewers are backing up and streets have become rivers. Even if employees could work remotely, failing levees are forcing them to flee their homes.
A 6.8-magnitude earthquake hits just south of a major city. At one company’s data center, a large battery stack that was not properly bolted down pulls away from the wall by a few inches, breaking the main power coupling. All power flows into the server room through this coupling. The emergency generator in the parking lot, despite being fully operational, is useless without a connection to the server room. Worse, outside power feeds also come in through the same coupling. It’s a specialized part and none are available locally. The company needs a new coupling flown in, but the earthquake damaged the airport runway, so nothing can land.
A law firm is on the top floor of a prestigious downtown office building boasting an award-winning Venetian trattoria in the lobby. One evening, a fire breaks out in the restaurant’s kitchen. Firefighters quickly douse the flames, but they also close the building for days to inspect it for damage. During that time, the building’s power is also turned off. The law firm’s servers go dark.
In 2012, Hurricane Sandy damaged hundreds of thousands of homes and shut down power to dozens of data centers all over the New York metropolitan area. Data centers switched over to generators, but generators require fuel and the fuel trucks couldn’t get through the flooded roads.

These scenarios are all based on real natural and accidental disasters that significantly affected the organizations involved. Clearly, these kinds of risks are worth examining in more detail.

Risk Modeling

A risk model is nothing more than a taxonomy and a method of measurement that provides a picture of the likelihood and impact of potential damaging events. To be useful, a model needs to reflect reality as closely as possible, so choosing the right risk model matters. One option to consider is using different risk models for different kinds of risk.

You could use the same model for all of your risk calculations. Many knowledgeable risk analysts do, and come up with great results. I want to illustrate a couple of different specialized models that I find useful. You may find the risk model that works best for your organization is entirely different. The most important thing is to think thoroughly about risk and build a practical methodology that is appropriate for your industry, threat landscape, and compliance requirements.

Let’s consider the following list of risks:

Earthquake
Denial-of-service
Fire
Fraud
Hazardous materials spill
Industrial espionage
Insider sabotage
Intellectual property theft
Landslide
System failure

Examining this list, you may see two kinds of threats here. There is the threat of natural hazards and accidents, where bad things just happen. There is also the threat of adversarial risk, where bad people make bad things happen, either on purpose or through carelessness. So, this risk list can be split into two columns, as shown in Table 4-1.

Table 4-1. Risks: Natural vs. Man-made

Bad Things Just Happen	Bad People Make Bad Things Happen
Earthquake	Denial-of-service
Fire	Fraud
Landslide	Industrial espionage
Hazardous materials spill	Intellectual property theft
System failure	Insider sabotage

Given the distinct differences in likelihood and impact between these two kinds of threats, it might be more accurate to model them differently. The likelihood of a random damaging event, such as a tornado or a technology failure, behaves very differently from the likelihood of an attack by intelligent adversaries (such as cybercriminals or malicious insiders), who adapt their strategies to your defenses and choose when to attack. Likewise, the impact of natural disasters can sometimes be so vast and multifaceted that you might want to use a more detailed impact model.

The next chapter is about adversarial threats, whereas the rest of this chapter is about natural/operational threats.

Modeling Natural Threats

In some ways, natural threats are easier to model than adversarial threats. Although there are still many unknowns, such as trying to predict earthquakes or weather, there is also a lot of scientific expertise and historical data to draw upon.

Like real estate, the three most important things about natural hazards are location, location, location. Location even factors into natural threats like pandemics, where large cities and regional travel hubs have a higher likelihood of outbreaks than sleepier locales. Location modeling is where your asset analysis comes in. To do any kind of natural risk analysis, you absolutely need to identify all the significant physical locations of your organization. With that, you can determine what infrastructure and hazards are present in each area. By infrastructure, I mean the following:

Connected utilities (power, water, sewer)
Communication providers serving that location and their entries into the building
Nearest fire department
Nearest major highway
Nearest major airport
Nearby waterways
Nearby industrial activity

Based on this, you can gather information about what can go wrong. There are many resources available for this kind of data, mostly from government agencies. One warning: the data can sometimes take some work to decipher. For example, where I live, there is the Seattle Hazard Explorer, an interactive map that provides data on a dozen different possible natural hazards (see www.seattle.gov/emergency-management/hazards-and-plans/hazards).

It gives its earthquake results as a range of numbers mapped to a “10% chance of exceeding in 50 years.” Reading around the site, you learn that the numbers express the peak ground acceleration generated by an earthquake as a percentage of the acceleration due to Earth’s gravity (g). So a value of 50 means shaking with an acceleration equal to half the strength of normal Earth gravity (0.5 g). As a reference, at values higher than 30 (more than 0.3 g), you can expect some building damage to occur. This gives you some idea of impact. The 10% is the probability of such shaking being exceeded in a 50-year period, so you also have likelihood. With a little work figuring out the numbers, you have enough for a quantitative risk analysis for earthquakes in Seattle. Many of these sites require this kind of decoding to get data for a risk model.
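The arithmetic behind that reading can be sketched in Python. This is a minimal sketch, not a seismic model: it assumes the map value is a percentage of g, treats every year as independent with the same exceedance probability, and uses a hypothetical map reading of 50 for one site. The function names are illustrative.

```python
def map_value_to_g(value: float) -> float:
    """Convert a hazard-map value (percent of g) to peak ground acceleration in g."""
    return value / 100.0


def annual_exceedance_probability(p_window: float, years: float) -> float:
    """Annual probability implied by an exceedance probability over a multi-year window.

    Assumes independent years with equal probability:
    (1 - p_annual) ** years == 1 - p_window
    """
    return 1.0 - (1.0 - p_window) ** (1.0 / years)


def return_period_years(p_window: float, years: float) -> float:
    """Average number of years between exceedances (1 / annual probability)."""
    return 1.0 / annual_exceedance_probability(p_window, years)


# Threshold from the text: values above 30 (0.3 g) suggest some building damage.
DAMAGE_THRESHOLD_G = 0.30

site_value = 50  # hypothetical map reading for one office location
pga = map_value_to_g(site_value)
p_year = annual_exceedance_probability(0.10, 50)

print(f"Peak ground acceleration: {pga:.2f} g")
print(f"Some building damage expected: {pga > DAMAGE_THRESHOLD_G}")
print(f"Annual exceedance probability: {p_year:.4%}")
print(f"Return period: {return_period_years(0.10, 50):.0f} years")
```

A 10% chance in 50 years works out to roughly a 0.21% chance per year, which is the commonly cited return period of about 475 years.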

Here is a list of other good resources for researching natural hazards in North America.

USGS Earthquakes maps
http://earthquake.usgs.gov/earthquakes/
USGS Natural Hazards map
http://www.usgs.gov/natural_hazards/
National Hurricane Center Storm Surge map for US Coast
http://www.nhc.noaa.gov/surge/
FEMA: Flood Hazard map
https://www.fema.gov/national-flood-insurance-program-flood-hazard-mapping
NOAA Severe Storm Labs
http://www.nssl.noaa.gov/
Power Outage Data Explorer
http://insideenergy.org/2014/08/18/data-explore-15-years-of-power-outages/

As you assess natural hazards, you quickly realize that there is a huge range of possible threats, with many things to calculate and consider. Here is a large list of possible natural threats to start from.

Threat
Earthquake
Flood
Blizzard / ice-storm
Landslides/mudslides
Building fire
Forest fire
Heat wave
Hazardous material spill
Gas leak
Prolonged road closure
Pandemic
Blackout/electrical disruptions
Solar flare
Volcano
Lightning
Tsunami
HVAC failure
Aircraft accident
Windstorm
Communications failure
Nuclear power mishap
Civil disturbance
Active shooter/domestic terrorism

You may notice that the last two items are human-originated but are still listed as natural hazards. These threats are easier to model as random events than as intentional attacks against your organization. This could change if your organization is attractive to highly motivated attackers. A law enforcement agency, for example, could draw political or nation-state attackers, which skews the otherwise random distribution of threats. Use your best judgment.

Modeling Impact with Failure Mode Effects Analysis

Operational impacts involving technology can be tricky because of the intricate dependencies involved. Consider a situation like a power surge in a single colocation facility that causes a single rack to fail. This rack holds dozens of servers, including the main external DNS server for an entire company. Coincidentally, the secondary DNS in another facility is undergoing a maintenance event. Now the entire company’s Internet presence is offline. Web, FTP, e-mail, and remote access are all down. Although this is an unusual chain of events, it still happens. Considering the severe impact of these kinds of problems, you might want to model these kinds of occurrences.
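One way to model this kind of chain is to record which components each service depends on and which dependencies are redundant. The sketch below is a hypothetical Python model of the DNS scenario: the component names (`rack-7`, `dns-secondary`, and so on) are invented for illustration, and it assumes the dependency map has no cycles.

```python
from typing import Dict, List, Set

# Hypothetical dependency map for the DNS scenario (not a real environment).
# Each component lists dependency groups: it is up only if it has not failed
# itself AND every group has at least one member up. Members of one group are
# redundant with each other. Assumes the map contains no cycles.
REQUIRES: Dict[str, List[Set[str]]] = {
    "dns-primary": [{"rack-7"}],                # primary DNS sits in the failed rack
    "dns-secondary": [{"facility-b"}],          # secondary DNS lives in another facility
    "dns": [{"dns-primary", "dns-secondary"}],  # either DNS server is enough
    "web": [{"dns"}],
    "ftp": [{"dns"}],
    "email": [{"dns"}],
    "remote-access": [{"dns"}],
}


def is_up(component: str, failed: Set[str]) -> bool:
    """True if the component has not failed and every dependency group is satisfied."""
    if component in failed:
        return False
    return all(
        any(is_up(member, failed) for member in group)
        for group in REQUIRES.get(component, [])
    )


def services_down(services: List[str], failed: Set[str]) -> List[str]:
    """Return the services that are unavailable for a given set of failed components."""
    return [s for s in services if not is_up(s, failed)]


SERVICES = ["web", "ftp", "email", "remote-access"]
print(services_down(SERVICES, {"rack-7"}))                   # -> []
print(services_down(SERVICES, {"rack-7", "dns-secondary"}))  # -> ['web', 'ftp', 'email', 'remote-access']
```

The rack failure alone is absorbed by the secondary DNS; it is the coincidence with the maintenance window that takes everything down, which is exactly the kind of combination worth enumerating.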

For operational risk analysis, consider looking at Failure Mode and Effects Analysis (FMEA). FMEA is based on a US military procedure. The model was formalized for general use and is currently published as International Standard IEC 60812. FMEA is a risk analysis methodology that focuses on the ways that components in a system fail and the downstream effects of those failures. Although it can be time-consuming, it is a very systematic and easy method for a team to use to get a complete picture of how complex systems fail. You will quickly see alternate design ideas, fail-over mechanisms, and monitoring requirements as you walk through an FMEA analysis.

The essence of FMEA is to

Break down a complex system into its major functions.
Analyze the functions.
Determine the effects of the failure of each of the functions on the overall system.

Simple FMEA Example

Table 4-2 illustrates an example of how to break down an Internet banking system into functions.

Table 4-2. System Example: Internet Banking System

Subsystem	Functional Subsystems	Effect of Failure
Servers	Database server, app server, web server, DNS server	Immediate failure of entire system
Network Devices	Firewall, router, switch, cables	Immediate failure of entire system
Connectivity	ISP, local link, cables	Immediate failure of entire system
Facilities	Server rack, power, HVAC	Varies but assume near-term failure of entire system
Personnel	DBA, sysadmin, net engineer, programmer	Varies but assume failure of system within a few weeks
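Kept as data, a breakdown like Table 4-2 becomes a worksheet you can extend with new columns as the analysis goes deeper. A minimal Python sketch, assuming a plain list-of-rows layout; the empty `detection` field is a placeholder for the kind of column Table 4-3 adds, and the helper names are illustrative.

```python
from typing import List, NamedTuple


class Subsystem(NamedTuple):
    name: str
    functions: List[str]
    effect_of_failure: str


# Rows transcribed from Table 4-2 (Internet banking system).
TABLE_4_2 = [
    Subsystem("Servers", ["Database server", "app server", "web server", "DNS server"],
              "Immediate failure of entire system"),
    Subsystem("Network Devices", ["Firewall", "router", "switch", "cables"],
              "Immediate failure of entire system"),
    Subsystem("Connectivity", ["ISP", "local link", "cables"],
              "Immediate failure of entire system"),
    Subsystem("Facilities", ["Server rack", "power", "HVAC"],
              "Varies but assume near-term failure of entire system"),
    Subsystem("Personnel", ["DBA", "sysadmin", "net engineer", "programmer"],
              "Varies but assume failure of system within a few weeks"),
]


def worksheet(subsystems: List[Subsystem]) -> List[dict]:
    """Break each subsystem into one row per function, ready for further analysis."""
    return [
        {"subsystem": s.name, "function": f, "effect": s.effect_of_failure, "detection": None}
        for s in subsystems
        for f in s.functions
    ]


def immediate_failures(subsystems: List[Subsystem]) -> List[str]:
    """Subsystems whose failure stops the whole system at once."""
    return [s.name for s in subsystems if s.effect_of_failure.startswith("Immediate")]


rows = worksheet(TABLE_4_2)
print(len(rows), "function rows to analyze")  # -> 18 function rows to analyze
print(immediate_failures(TABLE_4_2))          # -> ['Servers', 'Network Devices', 'Connectivity']
```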

You can increase the availability of all the technological systems by adding redundant systems. However, adding more personnel and facilities is much more expensive. So let’s dig deeper into these two functions. In Table 4-3, we can add another FMEA factor, the detection method for a failing function.

Table 4-3. FMEA Example of Facilities

Function	Effect of Failure	Explanation	Detection
Server rack	Immediate failure of entire system	Need rack to hold server	Beyond systems failing, none
Power	Immediate failure of entire system	Need electricity	Audible alarm in room on batteries
HVAC	Failure of system in 1–6 hours, plus possible equipment damage	Servers overheat	Thermometer alarm via SMS pager

In Table 4-4, we can add another layer of impact by looking at how personnel are affected by function failures.

Table 4-4. Example of FMEA Breakdown of Personnel

Function	Effect of Failure	Explanation	Detection
Database admin	Failure of system within 30–60 days	Database logs fill up without admin to clear them	E-mailed alarms to admins
Sysadmin	Degraded performance, failure of system within 15–30 days	No maintenance by admins	User complaints
Net engineer	Degraded performance	No network fixes or optimization	User complaints
Programmer	Degraded performance	No bug fixes or features added	User complaints

As you can see, there are some problems with delayed failures caused by functions not being available. Coupled with poor detection capability, this could mean a serious problem emerging when you are least able to deal with it, like in the middle of the night. Anyone who works in IT will probably cringe when they read that the detection method is user complaints.
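That combination, a delayed failure plus weak detection, can be flagged mechanically once Tables 4-3 and 4-4 are kept as data. A minimal Python sketch; the day values are simplified from the tables (ranges use their low end, hours converted to days), and the list of "weak" detection methods is a hypothetical rule you would tune yourself.

```python
from typing import List, NamedTuple, Optional


class FmeaRow(NamedTuple):
    function: str
    days_to_system_failure: Optional[float]  # None = degradation only, no hard failure
    detection: str


# Rows transcribed from Tables 4-3 and 4-4. Ranges use their low end, and
# "immediate" / "1-6 hours" are converted to days (assumed simplifications).
ROWS = [
    FmeaRow("Server rack", 0, "Beyond systems failing, none"),
    FmeaRow("Power", 0, "Audible alarm in room on batteries"),
    FmeaRow("HVAC", 1 / 24, "Thermometer alarm via SMS pager"),
    FmeaRow("Database admin", 30, "E-mailed alarms to admins"),
    FmeaRow("Sysadmin", 15, "User complaints"),
    FmeaRow("Net engineer", None, "User complaints"),
    FmeaRow("Programmer", None, "User complaints"),
]

# Hypothetical rule: detection methods containing these phrases count as weak.
WEAK_DETECTION = ("none", "user complaints")


def is_weak(detection: str) -> bool:
    text = detection.lower()
    return any(marker in text for marker in WEAK_DETECTION)


def silent_slow_failures(rows: List[FmeaRow]) -> List[str]:
    """Functions whose failure is delayed (or degradation only) AND weakly detected."""
    return [
        r.function
        for r in rows
        if is_weak(r.detection)
        and (r.days_to_system_failure is None or r.days_to_system_failure > 0)
    ]


print(silent_slow_failures(ROWS))  # -> ['Sysadmin', 'Net engineer', 'Programmer']
```

The server rack also has weak detection, but its failure is immediate and obvious; the flagged rows are the ones that can erode quietly until they fail at the worst possible time.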

Breaking down a System

The first step is to break down a system into its primary functions. You can represent most systems, especially technological ones, as a hierarchy. In addition, large systems are built to serve a purpose. This is your beginning point: the primary mission of the system. From here, functional requirements will flow. Examine its structure as it fulfills its functions. You may want to look at how it performs those functions over time, considering environment changes on a scale of hours, days, weeks, and so forth. For example, an accounting system may have different functions and dependency cycles for month-end, quarterly profit reporting, and annual tax analysis. Remember to use your asset analysis and review those diagrams, documentation, data flows, and subject-matter expert interviews.

You should consciously decide and document what you consider to be the boundaries of the system. This is useful for analysis and will come in handy in Chapter 6 when we discuss the concept of scope. What is a system? What should you include? What should you leave out? For example, a typical corporate Windows workstation needs dynamic addressing, Active Directory, name services, clock synchronization, and a local area network. Maybe you can leave out authentication services, file sharing services, and Internet links. Or maybe you should include Internet connectivity? It all depends on your business requirements.

When looking at larger systems, functions can be more than just components or technical services. You can look at things like facilities, personnel, ISPs, and other major systems. For example, here is a functional breakdown of a retail store’s sales system:

Point-of-Sale (POS) terminals
Receipt printers
Card readers
Sales clerks
Store network
Store wiring closet
Store server rack
Store POS server
Store network link to headquarters in another city
Regional IT service technician (part-time)

This method is called the top-down approach and it is the most common way people approach an already existing system. You start big and work your way down into smaller and smaller pieces, stopping analysis when you feel that you understand enough. Another approach is bottom-up, where you start with components and build up to a complete system. This takes longer but is more complete.
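The difference between the two approaches comes down to traversal order. A minimal Python sketch, assuming the system is recorded as a nested dictionary: top-down walks from the whole system toward its components and can stop at a chosen depth, while bottom-up visits every component before the function that contains it. The three middle-level groupings of the retail store are hypothetical; the list above is flat.

```python
from typing import Dict, Iterator, Tuple

Tree = Dict[str, "Tree"]

# The retail sales system from the list above, with hypothetical groupings.
SALES_SYSTEM: Tree = {
    "Retail sales system": {
        "Checkout": {
            "Point-of-Sale (POS) terminals": {},
            "Receipt printers": {},
            "Card readers": {},
            "Sales clerks": {},
        },
        "Store infrastructure": {
            "Store network": {},
            "Store wiring closet": {},
            "Store server rack": {},
            "Store POS server": {},
        },
        "Headquarters support": {
            "Store network link to headquarters": {},
            "Regional IT service technician (part-time)": {},
        },
    }
}


def top_down(tree: Tree, max_depth: int, depth: int = 0) -> Iterator[Tuple[int, str]]:
    """Start with the whole system and stop descending once max_depth is reached."""
    for name, children in tree.items():
        yield depth, name
        if depth < max_depth:
            yield from top_down(children, max_depth, depth + 1)


def bottom_up(tree: Tree, depth: int = 0) -> Iterator[Tuple[int, str]]:
    """Visit every component first, then the larger functions built from them."""
    for name, children in tree.items():
        yield from bottom_up(children, depth + 1)
        yield depth, name


for depth, name in top_down(SALES_SYSTEM, max_depth=1):
    print("  " * depth + name)
```

Stopping at `max_depth=1` mirrors "stopping analysis when you feel that you understand enough"; the bottom-up walk always touches every leaf, which is why it takes longer but misses less.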

Analyzing Functions

With the major functions defined, you can begin analyzing those functions for how they will fail. This means looking at dependencies, redundancies, inputs, and outputs. Like the overall system, you should begin with the goal of the function. How does the function fulfill its mission? What was the mindset of the designers of the function? These things can provide overlooked clues that can help you find shortcomings and possible flaws in the function.

You should also consider how this system feeds into other systems. In the workstation example, that system could be considered part of the “marketing subnet.” Are there any implications and feedback loops because of that relationship? Perhaps the marketing department does monthly video broadcasts, so you realize that each workstation also needs speakers or headphones as critical functions.

Things to look at when analyzing a function:

Command inputs (the range of possible changes that can be made)
Breadth of command inputs (How many external parties can issue commands to the function?)
Data flows (size, speed, and path)
Internal feedback mechanisms (Can the function monitor its own status? How does it react?)
External feedback mechanisms (Can the function receive warnings from the outside? How does it react?)
Does the function provide feedback to outside systems? How and how often?
What does the function require from outside systems to function? What happens when it doesn’t get it?
How does the function handle load? What does it do when overloaded? Idle?
Does the function adjust its own parameters? How?
How long can the function run without rest/maintenance? External adjustment?

Determining Failure Effects

Modeling failures within complex systems full of interacting components is a challenge. Large systems can be non-linear, where a minor change in a single subsystem can resonate with large consequences. Systems also have a history, as they have evolved and been adapted from earlier, simpler systems. Over time, their purpose may have changed. Sometimes that legacy of changes is reflected in the components in the system. For systems already in place, it is important to analyze them as they actually exist, not based on the original specifications or on idealized plans yet to be implemented.

When brainstorming failure modes, you should consider exactly what each failure would look like. Be sure to take into account the scale (how bad) and the duration (how long). For some kinds of systems, scale and duration could be limited or prolonged depending on how the system reacts to and recovers from problems. In some cases, a disruption is momentary and evades detection. I have seen more than a few cases where a server quietly crashes, reboots, and comes back up entirely between two five-minute uptime checks by the monitoring system. Users, however, did notice, because all of their transactions failed and data was lost. Improving function failure detection is an important part of failure effects analysis.
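The arithmetic of that blind spot is easy to check. A minimal Python sketch, assuming an instantaneous check every five minutes and that a single failed check is enough to raise an alert; monitors that require several consecutive failures before alerting widen the gap further. The function names are hypothetical.

```python
import math


def detected_by_polling(outage_start: float, outage_end: float,
                        interval: float, first_poll: float = 0.0) -> bool:
    """True if any poll at first_poll + k * interval lands inside [outage_start, outage_end)."""
    k = math.ceil((outage_start - first_poll) / interval)  # first poll at or after the outage
    next_poll = first_poll + k * interval
    return next_poll < outage_end


def detection_probability(outage_seconds: float, interval: float) -> float:
    """Chance that an outage starting at a random moment overlaps at least one poll."""
    return min(1.0, outage_seconds / interval)


FIVE_MINUTES = 300  # seconds

# A crash-and-reboot from t=310 s to t=550 s falls between the polls at 300 s and 600 s.
print(detected_by_polling(310, 550, FIVE_MINUTES))  # -> False
# A four-minute reboot at a random time is caught only 80% of the time.
print(detection_probability(240, FIVE_MINUTES))     # -> 0.8
```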

Drawing a system can also help illustrate which failure modes can be the most catastrophic to a system. Figure 4-1 is a simple diagram of an Internet banking web farm.

Figure 4-1. Internet banking system

Can you spot the single point of failure in the example? Hint: it’s probably the cheapest piece of equipment in the entire stack. Drawing a system on a whiteboard during a group brainstorming session can be very effective, not only for spotting failure points but also for creating a shared understanding of how the system actually functions.
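Once a diagram is written down as a graph, the same hunt can be automated. The sketch below assumes a hypothetical web farm topology (it is not a transcription of Figure 4-1) and simply removes each device in turn, using a breadth-first search to check whether the Internet can still reach the database.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical topology: redundant ISPs, routers, firewalls, and web servers,
# but only one core switch.
LINKS: List[Tuple[str, str]] = [
    ("internet", "isp-a"), ("internet", "isp-b"),
    ("isp-a", "router-a"), ("isp-b", "router-b"),
    ("router-a", "firewall-a"), ("router-b", "firewall-b"),
    ("firewall-a", "core-switch"), ("firewall-b", "core-switch"),
    ("core-switch", "web-1"), ("core-switch", "web-2"),
    ("web-1", "database"), ("web-2", "database"),
]


def build_graph(links: List[Tuple[str, str]]) -> Dict[str, Set[str]]:
    graph: Dict[str, Set[str]] = {}
    for a, b in links:  # links are treated as bidirectional
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def reachable(graph: Dict[str, Set[str]], src: str, dst: str,
              removed: Optional[str] = None) -> bool:
    """Breadth-first search from src to dst, skipping the removed node."""
    seen, queue = {src}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for neighbor in graph[node]:
            if neighbor != removed and neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return False


def single_points_of_failure(graph: Dict[str, Set[str]], src: str, dst: str) -> List[str]:
    """Every intermediate node whose loss alone disconnects src from dst."""
    return sorted(
        node for node in graph
        if node not in (src, dst) and not reachable(graph, src, dst, removed=node)
    )


net = build_graph(LINKS)
print(single_points_of_failure(net, "internet", "database"))  # -> ['core-switch']
```

Removing one node at a time only finds single points of failure; pairs of failures, like the rack and DNS maintenance case earlier, need a separate pass.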

Business Impact Analysis

If you are responsible for the organization’s business continuity plan, then this type of analysis is useful for the Business Impact Analysis (BIA) section of the business continuity plan. Business continuity is a requirement of ISO 27001, PCI DSS, and SSAE 16 SOC 2 and 3. Sometimes this role falls on IT security and sometimes the role falls on a dedicated business continuity team. In any case, it’s essential to understand the process.

BIA identifies the functions that are necessary to the organization and what the effects of an interruption would be. Through BIA, you create disaster impact scenarios based on those mission-critical services to help determine what resources you might need to get business operations going again. This all fits very nicely into the risk analysis methods discussed so far in this chapter.

A way to enhance this analysis, especially for disaster scenarios, is to break down the duration of unavailability or degradation of each function. For the following examples, I have divided the period of unavailability into three durations: one day, three days, and five or more days. These tiers allow a response plan to address specific kinds of interruptions, and they also work for an escalating crisis. You can use whatever periods you think are appropriate for your organization. Table 4-5 shows one way of categorizing different downtime periods for a typical office.

Table 4-5. Example of Impacts to Normal Business Operations for the Office

Function	Duration: 1 day down	Duration: 3 days down	Duration: 5+ days down
Water & Sewer	Minor	Significant	Major
HVAC	Minor	Significant	Significant
Electricity	Significant	Major	Major
Elevator	Minor	Minor	Minor
Connectivity (Phone & Net)	Significant	Major	Major
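Table 4-5 can double as a lookup when you play out scenarios. A minimal Python sketch, assuming outages are rounded up to the next duration tier and that an interruption affecting several functions is rated by its worst single impact; both rules are illustrative choices, not part of the method.

```python
from typing import Dict, List, Tuple

# Table 4-5: impact by function for 1 day, 3 days, and 5+ days down.
TABLE_4_5: Dict[str, Tuple[str, str, str]] = {
    "Water & Sewer": ("Minor", "Significant", "Major"),
    "HVAC": ("Minor", "Significant", "Significant"),
    "Electricity": ("Significant", "Major", "Major"),
    "Elevator": ("Minor", "Minor", "Minor"),
    "Connectivity (Phone & Net)": ("Significant", "Major", "Major"),
}

SEVERITY = ["Minor", "Significant", "Major"]


def duration_tier(days_down: float) -> int:
    """Column index for an outage length. Assumed rule: round UP to the next tier,
    so 2 days uses the 3-day column and 4 days uses the 5+ column."""
    if days_down <= 1:
        return 0
    if days_down <= 3:
        return 1
    return 2


def scenario_impact(functions: List[str], days_down: float) -> str:
    """Worst impact across every function lost in the same interruption."""
    tier = duration_tier(days_down)
    return max((TABLE_4_5[f][tier] for f in functions), key=SEVERITY.index)


# Hypothetical scenario: a two-day power cut also stops HVAC and the elevator.
print(scenario_impact(["Electricity", "HVAC", "Elevator"], days_down=2))  # -> Major
```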

One thing you may notice from the example in Table 4-5 is that I didn’t define specific threats, just functional interruptions. I used FMEA to treat office services as functions and combined threats into a basic assumption of an interruption of service, regardless of reason, with a defined area of impact. You can use this shortcut with the FMEA model so that you do not need to look at every possible threat. This automatically rolls up the threats of fires, cable cuts, blackouts, pipe breaks, and severe storms into a single threat vector. For example, you could focus on an electrical outage and not worry about whether it was caused by a backhoe accident, a windstorm, a regional blackout, or an earthquake. All you focus on is the loss of the function and its effects. You can also use FMEA to set aside the individual types of threats and group common likelihoods together. For the BIA, you can also break interruptions down into likely durations, so in your scenario planning you can simply focus on a function interruption event of a specified duration. Table 4-6 shows more examples of rolling natural threats and likely durations into an analysis table.

Table 4-6. An Example of Threat Mapping with FMEA to Generalized Failure Threats

Threat	Effects on Services	Outage Duration	Likelihood
Earthquake	Regional, all services	5+ days	1-Rare
Large earthquake	Widespread, all services	5+ days	0-Very Rare
Flood	Regional, all services	2–5 days	3-Unlikely
Blizzard/ice storm	Regional, transport	2–5 days	5-Possible
Storm	Regional, power & comm. & transport	2–5 days	5-Possible
Building fire	Facility, all services	5+ days	3-Unlikely
Civil disturbance	Facility, transport	2–5 days	1-Rare
Hazardous spill	Facility, all services	2–5 days	1-Rare
Pandemic	Regional, personnel	5+ days	3-Unlikely
Bomb threat	Facility, all services	1 day	3-Unlikely
Blackout	Regional, power	2–5 days	5-Possible
Volcano	Regional, all	5+ days	1-Rare
Lightning	Facility, power & comm.	2–5 days	1-Rare
HVAC failure	Facility, equipment	2–5 days	5-Possible
Aircraft accident	Facility, all services	5+ days	1-Rare
Communications loss	Regional, comm.	1 day	5-Possible
Nuclear mishaps	Regional, all	5+ days	1-Rare
Sabotage/terrorism	Facility, all services	5+ days	3-Unlikely
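Rolling threats up into functional interruptions is a grouping operation. A minimal Python sketch over a subset of Table 4-6, assuming the numeric prefix of each likelihood label is an ordinal score and that a rolled-up scenario takes the highest score among its threats (a conservative, illustrative rule). The eight threats below collapse into six interruption scenarios.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# A subset of Table 4-6: (threat, effects on services, outage duration, likelihood score).
ROWS: List[Tuple[str, str, str, int]] = [
    ("Earthquake", "Regional, all services", "5+ days", 1),
    ("Flood", "Regional, all services", "2–5 days", 3),
    ("Building fire", "Facility, all services", "5+ days", 3),
    ("Hazardous spill", "Facility, all services", "2–5 days", 1),
    ("Bomb threat", "Facility, all services", "1 day", 3),
    ("Aircraft accident", "Facility, all services", "5+ days", 1),
    ("Communications loss", "Regional, comm.", "1 day", 5),
    ("Sabotage/terrorism", "Facility, all services", "5+ days", 3),
]


def roll_up(rows: List[Tuple[str, str, str, int]]) -> Dict[Tuple[str, str], dict]:
    """Group threats that cause the same interruption; keep the highest likelihood."""
    scenarios: Dict[Tuple[str, str], dict] = defaultdict(lambda: {"threats": [], "likelihood": 0})
    for threat, effects, duration, likelihood in rows:
        scenario = scenarios[(effects, duration)]
        scenario["threats"].append(threat)
        scenario["likelihood"] = max(scenario["likelihood"], likelihood)
    return dict(scenarios)


for (effects, duration), s in roll_up(ROWS).items():
    print(f"{effects} for {duration} (likelihood {s['likelihood']}): {', '.join(s['threats'])}")
```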

Here’s how you can use this mapping, along with combinations of outages, to get a simpler breakdown of the various impacts that a disaster would create for business operations. In this example, I model a service company with two customer-facing services: customer support calls and consulting. Of the two, the call center is the most important because it facilitates immediate customer communication. If a customer calls and the lines are down, the company’s reputation sinks. Consulting services can be delayed, but call answering cannot. In this example, all of the offices are in Nevada except the sales office, which is in Chicago. These are the facilities:

Main office: Las Vegas, NV
Sales office: Chicago, IL
Call center: Reno, NV
Call center: Las Vegas, NV

In Table 4-7, I break down the first cut of the outage scenarios for this example.

Table 4-7. Sample Scenario Overview

Scenario	Example	Likelihood	Impacts on Critical Service: Customer Call Center	Other Impacts
Scenario 1: All services in the state of Nevada heavily damaged	Major earthquake	Very rare	Major: Complete outage	Major: Corporate outage, sales office becomes primary contact
Scenario 2: Service outages in the Vegas area	Storms, flooding	Possible	Minor: Reno call center takes all calls; capacity reduced	Minor: Corporate outages; sales office becomes primary contact
Scenario 3: Personnel in the Vegas area are disabled	Pandemic	Rare	Minor: Reno call center takes all calls; capacity reduced	Minor: Corporate outages; sales office becomes primary contact
Scenario 4: Main office is unavailable	Building fire	Possible	No direct impacts	Minor: Corporate outages; sales office becomes primary contact
Scenario 6: Reno call center is unavailable	Building fire	Possible	Minor: Vegas call center takes all calls, capacity reduced	No additional impacts
Scenario 7: Vegas call center is unavailable	Building fire	Possible	Minor: Reno call center takes all calls, capacity reduced	No additional impacts
Scenario 8: Chicago sales office is unavailable	Building fire	Possible	No direct impacts	Minor: Corporate takes over sales calls as needed

As you can see in this example, things seem to be pretty well in hand except for a statewide disaster scenario. In this case, the risk has been identified and leadership can choose to mitigate that risk or accept it.
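The call center column of Table 4-7 follows a simple rule: compare the capacity that survives with the capacity you normally have. A minimal Python sketch, assuming hypothetical seat counts for the two call centers (the example does not give them); the ratings it produces line up with the table’s Major, Minor, and no-impact entries.

```python
from typing import Set

# Hypothetical seat counts; the example does not give call center sizes.
CALL_CENTER_SEATS = {"Reno call center": 40, "Vegas call center": 60}

# Facilities unavailable in some of the Table 4-7 scenarios.
SCENARIOS = {
    "Scenario 1: Nevada heavily damaged": {"Main office", "Reno call center", "Vegas call center"},
    "Scenario 2: Vegas area outage": {"Main office", "Vegas call center"},
    "Scenario 4: Main office unavailable": {"Main office"},
    "Scenario 6: Reno call center unavailable": {"Reno call center"},
}


def call_center_impact(unavailable: Set[str]) -> str:
    """Rate the critical service by how much call-answering capacity survives."""
    total = sum(CALL_CENTER_SEATS.values())
    remaining = sum(seats for site, seats in CALL_CENTER_SEATS.items() if site not in unavailable)
    if remaining == 0:
        return "Major: complete outage"
    if remaining < total:
        return f"Minor: capacity reduced to {remaining / total:.0%}"
    return "No direct impacts"


for name, down in SCENARIOS.items():
    print(f"{name}: {call_center_impact(down)}")
```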

Documenting Assumptions

Whenever you work on these kinds of analyses, there will be assumptions. You need to identify them as you go along. Do not let them be invisibly baked into the resulting analysis; the results can be misleading if an assumption turns out to be false or misunderstood. The easiest way to surface them is to solicit multiple perspectives in the analysis. Once you have identified the assumptions, they need to be documented along with the analysis. You should also have leadership review and approve these assumptions when you present your analysis. They may have different assumptions that could change your results. Here are some sample assumptions to go along with the earlier examples:

Unless otherwise noted in the analysis, all staff can telecommute to perform their job functions with at least 75% efficiency.
Services hosted outside of the state remain available and functional. They are considered “always available” during an emergency due to the low probability that both services are unavailable at the same time. Externally hosted services include chat (cloud-based), accounting (outsourced), and HR payroll (outsourced).
IT personnel suffer no more than 25% incapacitation in an interruption event.
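Writing assumptions down as data keeps them visible and shows how much they drive the result. A minimal Python sketch of an assumption register using the two numeric assumptions above; the field names, the `CIO` sign-off, and the team size of 20 are hypothetical, and the worst case assumes every remaining IT staff member works remotely.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Assumption:
    statement: str
    value: float
    approved_by: str = ""  # empty until leadership signs off


# The numeric assumptions listed above; approval status is hypothetical.
ASSUMPTIONS: Dict[str, Assumption] = {
    "telecommute_efficiency": Assumption(
        "All staff can telecommute with at least 75% efficiency", 0.75),
    "max_it_incapacitation": Assumption(
        "IT personnel suffer no more than 25% incapacitation", 0.25, approved_by="CIO"),
}


def pending_approval(register: Dict[str, Assumption]) -> List[str]:
    """Assumptions that still need leadership review."""
    return [key for key, a in register.items() if not a.approved_by]


def worst_case_it_capacity(headcount: int, register: Dict[str, Assumption]) -> float:
    """Staff-equivalents left if the maximum incapacitation occurs and everyone
    remaining works remotely at the minimum assumed efficiency."""
    available = headcount * (1 - register["max_it_incapacitation"].value)
    return available * register["telecommute_efficiency"].value


print(worst_case_it_capacity(20, ASSUMPTIONS))  # -> 11.25
print(pending_approval(ASSUMPTIONS))            # -> ['telecommute_efficiency']
```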
