CHAPTER 10
Where Even the Quants Go Wrong: Common and Fundamental Errors in Quantitative Models

There is perhaps no beguilement more insidious and dangerous than an elaborate and elegant mathematical process built upon unfortified premises.

—THOMAS C. CHAMBERLIN, GEOLOGIST (1899)

In theory there is no difference between theory and practice. In practice, there is.

—YOGI BERRA

When it comes to improving risk management, I admit to being a bit biased in favor of quantitative methods in the assessment, mitigation, or deliberate selection of risks for the right opportunities. I think the solution to fixing many of the problems we've identified in risk management will be found in the use of more quantitative methods—but with one important caveat. In everything I've written so far, I've promoted the idea that risk management methods should be subjected to scientifically sound testing methods. Of course, we should hold even the most quantitative models to that same rigor. They get no special treatment because they simply seem more mathematical or were once developed and used by highly regarded scientists. Even though in the previous chapter I criticize a lot of what Nassim Taleb says, this is where we are in total agreement.

The idea that the mere use of sophisticated-looking mathematical models must automatically be better has been called crackpot rigor (recall the discussion of this term in chapter 8), and a risk manager should always be on guard against it. Unfortunately, the rapid growth in the use of sophisticated tools has, in many cases, outpaced the growth in the skills needed to use them, and the use of questionable quantitative methods seems to be getting out of hand.

Science has evolved to its current advanced state because some scientists have always questioned other scientists, and even widely accepted scientific ideas could never, in the long run, withstand contradictory findings in repeated, independent tests. This chapter is an attempt to apply the same skepticism we've applied to the risk matrix to more quantitative methods in risk assessment and risk management.

A SURVEY OF ANALYSTS USING MONTE CARLOS

Remember, the one-for-one substitution model mentioned in chapter 4 is an example of a Monte Carlo simulation. It is just a very limited form of a Monte Carlo simulation focusing on risks. But Monte Carlos can address questions beyond disastrous losses and consider uncertainties about the opportunities in decisions such as whether to pursue a new product or increase the capacity of a factory. We can apply them to entire portfolios of software projects, R&D, and other investments so that we can prioritize them on the basis of risk and return. To see an example of how a Monte Carlo can be applied to another category of decisions, go to www.howtomeasureanything.com/riskmanagement to download an Excel-based simulation applied to a capital investment decision.
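
The model on the website is Excel-based. For readers who would rather see code, here is a minimal sketch in Python of the same kind of simulation applied to a hypothetical capital investment; every distribution and figure in it is a made-up placeholder, not the downloadable model.

```python
# A minimal Monte Carlo sketch of a capital investment decision.
# All inputs are hypothetical placeholders, not the book's spreadsheet model.
import numpy as np

rng = np.random.default_rng(1)
trials = 10_000
years = 5

# Uncertain inputs expressed as probability distributions (figures made up)
capital_cost = rng.normal(12.0, 1.5, trials)   # $M up-front investment
units_sold = rng.normal(800, 150, trials)      # thousands of units per year
margin = rng.normal(5.0, 1.2, trials)          # $ contribution per unit

annual_cash = units_sold * margin / 1000.0     # $M of cash flow per year
# Net present value at a 10% discount rate, same cash flow each year
npv = -capital_cost + sum(annual_cash / 1.1 ** t for t in range(1, years + 1))

print(f"Mean NPV: ${npv.mean():.1f}M")
print(f"Chance of losing money: {(npv < 0).mean():.0%}")
print(f"5th percentile NPV: ${np.percentile(npv, 5):.1f}M")
```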

PC-based Monte Carlo simulation tools have been widely used for decades. The Monte Carlo tools @Risk and Crystal Ball were released in the late 1980s, while I was still writing bespoke Monte Carlo simulations in Fortran, BASIC, and even the Lotus 1-2-3 spreadsheet. By the 1990s, Excel-based versions of these tools had been introduced and the user base was growing quickly.

As the sales of these tools were increasing, so was the possibility of misuse. In 2008, I conducted a survey to see if this was the case. Using a list of simulation conference attendees, I recruited thirty-five users of @Risk and Crystal Ball for a survey of their work. Each of these users was asked for details about the last one to three Monte Carlo models they constructed. Data for a total of seventy-two individual Monte Carlo models was gathered (an average of just over two models per user).

The modelers in the survey claimed to be fairly experienced on average. The average years of experience were 6.2 and the median was just 4 years (this skewed distribution is due to the fact that a few people had over 15 years of experience but there were many at the bottom end of the scale). Most of the models were not terribly complicated—73 percent had fewer than fifty variables. Those surveyed worked on a variety of Monte Carlo applications including the following:

  • Business plans
  • Financial portfolio risks
  • Sales forecasts
  • Information technology projects
  • Mining and oil exploration
  • Pharmaceutical product development
  • Project schedule and budget forecasts
  • Engineering and scientific models such as radar simulations and materials strength
  • Competitive bidding
  • Capital investments in the steel industry
  • Alternatives analysis on supply chains and inventory levels
  • Building construction risks
  • Product variations in manufacturing

I asked them questions about where they got their data from and about quality control on their models. Here is what I found:

  • There were a lot of subjective estimates but no calibration of probabilities. An overwhelming majority of those surveyed—89 percent—used some subjective estimates in models. On average, the percentage of variables in all models that were subjective estimates was 44 percent. However, not one of the modelers ever used—or had even heard of—calibration training. As discussed in chapter 7, this would mean that almost all estimates were overconfident and all of the models understated risks. In fact, the most uncertain variables that had bigger impacts on the output were the most likely to rely on expert estimates.
  • When I asked about validating forecasts against reality, only one respondent had ever attempted to check actual outcomes against original forecasts. This question produced a lot of hand-waving and carefully qualified statements from the other respondents. The one person who claimed he did do some validation of original forecasts could not produce the data for it. Instead, he offered only anecdotal evidence.
  • Although 75 percent of models used some existing historical data, only 35 percent of the models reviewed used any original empirical measurements gathered specifically for the model. Furthermore, only 4 percent ever conducted an additional empirical measurement to reduce uncertainty where the model was most sensitive. In contrast, I find that, based on sensitivity analysis and computing the value of further measurements, all but three of the more than 150 models I've personally developed in the last twenty years required further measurement. It appears that most modelers assume they can model using only subjective estimates and whatever existing data they are given. The idea of conducting original empirical research is almost completely absent from Monte Carlo modeling.

I could have made the sample in this survey much larger if I had included the clients I had trained myself, but I didn't want to bias the sample by including them. However, when I asked those I trained about models they built before I met them, the results were the same. Of those who did build quantitative models based even partly on subjective estimates, none had used calibration training in their organizations prior to the training they did with us. This includes large groups of economists and statisticians who routinely performed Monte Carlo simulations on a variety of critical government policy decisions and major technology and product development projects.

Since this first survey in 2008, I've conducted other surveys regarding project management, cybersecurity, and enterprise risk management (ERM). I introduced some of the findings of these surveys in chapter 2. In each of these cases I asked whether they were using Monte Carlo simulations or other quantitative methods, such as Bayesian networks, statistical regression, and so on. Of the minority of those in project management who use Monte Carlos, I still find that subject matter experts are the main source of input but that calibration training still does not occur most of the time (only one in six Monte Carlo users in project management said they used calibrated estimators).

Again, the subsequent research showed that the adoption of quantitative tools may be outpacing the knowledge of how to use them correctly. Recall that in the 2015 cybersecurity survey we asked respondents a set of questions regarding basic literacy in probabilistic methods. Those who said they used Monte Carlo tools, Bayesian networks, or historical regression did not, on average, perform better than other respondents on statistical literacy questions.

We also see some indications that the concept of statistical significance, as explained in the previous chapter, may be misapplied in quantitative decision models. Although we didn't ask this question directly in any of the surveys, some of those who said they used sampling methods and controlled experiments to inform their models indicated in free-form responses how they sought measures of statistical significance. This is worth further investigation, but it agrees with some anecdotal observations I've made in consulting as well as the observations of many statisticians I met at the American Statistical Association (ASA). Statistical significance alone cannot tell us, for example, the probability that a new drug works or that a new training program improves customer satisfaction. As explained in the last chapter, a result at the .05 significance level is not the same as a 95 percent chance that the hypothesis is true, but at least some quantitative models appear to be using the results of significance tests this way.

Of course, we shouldn't necessarily extrapolate findings in a cybersecurity survey to other fields, and we certainly shouldn't conclude too much from anecdotes and a few free-form responses in surveys. But we should at least consider the possibility that the lack of a conceptual foundation to use popular quantitative tools properly is a problem in project management, finance, supply chain, and other areas of risk assessment. Just because someone is using a tool doesn't mean what they build with it makes sense.

THE RISK PARADOX

Jim DeLoach at the risk management consulting firm Protiviti observed, “Risk management is hopelessly buried at the lowest levels in organizations. I see risk analysis focus on the chance that someone will cut off a thumb on the shop floor.” The paradox in risk management that I've observed anecdotally over the years, and that seems to be observed by many risk experts I talk to, is that there is a significant misalignment between the importance of a risk and the amount of detailed, quantitative analysis it receives.

My standard anecdote for this risk paradox comes from the 1990s when I was teaching a seminar on my applied information economics (AIE) method to what I thought was an audience of chief information officers (CIOs) and IT managers. I asked if anyone had applied Monte Carlo simulations and other quantitative risk analysis methods. This was almost entirely a rhetorical question because I had never seen anyone raise a hand in any other seminar when I asked that question. But this time, one manager—from the paper products company Boise Cascade—raised his hand.

Impressed, I said, “You are the first CIO I've ever met who said he used Monte Carlo simulations to evaluate risks in IT projects.” He said, “But I'm not a CIO. I'm not in the IT department, either. I analyze risks in paper production operations.” I asked, “In that case, do you know whether they are used in your firm on IT projects?” He responded, “No, they are not used there. I'm the only person doing this in the firm.” I then asked, “Which do you think is riskier, the problems you work on, or new IT projects?” He affirmed that IT projects were much riskier, but they received none of his more sophisticated risk analysis techniques. Here are just a few more examples of the risk paradox:

  • As mentioned in chapter 3, Baxter, similar to many other pharmaceutical companies, uses quantitative risk models on stop-gate analysis—the assessment of the decision to move ahead to the next big phase in the development of a new product. The reason sophisticated methods are justified for that problem is the same reason they are justified in oil exploration—it is a large capital outlay with a lot of uncertainty about the return. But legal liabilities, like the heparin case described in chapter 3, may turn out to be much larger than the capital investments in the next phase of a new drug.
  • During and before the 2008 financial crisis, banks that routinely did some quantitative risk analysis on individual loans rarely did any quantitative risk analysis on how economic downturns or system-wide policies would affect their entire portfolio.
  • Long-Term Capital Management (LTCM) used the Nobel Prize–winning options theory to evaluate the price of individual options, but the big risk was the extent of the leverage they used on trades and, again, how their entire portfolio could be affected by broader economic trends.
  • Insurance companies use advanced methods to assess the risks accepted by insurance products and the contingent losses on their reserves, but major business risks that are outside of what is strictly insurance get little or none of this analysis—as with AIG and their credit default swaps.
  • There are some risk analysis methods that have been applied to the risks of cost and schedule overruns for IT projects, but the risks of interference with business operations due to IT disasters are rarely quantified. A case in point is the enterprise resource planning (ERP) system installed at Hershey Foods Corp. in 1999. Meant to integrate business operations into a seamless system, the ERP project was months behind and the cost ran up to $115 million. They attempted to go live in September of that year but, over their all-important Halloween season, they were still fixing problems with order processing and shipping functions of the system. Business was being lost to competitors and they posted a 12.4 percent drop in revenue. This risk was much greater than the risk of the ERP cost overrun itself.

This sequestration of some of the best risk analysis methods causes problems with the further evolution of risk management. The relative isolation of risk analysis in some organizations means that different analysts in the same organization may work in isolation from each other and build completely inconsistent models. And the lack of collaboration within firms makes another important step of risk management almost impossible—a cooperative initiative to build models of industrial economics and global risks across organizational boundaries (more on that in the last chapter of the book).

Too Uncertain to Model Quantitatively?

Perhaps the risk paradox is partly a function of some persistent confusion I mentioned in previous chapters. There is a culture among some otherwise-quantitative modelers of excluding things from risk analysis because they are uncertain.

I once spoke with the head of a university interdisciplinary group, which she called a collaborative for modeling risks. She mentioned that this particular issue caused strain among the team. The director of this interdisciplinary effort described some of her frustration in dealing with what she called the modelers. She explained that “modelers are saying that because we can't estimate the actions of people, we have to leave those variables out.” I thought this was odd because as a modeler I routinely include so-called “people variables.” When I model the risk and return of implementing some new information technology in a firm, I often have to include uncertainties such as how quickly users will begin to effectively use the new technology. I've even made models that include uncertainties about whether particular bills would pass in Congress or the action of Saddam Hussein's forces in Iraq.

I came to find that when she said modeler she was talking about a group of bridge construction engineers who, for some reason, put themselves in charge of building the Monte Carlo simulations for the risks the group assessed. To the engineers, only variables about the physical parameters of the bridge seemed real enough to include. Those whom the director referred to as the modelers and non-modelers didn't talk to each other, and there were people on her staff who had what she called a professional divorce over this. In her world, modelers are typically coming from engineering and hard sciences, and non-modelers are coming from political science, sociology, and so on. Non-modelers are arguing that you have to put in the people variables. Modelers are saying that because (they believe) they can't measure the people variables, they have to leave them out of the model. The modelers are saying the important things are the tensile strength of materials and so on.

This presents two important issues. First, why would one group of subject matter experts (SMEs) presume to be in charge of building the Monte Carlo model as opposed to some other group of SMEs? I would generally see engineers as just one other type of SME involved in a modeling issue that requires multiple types of SMEs. But more to the point, why leave something out because it is uncertain? The whole point of building a Monte Carlo model is to deal with uncertainties in a system. Leaving out a variable because it is too uncertain makes about as much sense as not drinking water because you are too thirsty.

A similar exclusion of variables that are considered “too uncertain” sometimes happens in models made for oil exploration. When analysts estimate the volume of a new oil field, they build Monte Carlo simulations with ranges for the area of the field, the depth, the porosity of the rock, the water content, and so on. When they run this simulation, they get a range of possible values for how much oil is in the field. But when it comes to modeling one of the most uncertain variables—the price of oil—they sometimes don't use ranges. For the price of oil, they may use an exact point.

The reason, I've been told, is that the geologists and scientists who run the Monte Carlos are either too uncertain about the price or that management simply gives them an exact price to use. But this means that when management is looking at the analysis of an oil exploration project, they really aren't looking at the actual risks. They are looking at a hybrid of a proper risk analysis based on ranges and an arbitrary point estimate. They undermine the entire purpose of the Monte Carlo.

The output of a Monte Carlo is undermined in other ways, too. Sometimes, analysts producing Monte Carlos are told to collapse their perfectly good distributions to a single point for “accounting purposes.” You can't give them the range that represents your uncertainty, you're told; you have to pick one number. If you have an oil field that holds somewhere between two and six billion barrels, should you tell investors it has exactly four billion?

The executives know that the cost of overestimating the amount of oil in their reserves can be much higher than the cost of underestimating reserves. So, because they would rather underestimate than overestimate, they tend to pick a number that's in the lower end of the range. Steve Hoye, prior to starting his job at Crystal Ball, was in the oil business for twenty years starting in 1980. As a young geophysicist, he saw this firsthand. In an email communication with me, he pointed out other incentives that affect how distributions are converted to points:

There are benefits to underestimating and sometimes serious consequences for overestimating. Shell had a 20 percent write down in 2004 on their reserves. They had to go back to investors and tell them they didn't have as much reserves as they thought. It's a great study in the cost of being wrong. In Texaco, where I worked, they had a big write down in the 1970s and senior management was reshuffled as a result.

The tendency to conservatively underestimate is, therefore, understandable. But now imagine every manager is doing this for every oil field. One study found that, because of the practice of systematically converting distributions to conservative points and then adding the points together, oil reserves are systematically underestimated.1
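
A few lines of simulation show why that happens. The sketch below, with entirely hypothetical field sizes, compares the sum of each field's conservative 10th-percentile estimate with the 10th percentile of the simulated total; the former comes out far lower.

```python
# A sketch of why adding conservative point estimates understates the total.
# Field-size distributions are made up and in arbitrary units.
import numpy as np

rng = np.random.default_rng(4)
n_fields, n_sims = 20, 50_000
fields = rng.lognormal(mean=1.0, sigma=0.6, size=(n_sims, n_fields))

sum_of_p10s = np.percentile(fields, 10, axis=0).sum()   # add conservative points
p10_of_sum = np.percentile(fields.sum(axis=1), 10)      # proper treatment of the total

print(f"Sum of per-field P10 estimates: {sum_of_p10s:6.1f}")
print(f"P10 of the simulated total:     {p10_of_sum:6.1f}")
# Reporting the first number implies far less oil than the portfolio has a
# 90% chance of containing, which is the systematic underestimate.
```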

Usually, big oil does a good job of quantifying risks. As Hoye puts it, “There are enormous risks in oil, huge capital outlays, and multiple years before a payoff. The success rates are one-in-eight in exploratory projects. And that's the good rate.” But the strong incentive to model risks well may be undercut when the results are communicated. “Oil companies are dealing with an asset they cannot touch, but they have to make public pronouncements of the value of these assets,” says Hoye. Perhaps the best way to deal with it is to share the actual uncertainty of the distribution with investors. A range has a chance of being right whereas a point estimate will almost always be wrong.

John Schuyler with Decision Precision is another longtime expert at Monte Carlo simulations for oil exploration who sometimes sees odd hybrids between deterministic and stochastic models. He observes, “Many people might run a Monte Carlo, take the mean and put it in a deterministic model or reduce [the ranges] to ‘conservative’ or ‘optimistic’ points … All of this is coming up with a horribly contrived result.” Schuyler adds, “All that good Monte Carlo simulation upstream is kind of going to waste.”

This attitude of excluding uncertainties because they are too uncertain is pervasive in many industries. In mid-2008, I had a lengthy discussion with an economist who made a living doing business case analyses for luxury home developments. He indicated that his business was not going to be affected much by strains on the mortgage system because, he claimed, the high-end houses and second home market were less affected.

Although he was familiar with Monte Carlo simulations, I was very surprised to learn that, even with the risks and uncertainties in real estate, he conducted a deterministic, fixed-point analysis of the developments. I said risk has to be a major component of any real estate development investment and he would have to include it somehow. His position was that it would be too difficult to determine ranges for all the variables in his models because he just didn't have enough data. He saw no fundamental irony in his position: because he believed he didn't have enough data to estimate a range, he had to estimate a point.

This is based on the same misconception about precision and probabilities I discussed regarding scoring models. If modelers exclude something because it is more uncertain than the other variables, they will invariably exclude some of the most important sources of risks in a model. Taken to the extreme, some analysts exclude probabilistic models altogether and choose models based on point estimates. Until we can begin to see that probabilistic models are needed exactly because we lack perfect information, we will never be able to conduct a meaningful risk analysis.

“Too Unique” to Model Quantitatively?

Obviously, “too unique” or “very unique” are oxymorons, because something is either unique or not and there cannot be levels of uniqueness. But such terms are invoked to make a case for another imagined obstacle to the use of more empirical methods. It is the belief that separate events are so unusual that literally nothing can be learned about one by looking at another. It is said that each IT project, each construction project, each merger is so special that no previous event tells us anything about the risks of the next event. This would be like an insurance company telling me that they cannot even compute a premium for my life insurance because I'm a completely unique individual.

Although we know insurance companies don't let that stand in the way of good risk analysis, many other fields are not immune to this misconception. Even some scientists, such as geologists who study volcanoes (volcanologists), seem to have woven this into their culture. This is the very reason why the risk of a catastrophic eruption of Mount St. Helens in 1980 was ignored by volcano experts. (See The Mount St. Helens Fallacy box.)

But, actually, looking at other volcanoes does tell us something about a particular volcano. If not, then what exactly is the expertise of a volcano expert? I might also call this the fallacy of close analogy. That is, the belief that unless two things are identical in every way, nothing learned from one can be applied to the other. Think about it: our experience is almost always based on fundamental principles learned from situations that were not exactly identical to the situations we apply them to. If we cannot infer fundamental principles that apply to many things by observing a few less-than-perfectly-identical things, then none of our experience matters.

Indeed, from a risk analysis point of view, volcanoes not only have something in common with other volcanoes but also have a lot in common with completely different categories of events. As we will see next, there is something in common even among events as diverse as forest fires, power outages, wars, and stock markets.

FINANCIAL MODELS AND THE SHAPE OF DISASTER: WHY NORMAL ISN'T SO NORMAL

I spent part of the last chapter describing what I believe to be errors in the argument of Nassim Taleb, but I also pointed out that he is absolutely correct on other key points. He points out (as have quite a few economists) that many of the most lauded tools in finance, especially some of those that won the Nobel, are based on some assumptions that we know to be false by simple observation. Nobel Prize–winning theories such as options theory and modern portfolio theory assumed (at least initially) a particular distribution of potential returns and prices that make extreme outcomes appear much less likely than we know them to be. These theories use an otherwise-powerful tool in statistics and science known as a Gaussian or normal probability distribution, such as the one shown in exhibit 10.1.

The normal distribution is a bell-shaped symmetrical probability distribution that describes the output of some random or uncertain processes. The bell shape means that outcomes are more likely to be near the middle and very unlikely at the tails. The shape can be fully described by just two dimensions—its mean and standard deviation. Because this is a symmetrical and not a skewed (i.e., lopsided) distribution, the mean is dead center. The standard deviation represents a kind of unit of uncertainty around the mean.

EXHIBIT 10.1 The Normal Distribution

The normal distribution is not just any bell shape, but a very particular bell shape. If you are saying that the distribution of hits around the bull's-eye of a target on a firing range is normal, then you are saying that—after a sufficiently large number of shots—68.2 percent of the shots land within one standard deviation of the center, 95.4 percent are within two standard deviations, 99.7 percent are within three, and so on. But not all distributions fit this shape exactly. To determine whether a normal distribution is a good fit, statisticians might use one of the mathematical goodness-of-fit tests, such as the chi-square test or the Kolmogorov-Smirnov test (K-S test). These tests may be popular, but they are probably irrelevant for the distributions used in risk analysis.

The main concerns for risk analysts are at the tails of the distributions, and some goodness-of-fit tests, such as the K-S test, are insensitive to how “fat” the tails are. I mentioned in chapter 9 how this might not work well. If we apply the normal distribution to Dow Jones daily price changes from 1928 to 2008, we would find a standard deviation of about 1.157 percent for a daily price change (as a percentage relative to the previous day). Because the bounds of a 90 percent CI are 1.645 standard deviations away from the mean, about 90 percent of daily price changes should be within 1.9 percent of the previous day. In reality, about 93 percent fell within this range, so that part is close. But when we get further out from the average trading day, the normal distribution drastically understates the likelihood of big drops. As first mentioned in chapter 9, a normal distribution says that a 5 percent price drop from the previous day should have had less than a 15 percent chance of occurring even once during that eighty-year period, whereas in reality it happened seventy times.
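
If you want to check that last calculation yourself, here is a short sketch of the arithmetic; the trading-day count is an approximation, and the only input taken from the text is the 1.157 percent standard deviation.

```python
# Under a normal distribution with a 1.157% daily standard deviation, how
# likely is at least one 5%-or-worse daily drop over roughly eighty years?
import math

sd = 0.01157                 # daily standard deviation of price changes
z = 0.05 / sd                # a 5% drop measured in standard deviations (~4.3)
p_day = 0.5 * math.erfc(z / math.sqrt(2))   # P(a given day drops 5% or more)

trading_days = 80 * 252      # ~eighty years of trading days (approximate)
p_ever = 1 - (1 - p_day) ** trading_days

print(f"Daily probability of a 5% drop: {p_day:.2e}")
print(f"P(at least one such day in ~80 years): {p_ever:.1%}")   # roughly 15%
# History shows about seventy such days in that period.
```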

But, because the K-S test focuses on the main body of the distribution and is insensitive to the tails, an analyst using it would have determined that normal is a pretty good assumption on which to base a financial model. And for a risk analyst who worries more about the tails, it is wrong by not just a little, but a lot. With the Dow Jones data, the likelihood of even more extreme events—such as a 7 percent single-day price drop—will be underestimated by a factor of a billion or more. This distribution is, however, the basic assumption behind the Nobel Prize–winning theories of modern portfolio theory and options theory—two very widely used models. (Note: A technical person will point out that these models actually assume prices follow the lognormal cousin of the normal distribution, but because I expressed the data as daily price changes relative to the previous day, I can apply the normal distribution.)

What shape do financial crashes really take? It turns out that financial disasters take on a distribution more similar to the distribution of volcanic eruptions, forest fires, earthquakes, power outages, asteroid impacts, and pandemic viruses. These phenomena take on a power-law distribution instead of a normal distribution. An event that follows a power law can be described as following a rule like this: “A once-in-a-decade event is x times as big as a once-in-a-year event,” where x is some ratio of relative severity. If we plot events such as these on a log/log scale (where each increment on the scale is ten times greater than the previous increment), they tend to look like straight lines. In exhibit 10.2 you can see the power-law distribution of fatalities due to hurricanes and earthquakes. In the case of earthquakes, I also show the magnitude measurement (which is already a log-scale—each increment indicates an earthquake ten times more powerful than the previous increment).

From the chart, you can see that power-law distributions are closely represented by a straight line on a log-log chart of frequency versus magnitude. A once-in-a-decade worst-case earthquake would kill about one hundred people. That's about ten times the damage of an earthquake that happens about every year. This means that x in the rule would be ten in this case. The same ratio applies as we move further down the line. An earthquake that would kill a thousand people (in the United States only) is a once-in-a-century event.

EXHIBIT 10.2 Power-Law Distributions of Hurricane and Earthquake Frequency and Severity
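
The frequency-versus-severity rule described above takes only a few lines to write down. The constants in the sketch below are illustrative, chosen to roughly match the earthquake example, not fitted to any data.

```python
# A hedged illustration of the power-law rule: each tenfold increase in
# severity is roughly ten times less frequent. Constants are illustrative only.
def annual_exceedance_frequency(fatalities, c=10.0, alpha=1.0):
    """N(>=x) = c * x**(-alpha): events per year at least this severe."""
    return c * fatalities ** (-alpha)

for fatalities in (10, 100, 1000, 10_000):
    freq = annual_exceedance_frequency(fatalities)
    print(f">= {fatalities:>6} fatalities: {freq:8.3f}/year "
          f"(return period ~{1/freq:,.0f} years)")
# On a log-log plot of frequency versus magnitude this relationship is a
# straight line with slope -alpha, which is what exhibit 10.2 shows.
```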

Many of the systems that seem to follow power-law distributions for failures are the kinds of stressed systems that allow for both common mode failures and cascade failures. Forest fires and power outages, for example, are systems of components in which a single event can affect many components and in which the failures of some components cause the failures of other components. Hot, dry days make everything more likely to burn and one tree being on fire makes its neighbors more likely to catch fire. Peak energy use periods strain an entire power grid and power overloads in one electrical subsystem cause strain on other subsystems.

Unfortunately, many of the systems that matter to business have a lot more in common with power grids and forest fires than with systems best modeled by normal distributions. Normal distributions apply best to problems in which, for example, we want to estimate a mean in a system with a large number of individual and independent components. If you want to estimate the proportion of heads you will get in a thousand flips of a coin, the normal distribution is your best bet for modeling your uncertainty about the outcome. But financial markets, supply chains, and major IT projects are complex systems of components in which each of the following occurs.

Characteristics of Systems with Power-Law Distributed Failures

  • The entire system can be stressed in a way that increases the chance of failure of all components, or the failure of one component causes the failure of several other components, possibly in parallel (i.e., a common mode failure).
  • The failure of those components starts a chain reaction of failures in series (cascade failure).
  • The failure of some components creates a feedback loop that exacerbates the failures of those components or other components (positive feedback).

Taleb points out that applying the normal distribution to markets seems to be a bad idea. Using this distribution, a one-day drop of 5 percent or greater in the stock market should have been unlikely to occur even once since 1928. I looked at the history of the Dow Jones index from 1928 to the end of 2008 and found that, instead, such a drop occurred seventy times—nine times in 2008 alone. We will discuss this more later in the book. Models have to be tested empirically, regardless of how advanced they appear to be. Just because a quantitative model allegedly brings “rigor” to a particular problem (as the Nobel Prize committee stated about one award), that is no reason to believe it actually works better than an alternative model.

Various fund managers have said that the fluctuations of 1987 and 2008 were an extreme case of bad luck—so extreme that they were effectively far less likely than one chance in a trillion-trillion-trillion.3 If there is even a 1 percent chance that they computed the odds incorrectly, bad math on their part is a far more likely explanation. In this case, the fact that the event occurred even once is sufficient to cast serious doubt on the calculated probabilities.

Let's look at how close the history of the financial market is to the power laws. Exhibit 10.3 shows how the frequency and magnitude of daily price drops on the S&P 500 and Dow Jones Industrial Average (DJIA) appear on a log-log chart. The solid lines show actual price history for the two indices and the dashed lines show the Gaussian approximation of them. In the range of a drop of a few percentage points, the Gaussian and historical distributions are a moderately good match. For both indices, the frequency of 1 percent price drops is slightly overstated by the Gaussian distribution, and for price drops between 2 percent and 3 percent, the historical data matches the Gaussian. The K-S test would look at this and determine that the normal distribution is close enough. But once the price drops get bigger than about 3 percent from the previous day's closing price, the two distributions diverge sharply.

EXHIBIT 10.3 Frequency and Magnitude of Daily Price Drops in the S&P 500 and DJIA (Log-Log Chart)

Clearly, the real data from either index looks a lot more like the downward-sloping straight lines we see on the log-log frequency and magnitude charts for hurricanes and earthquakes. The more sharply curving normal distribution applied to the same data would put a 6 percent or greater drop in a single day at something less frequent than a once-in-ten-thousand-year event.

In the actual data a price drop that large has already occurred many times and probably would occur at something closer to once every few years. After the 1987 crash, when both indices lost more than 20 percent in a single day, some analysts claimed the crash was something on the order of a once-in-a-million-year event. The power law distribution puts it closer to once in a century or so. Or, to put it another way, it has a reasonably good chance of occurring in a lifetime.

Although I'm a booster for the firms that developed powerful tools for Monte Carlo, some of the most popular products seem to have one major omission. Of all the wide assortment of distribution types they include in their models, most still do not include a power-law distribution. But they aren't hard to make. I included a simple random power-law generator in a spreadsheet on www.howtomeasureanything.com/riskmanagement.
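
For example, a basic power-law (Pareto) generator, similar in spirit to the one in that spreadsheet, can be written with inverse-transform sampling; the parameters below are arbitrary.

```python
# A minimal random power-law (Pareto) generator; parameters are arbitrary.
import numpy as np

def power_law_samples(n, x_min=1.0, alpha=1.5, rng=None):
    """Inverse-transform sampling: P(X > x) = (x / x_min) ** (-alpha)."""
    rng = rng or np.random.default_rng()
    u = rng.uniform(size=n)              # uniform(0,1) draws
    return x_min * u ** (-1.0 / alpha)

losses = power_law_samples(100_000, rng=np.random.default_rng(7))
print(f"Median draw:          {np.median(losses):.1f}")
print(f"99.9th percentile:    {np.percentile(losses, 99.9):.1f}")
print(f"Largest single draw:  {losses.max():.1f}")   # dwarfs the median
```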

Another interesting aspect of stressed systems with common mode failures, cascade failures, and positive feedback cycles is that if you model them as such, you may not even have to tell the model to produce a power-law distribution of failures. It can display this behavior simply by virtue of modeling those components of the system in detail. Computer models of forest fires, flu epidemics, and crowd behavior show this behavior naturally. An explicit power law is still required for any model that is a simple statistical description of outputs rather than a model of underlying mechanisms. Therefore, financial models will either have to replace the normal distribution with power-law distributions or start making more detailed models of financial systems and the interactions among their components.
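
As a rough illustration, the toy model below never mentions a power law: each failure simply triggers a random number of further failures. Run it, and the distribution of cascade sizes nonetheless comes out severely fat-tailed. The trigger rate is an arbitrary assumption chosen to keep the system just below its critical point.

```python
# A toy cascade model with no power law built in: each failure triggers a
# Poisson number of further failures (a branching process). Near the critical
# point, the cascade-size distribution comes out heavy-tailed on its own.
import numpy as np

rng = np.random.default_rng(11)

def cascade_size(mean_triggered=0.95, cap=100_000):
    """One initiating failure; each failure triggers Poisson(mean) more."""
    active, total = 1, 1
    while active and total < cap:
        new = rng.poisson(mean_triggered, size=active).sum()
        total += new
        active = new
    return total

sizes = np.array([cascade_size() for _ in range(20_000)])
print("Median cascade size:     ", int(np.median(sizes)))
print("99.9th percentile size:  ", int(np.percentile(sizes, 99.9)))
print("Largest cascade observed:", int(sizes.max()))
# The extreme cascades are orders of magnitude larger than the typical one,
# the signature of a fat-tailed (roughly power-law) outcome distribution.
```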

FOLLOWING YOUR INNER COW: THE PROBLEM WITH CORRELATIONS

Many systems we want to model are like herds of cattle—they tend to move together but in irregular ways. Cattle do not move together in any kind of formation like marching soldiers nor do they move entirely independently of each other like cats. Trying to describe the way in which one cow follows another with one or two numbers—such as “ten feet behind”—is sure to leave out a lot of complexity. Yet, this is exactly what is done in many quantitative risk models.

When two variables move up and down together in some way we say they are correlated. Correlation between two sets of data is generally expressed as a number between +1 and −1. A correlation of 1 means the two variables move in perfect harmony—as one increases so does the other. A correlation of −1 also indicates two perfectly related variables, but as one increases, the other decreases in lockstep. A correlation of 0 means they have nothing to do with each other.

The four examples of data in exhibit 10.4 show different degrees of correlation. The horizontal axis could be the Dow Jones and the vertical axis could be your revenues. Or the horizontal axis could be a number of mortgage defaults and the vertical axis could be unemployment. They could be anything we expect to be related in some way. But it is clear that in some of the charts the data on the two axes are more closely related than in others. The chart in the upper-left-hand corner is just two independent random variables. The variables have nothing to do with each other and there is no correlation. In the lower-right-hand corner, you can see two data sets that are very closely related.

EXHIBIT 10.4 Examples of Correlated Data (the four panels show no correlation, 0.8 correlation, –0.6 correlation, and 0.99 correlation)

Correlated random numbers are not difficult to generate given a coefficient of correlation. We can also use a simple formula in Excel (=CORREL()) to compute the correlation between two data sets. See the spreadsheet at www.howtomeasureanything.com/riskmanagement for simple examples that both generate correlated numbers and compute correlations among given data. Tools such as Crystal Ball and @Risk enable the modeler to specify correlations among any combination of variables.
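
For readers who prefer code to spreadsheet formulas, here is a minimal sketch of both steps, first generating a pair of variables with a chosen correlation and then measuring the correlation of the result; the 0.8 target is arbitrary.

```python
# Generate a pair of normally distributed variables with a chosen correlation,
# then measure the result (the Python analogue of Excel's =CORREL()).
import numpy as np

rng = np.random.default_rng(3)
rho, n = 0.8, 10_000

x = rng.standard_normal(n)
z = rng.standard_normal(n)                 # independent noise
y = rho * x + np.sqrt(1 - rho**2) * z      # y is correlated with x at ~rho

measured = np.corrcoef(x, y)[0, 1]
print(f"Target correlation:   {rho:.2f}")
print(f"Measured correlation: {measured:.2f}")
```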

Remember, correlations are just another level of detail to add to a quantitative model, and excluding them doesn't mean you are better off with a qualitative risk matrix model. Whether you can ever capture all the correlations is not a reason to stick with a risk matrix over even a simple quantitative model. After all, risk matrices have no way to explicitly include correlations even if you wanted to include them—with quantitative models you at least have that choice.

Having said that, modelers should be aware that excluding important correlations will almost always lead to a systematic underestimation of risks. If you are considering the risks of a construction project and you build a Monte Carlo with ranges for the detailed costs of each part of a major facility, you might find that these costs are correlated. If the costs of one part of the building rise, it is probably for reasons that would affect the costs of all parts of the multibuilding facility. The price of steel, concrete, and labor affects all of the buildings in a facility. Work stoppages due to strikes or weather tend to delay all of the construction, not just one part.

If the costs of different buildings in a facility were modeled as independent variables, extremes would tend to average out, much like rolling a dozen dice. It would be unlikely for a dozen independent variables to all move up and down together by chance alone. But if they are correlated at all, then they do tend to move up and down together, and the risk of being over budget on one building is not necessarily offset by another building being under budget. They tend to all be over budget together.
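
A quick simulation shows how large that effect can be. The sketch below sums twelve hypothetical building-cost estimates two ways, once as independent variables and once partly driven by a shared factor, and compares the chance of exceeding a total budget; every figure is made up.

```python
# The same twelve building-cost estimates summed with and without correlation.
# All figures are hypothetical.
import numpy as np

rng = np.random.default_rng(5)
n_sims, n_buildings, rho = 50_000, 12, 0.7
mean, sd = 10.0, 2.0                     # $M per building

# Independent case
indep = rng.normal(mean, sd, (n_sims, n_buildings)).sum(axis=1)

# Correlated case: a shared factor (steel, labor, weather) drives all costs
common = rng.standard_normal((n_sims, 1))
idio = rng.standard_normal((n_sims, n_buildings))
corr = (mean + sd * (np.sqrt(rho) * common + np.sqrt(1 - rho) * idio)).sum(axis=1)

budget = 140.0                           # $M total budget
print(f"P(total > ${budget:.0f}M), independent: {(indep > budget).mean():.2%}")
print(f"P(total > ${budget:.0f}M), correlated:  {(corr > budget).mean():.2%}")
```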

Correlation significantly increases the risks, but even the savviest managers will ignore this. In a January 15, 2008, press release from Citigroup, CEO Vikram Pandit explained the reported $9.83 billion loss for the fourth quarter of 2007: “Our financial results this quarter are clearly unacceptable. Our poor performance was driven primarily by two factors [emphasis added]—significant write-downs and losses on our sub-prime direct exposures in fixed income markets, and a large increase in credit costs in our U.S. consumer loan portfolio.”4

But these are not two independent factors. They are more like one factor. The housing market affects both of these. They would tend to move up and down together more often than not, and any risk model that treated them as independent significantly understated the risk. Another respected financial expert, Robert Rubin, Secretary of the Treasury under Bill Clinton, described the 2008 financial crisis as “a perfect storm” and said, “This is an extremely unlikely event with huge consequences.”5 Perfect storm seems to imply the random convergence of several independent factors—which probably was not the case.

The other big error in correlations is not the exclusion of relationships among variables but modeling them with a single correlation coefficient. Consider the two data sets shown in exhibit 10.5. Although the movement of the vertical axis data with the horizontal axis data is obviously different in the two charts, the typical calculation of a correlation would give the same coefficient for both. The one on the right could be approximated by a single “best-fit” correlation and the error around it, but the one on the left is both more complex and more precise. If we tried to model correlations that are historically related in the way the left-hand chart shows by using a single correlation coefficient, the Monte Carlo would generate something that looks like the chart on the right.

EXHIBIT 10.5 Same Correlation Coefficient, Very Different Patterns

A correlation is a gross approximation of the relationship between two variables. Often, the relationship between two variables is best described by a more complex system than a single number. It's like the difference between knowing someone's IQ and knowing how the brain actually works.
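
The sketch below makes the same point as exhibit 10.5 with synthetic data: an exact nonlinear relationship and a noisy straight line can carry essentially the same correlation coefficient even though they describe very different systems.

```python
# Two relationships with essentially the same correlation coefficient but very
# different structure. Data are synthetic, chosen only for the comparison.
import numpy as np

rng = np.random.default_rng(9)
n = 5_000

# Data set A: an exact, nonlinear (quadratic) relationship
x_a = rng.uniform(0, 1, n)
y_a = x_a ** 2                                  # no noise at all
r_a = np.corrcoef(x_a, y_a)[0, 1]

# Data set B: a noisy straight line tuned to roughly the same coefficient
x_b = rng.standard_normal(n)
y_b = r_a * x_b + np.sqrt(1 - r_a**2) * rng.standard_normal(n)
r_b = np.corrcoef(x_b, y_b)[0, 1]

print(f"Correlation, exact curve: {r_a:.3f}")
print(f"Correlation, noisy line:  {r_b:.3f}")
# A single coefficient cannot distinguish the two; a Monte Carlo that models
# the first with just that coefficient will reproduce something like the second.
```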

Furthermore, simple correlations are not even close to constant, and because the reasons behind them are not known, they can change without warning. John Joseph, a commodity trading advisor and principal with Sema4 Group in Dallas, Pennsylvania, has found that currency correlations change suddenly even after years of what seems like a consistent correlation. He points out that the correlation between the British pound and the Japanese yen relative to the dollar was positive from 1982 until 2007.

Then it swung in one year from a +0.65 correlation to −0.1. Most analysts modeling currency risk on a year's worth of data would have stated the correlation between the yen and the pound with high confidence and assumed it would probably continue. But in reality, they had no basis for this confidence, because this level of analysis explains nothing about the underlying system.

There is an alternative to using that single coefficient as a basis for correlation. When we model our uncertainties about the construction costs of a facility, we know that the price of steel, other materials, and labor affects all of the costs of all of the buildings. This can be modeled explicitly without resorting to correlation coefficients, and it will be a much more realistic model. It is like basing risk analysis of Mount St. Helens on the physics of systems of rock structures, pressure, and gravity instead of basing it on just the history of that volcano. The types of models that would also show power-law distributed failure modes by explicitly modeling things such as common mode failures do not need to resort to the very rough approximation of a coefficient of correlation.

THE MEASUREMENT INVERSION

Consider a decision analysis model for a new product. You have uncertainties about the cost and duration of development, materials costs once production starts, demand in different markets, and so on. This would be just like a typical cost-benefit analysis with a cash flow but instead of exact numbers, we use probability distributions to represent our uncertainty. We can even include the probability of a development project failure (no viable product was developed and the project was cancelled) or even more disastrous scenarios such as a major product recall. Any of these variables could be measured further with some cost and effort. So, which one would you measure first and how much would you be willing to spend? For years, I've been computing the value of additional information on every uncertain variable in a model.

Suppose we ran ten thousand scenarios in a simulation and determined that 1,500 of these scenarios resulted in a net loss. If we decide to go ahead with this product development and we get one of these undesirable scenarios, the amount of money we would lose is the opportunity loss (OL)—the cost of making the wrong choice. If we didn't lose money, then the OL was zero. We can also have an OL if we decide not to approve the product but then find out we could have made money. In the case of rejecting the product, the OL is the profit we passed up if the product would have made money—zero if it would have lost money (in which case we were right to reject it).

The expected opportunity loss (EOL) is each possible opportunity loss times the chance of that loss—in other words, the chance of being wrong times the cost of being wrong. In our Monte Carlo simulation, we simply average the OL across all of the scenarios. For now, let's say that, given the current level of uncertainty about this product, you still think going ahead is a good idea. So we average the OL over the 1,500 scenarios where it was positive (we lost money) and the 8,500 scenarios where it was zero (we made the right choice). Suppose we find that the EOL is about $600,000.

The EOL is equivalent to another term called the expected value of perfect information (EVPI). The EVPI is the most you would reasonably be willing to pay if you could eliminate all uncertainty about this decision. Although it is almost impossible to ever get perfect information and eliminate all uncertainty, this value is useful as an absolute upper bound. If we can reduce the $600,000 EOL by half with a market survey that would cost $18,000, then the survey is probably a good deal. If you want to see a spreadsheet calculation of this type of problem, go to this book's website at www.howtomeasureanything.com/riskmanagement.
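
The calculation itself is simple enough to sketch. The distribution below is hypothetical, with its mean and spread chosen only so that the output lands near the $600,000 example above.

```python
# A sketch of the EOL/EVPI arithmetic. The NPV distribution is hypothetical,
# tuned so roughly 15% of scenarios lose money and the EOL is near $600,000.
import numpy as np

rng = np.random.default_rng(2)
npv = rng.normal(8.0, 7.7, 10_000)               # $M gain or loss per scenario

# We currently favor going ahead, so the opportunity loss is what we lose in
# the scenarios where going ahead turns out to be the wrong call (npv < 0).
opportunity_loss = np.where(npv < 0.0, -npv, 0.0)

eol = opportunity_loss.mean()                    # EOL = EVPI, in $M
print(f"Losing scenarios: {(npv < 0).mean():.0%}")
print(f"EVPI (most you'd pay for perfect information): ${eol * 1e6:,.0f}")
# A survey that cut this EOL in half would be worth up to half of it,
# far more than an $18,000 survey cost.
print(f"Value of a survey that halves the EOL: up to ${eol * 1e6 / 2:,.0f}")
```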

This becomes more enlightening when we compute the value of information for each variable in a model, especially when the models get very large. This way we not only get an idea of how much to spend on measurement but also which specific variables we need to measure and how much we might be willing to spend on them. I have done this calculation for more than 150 quantitative decision models, most of which had about fifty to one hundred variables (for a total of about 10,000 variables, conservatively). From this, I've seen patterns that persist every time I add more analysis to my library. The two main findings are:

  • Relatively few variables require further measurement—but there are almost always some.
  • The uncertain variables with the highest EVPI (highest value for further measurement) tend to be those that the organization almost never measures, and the variables they have been measuring have, on average, the lowest EVPI.

I call this second finding the measurement inversion, and I've seen it in IT portfolios, military logistics, environmental policy, venture capital, market forecasts, and every other place I've looked.

It seems that almost everybody, everywhere, is systematically measuring all the wrong things. It is so pervasive and impactful that I have to wonder how much this affects the gross domestic product. Organizations appear to measure what they know how to measure without wondering whether they should learn new measurement methods for very-high-value uncertainties.

How does this tendency toward a measurement inversion affect risk assessment and, in turn, risk management? Highly uncertain and impactful risks tend to get much less analysis than the easier-to-list, mundane events. The possibility of existential risks due to a major product recall, corporate scandal, major project failure, or factory disaster gets less attention than the listing of much more routine and less impactful events. Conventional risk matrices are often populated with risks that are estimated to be so likely that they should happen several times a year. I've even seen risks estimated to be 80 percent, 90 percent, or even 100 percent probable in the next twelve months. At that level of likelihood, the event is more of a reliable cost of doing business than a risk. Of course, cost control is also important, but it's not the same as risk management. If it is something you routinely budget for, it might not be the kind of risk upper management needs to see in a risk assessment.

Also, as an analyst myself as well as a manager of many analysts, I can tell you that analysts are not immune to wanting to use a modeling method because it uses the latest buzzwords. Perhaps an analyst just recently learned about random forests, Bayesian networks, or deep learning. If she finds it interesting and wants to use it, she can find a way to make it part of the solution. The measurement inversion shows that our intuition fails us regarding where we need to spend more time reducing uncertainty in probabilistic models. Unless we estimate the value of information, we may go down the deep rabbit hole of adding more and more detail to a model and trying to gather data on less relevant issues. Periodically, we just need to back up and ask if we are really capturing the main risks and if we are adding detail where it informs decisions most.

IS MONTE CARLO TOO COMPLICATED?

One issue with the adoption of Monte Carlo–based methods for addressing risks is the concern that Monte Carlos are too complex. Even some of those who use fairly quantitative methods in other ways apparently see the Monte Carlo as abstruse.

A book published in 1997, Value at Risk, expressed one reservation about this “weakness” of the Monte Carlo method.6 After acknowledging that “Monte Carlo is by far the most powerful method to compute value at risk,” it goes on to say:

The biggest drawback of this method is its computational cost. If one thousand sample paths are generated with a portfolio of one thousand assets, the total number of valuation calculations amounts to 1 million. When full valuation of assets is complex this method quickly becomes too onerous to implement on a frequent basis.

I've been running Monte Carlo simulations on a variety of risk analyses since the 1980s and even more regularly since the mid-1990s. Most of my models had fifty or more variables, and I routinely ran fifty thousand scenarios or more. That's a total of 2.5 million individual values generated, conservatively, each time I ran a Monte Carlo. But I don't recall, even on the computers of the mid-1990s, a simulation taking much more than sixty minutes. And I was running uncompiled Excel macros—hardly the best available technology. My laptop now has a processor that is hundreds of times faster than my 1994 PC and thousands of times as much RAM. Sam Savage improves on these speeds even further with fast distribution-calculation methods that would run a computation like that in seconds.
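
If the computational-cost concern still seems plausible, a quick test is easy to run. The toy model below is not a real risk model, but it generates the same 2.5 million values mentioned above and aggregates them, which takes a fraction of a second on any recent laptop.

```python
# A quick check of the computational-cost point: fifty uncertain variables,
# fifty thousand scenarios (2.5 million random values), timed end to end.
import time
import numpy as np

rng = np.random.default_rng(0)
start = time.perf_counter()

inputs = rng.normal(1.0, 0.3, size=(50_000, 50))   # 2.5 million random values
outputs = inputs.sum(axis=1)                        # a toy aggregate output

elapsed = time.perf_counter() - start
print(f"50,000 scenarios x 50 variables in {elapsed:.3f} seconds")
print(f"95th percentile of output: {np.percentile(outputs, 95):.2f}")
```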

Think of how much additional computing power would really cost if you wanted faster simulations. If Monte Carlos are “by far the most powerful method” (with which I agree), how much better off would you be if you were managing a large portfolio slightly better? Would it justify spending a little more on a high-end PC? Spending money on computing power for just one machine used by your risk analyst is trivial for virtually any risk analysis problem that would justify hiring a full-time person.

Still, my team and my clients get by just fine with good computers running Excel. The idea that Monte Carlo simulations today are onerous at all is some kind of retro-vacuum-tube thinking. Steve Hoye agrees: “In the old days, Monte Carlo was a big mainframe, but now with Excel-based tools, those assumptions are no longer applicable.”

Aside from the computing power issue, the idea that it is just too complex is also unfounded. Hoye goes on to point out, “Some people will argue that Monte Carlos are justified only for special cases. I think a lot of people have a misperception that it is difficult to understand and academic, therefore it's in their best interest not to have to deal with it.” Every time I've seen analysts object to the use of Monte Carlo simulations, they were never talking from experience. In each case, they knew very little about the method and had no basis for judging whether it was too complex.

Complexity, after all, is relative. I've always stated that my most complex Monte Carlo models were always far simpler than the systems I was modeling. I would model the risks of a software development project with more than a million lines of code. My model was one big spreadsheet with fewer than a hundred variables. And it would take less time and money than even the most trivial parts of the project I was analyzing.

For our purposes, the one-for-one substitution model in chapter 4 itself should be enough to address concerns about complexity. Even though it is applied to an extremely simple framework for risk assessment, it is a legitimate example of a Monte Carlo simulation. The math is all done for you. The only real challenge at this point is the nonquantitative issue of how you want to define your risks. In the last section of this book, we will provide some pointers on that, too.

NOTES

  1. “Have We Underestimated Total Oil Reserves?” New Scientist 198, no. 2660 (June 11, 2008).
  2. R. Carson, Mount St. Helens: The Eruption and Recovery of a Volcano (Seattle: Sasquatch Books, 2002), 69.
  3. Some analysts indicated that the crash of 1987 was 13 standard deviations from the norm. In a normal distribution, such an event has about one chance in 10³⁹. Others indicated it was a 16 or 20 standard deviation event—one chance in 10⁵⁸ and 10⁸⁹, respectively.
  4. Vikram Pandit, Citigroup press release (January 15, 2008).
  5. Interview with Fareed Zakaria, CNN (October 26, 2008).
  6. P. Jorion, Value at Risk (Burr Ridge, IL: Irwin Professional Publications, 1997).