Part II. The Principles of Chaos

The performance of complex systems is typically optimized at the edge of chaos, just before system behavior will become unrecognizably turbulent.

Sidney Dekker, Drift Into Failure

The term “chaos” evokes a sense of randomness and disorder. However, that doesn’t mean Chaos Engineering is something that you do randomly or haphazardly. Nor does it mean that the job of a chaos engineer is to induce chaos. On the contrary: we view Chaos Engineering as a discipline. In particular, we view Chaos Engineering as an experimental discipline.

In the quote above, Dekker was making an observation about the overall behavior of distributed systems. He advocated for embracing a holistic view of how complex systems fail. Rather than looking for the “broken part,” we should try to understand how emergent behavior from component interactions could result in a system drifting into an unsafe, chaotic state.

You can think of Chaos Engineering as an empirical approach to addressing the question: “How close is our system to the edge of chaos?” Another way to think about this is: “How would our system fare if we injected chaos into it?”

In this chapter, we walk through the design of basic chaos experiments. We then delve deeper into advanced principles, which build on real-world applications of Chaos Engineering to systems at scale. Not all of the advanced principles are necessary in a chaos experiment, but we find that the more principles you can apply, the more confidence you’ll have in your system’s resiliency.

Experimentation

In college, electrical engineering majors are required to take a course called “Signals and Systems,” where they learn how to use mathematical models to reason about the behavior of electrical systems. One technique they learn is known as the Laplace transform. Using the Laplace transform, you can describe the entire behavior of an electrical circuit using a mathematical function called the transfer function. The transfer function describes how the system would respond if you subjected it to an impulse, an input signal that contains the sum of all possible input frequencies. Once you derive the transfer function of a circuit, you can predict how it will respond to any possible input signal.

There is no analog to the transfer function for a software system. Like all complex systems, software systems exhibit behavior for which we cannot build predictive models. It would be wonderful if we could use such models to reason about the impact of, say, a sudden increase in network latency, or a change in a dynamic configuration parameter. Unfortunately, no such models appear on the horizon.

Because we lack theoretical predictive models, we must use an empirical approach to understand how our system will behave under conditions. We come to understand how the system will react under different circumstances by running experiments on it. We push and poke on our system and observe what happens.

However, we don’t randomly subject our system to different inputs. We use a systematic approach in order to maximize the information we can obtain from each experiment. Just as scientists use experiments to study natural phenomena, we use experiments to reveal system behavior.

Advanced Principles

As you develop your Chaos Engineering experiments, keep the following principles in mind, as they will help guide your experimental design. In the following chapters, we delve deeper into each principle:

  • Hypothesize about steady state.

  • Vary real-world events.

  • Run experiments in production.

  • Automate experiments to run continuously.

  • Minimize blast radius.

1 Preetha Appan, Indeed.com, “I’m Putting Sloths on the Map”, presented at SRECon17 Americas, San Francisco, California, on March 13, 2017.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset