Engineering reliability

As a quick recap from Chapter 5, Going Distributed, microservices interact with one another over the network using either APIs or messaging. The basic idea is that microservices exchange data in a standardized format over a specific protocol, and that these exchanges combine to produce the system's macro-behavior and fulfill its requirements. Several things can go wrong along the way, as shown in the following diagram:

The preceding diagram is described as follows:

A service may go down, either while serving a client request or while idle. This can happen because the machine went down (hardware or hypervisor errors) or because the code hit an uncaught exception.
A database hosting persistent data may go down. The durable storage might get corrupted, or the database can crash in the middle of a transaction.
A service may spawn an in-memory job, respond with OK to the client, and then go down, removing any reference to the job.
A service may consume a message from the broker but crash just before acting on it.
The network link between two services may go down or become slow.
An external service that the system depends on may become slow or start returning errors.
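
To make one of these failure modes concrete, here is a minimal Go sketch of a service that consumes a message and crashes before acting on it. The Broker type, its methods, and the crashAfterConsume helper are hypothetical and exist only for this illustration; they do not model any particular messaging product. The sketch assumes a broker that either forgets a message as soon as it is handed out (auto-ack) or keeps it in flight until the consumer explicitly acknowledges it.

```go
package main

import "fmt"

// Broker is a hypothetical in-memory queue that tracks unacknowledged messages.
type Broker struct {
	queue    []string
	inFlight map[string]bool
}

// NewBroker returns a broker preloaded with the given messages.
func NewBroker(msgs ...string) *Broker {
	return &Broker{
		queue:    append([]string(nil), msgs...),
		inFlight: map[string]bool{},
	}
}

// Consume hands out the next message. With autoAck the broker forgets the
// message immediately; otherwise it is kept in flight until Ack is called.
func (b *Broker) Consume(autoAck bool) (string, bool) {
	if len(b.queue) == 0 {
		return "", false
	}
	msg := b.queue[0]
	b.queue = b.queue[1:]
	if !autoAck {
		b.inFlight[msg] = true
	}
	return msg, true
}

// Ack confirms that msg was fully processed.
func (b *Broker) Ack(msg string) { delete(b.inFlight, msg) }

// Recover simulates the broker noticing that a consumer died:
// every unacknowledged message is put back on the queue.
func (b *Broker) Recover() {
	for msg := range b.inFlight {
		b.queue = append(b.queue, msg)
		delete(b.inFlight, msg)
	}
}

// Pending reports how many messages are still deliverable.
func (b *Broker) Pending() int { return len(b.queue) }

// crashAfterConsume simulates a service that consumes one message and dies
// before acting on it, then reports how many messages survive the crash.
func crashAfterConsume(autoAck bool) int {
	b := NewBroker("order-42")
	b.Consume(autoAck) // the service "crashes" here: no processing, no Ack
	b.Recover()
	return b.Pending()
}

func main() {
	fmt.Println("auto-ack, pending after crash:", crashAfterConsume(true))
	fmt.Println("manual ack, pending after crash:", crashAfterConsume(false))
}
```

With auto-ack the message is gone after the crash, while with explicit acknowledgement it is available for redelivery. The price of redelivery is that a message may be processed more than once, so handlers in such a setup generally need to tolerate duplicates.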

Reliability in a system is engineered at multiple levels:

Individual services are built as per the specification and work correctly
Services are deployed in a high-availability setup so that a backup/alternate instance can take the place of an unhealthy one
The architecture allows the composite of individual services to be fault-tolerant and rugged

We will look at dependency management in the Dependencies and Dependency resilience sections. The remaining aspects of engineering reliability are covered in the following subsections.
