Shed Load

Services, microservices, websites, and open APIs all share one characteristic: they have zero control over their demand. At any moment, more than a billion devices could make a request. No matter how strong your load balancers or how fast you can scale, the world can always make more load than you can handle.

At the network level, TCP copes with a flood of connection attempts via the listen queue. Each pending connection goes into a queue on the listening socket, and it’s up to the application to accept connections from that queue. When the queue is full, new connection attempts are rejected with a TCP RST (reset) packet.
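
For illustration, most socket APIs let the application hint at the depth of that queue when it binds the listening socket, usually called the “backlog.” Here’s a minimal Java sketch; the port and backlog numbers are arbitrary, and the kernel may clamp the hint:

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

public class ListenQueueDemo {
    public static void main(String[] args) throws IOException {
        // The second argument is the backlog hint: roughly how many
        // completed connections the OS will hold for this socket before
        // refusing new ones (Linux clamps it to net.core.somaxconn).
        try (ServerSocket listener = new ServerSocket(8080, 50)) {
            while (true) {
                // Until accept() runs, connections wait in the listen queue.
                try (Socket connection = listener.accept()) {
                    // Handle the connection, then close it.
                }
            }
        }
    }
}
```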

TCP can’t save us entirely, though. Services often fall over before the connection queue fills up. When that happens, it’s almost always due to contention for some pooled resource. Threads block while waiting for the resource, and once they acquire it, they run slower anyway because all the extra threads are consuming RAM and CPU. Other resource pools may be exhausted at the same time, compounding the problem. The net result is lengthening response times until callers start timing out. To an outside observer, there’s no difference between “really, really slow” and “down.”

Services should model TCP’s approach. When load gets too high, start to refuse new requests for work. This is related to Fail Fast.

The ideal way to define “load is too high” is for a service to monitor its own performance relative to its SLA. When requests take longer than the SLA, it’s time to shed some load. Failing that, you may choose to keep a semaphore in your application and only allow a certain number of concurrent requests in the system. A queue between accepting connections and processing them would have a similar effect, but at the expense of both complexity and latency.
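
To make the semaphore option concrete, here’s a minimal sketch using the JDK’s built-in com.sun.net.httpserver. The permit count, port, and thread pool size are placeholders you’d tune from measurement, not recommendations:

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;

public class ShedLoadServer {
    // Cap on in-flight requests. The right number comes from measuring
    // what one instance can serve within its SLA; 100 is a placeholder.
    private static final Semaphore PERMITS = new Semaphore(100);

    public static void main(String[] args) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.setExecutor(Executors.newFixedThreadPool(32));
        server.createContext("/work", ShedLoadServer::handle);
        server.start();
    }

    private static void handle(HttpExchange exchange) throws IOException {
        // tryAcquire() returns immediately instead of queueing the caller.
        if (!PERMITS.tryAcquire()) {
            respond(exchange, 503, "Busy; try again later.\n");
            return;
        }
        try {
            respond(exchange, 200, doRealWork());
        } finally {
            PERMITS.release();
        }
    }

    private static String doRealWork() {
        return "ok\n"; // Stand-in for the actual request processing.
    }

    private static void respond(HttpExchange exchange, int status, String body)
            throws IOException {
        byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
        exchange.sendResponseHeaders(status, bytes.length);
        try (OutputStream out = exchange.getResponseBody()) {
            out.write(bytes);
        }
    }
}
```

Refusing with tryAcquire() rather than blocking on acquire() is the point: a rejected request costs microseconds, while a queued one adds latency for everyone behind it.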

When a load balancer is in the picture, individual instances can use a 503 status code on their health check pages to tell the load balancer to back off for a while.
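
Here’s a sketch of such a health check, assuming the service records its own recent response time somewhere the handler can read. The SLA threshold and the lastResponseMillis bookkeeping are illustrative, not part of any particular load balancer’s API:

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicLong;

public class HealthCheck {
    // Assumption: request-handling code updates this after each response.
    static final AtomicLong lastResponseMillis = new AtomicLong(0);
    static final long SLA_MILLIS = 250; // Illustrative SLA.

    public static void main(String[] args) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(8081), 0);
        server.createContext("/health", HealthCheck::handle);
        server.start();
    }

    static void handle(HttpExchange exchange) throws IOException {
        // Report 503 while responses breach the SLA. The load balancer
        // stops routing traffic here until the check passes again.
        boolean healthy = lastResponseMillis.get() <= SLA_MILLIS;
        byte[] body = (healthy ? "OK\n" : "BUSY\n")
                .getBytes(StandardCharsets.UTF_8);
        exchange.sendResponseHeaders(healthy ? 200 : 503, body.length);
        try (OutputStream out = exchange.getResponseBody()) {
            out.write(body);
        }
    }
}
```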

Inside the boundaries of a system or enterprise, it’s more efficient to use back pressure (see Create Back Pressure) to create a balanced throughput of requests across synchronously coupled services. Shed load as a secondary measure in these cases.
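
A rough sketch of how the two combine inside a system boundary: a bounded queue between stages makes callers wait briefly when the consumer falls behind (back pressure), and only when that wait runs out does the request get shed. The queue size and timeout here are illustrative:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

public class BoundedHandoff {
    // Bounded queue between the accepting stage and the processing stage.
    private final BlockingQueue<Runnable> work = new ArrayBlockingQueue<>(100);

    public BoundedHandoff() {
        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    work.take().run(); // Consumer drains at its own pace.
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.setDaemon(true);
        worker.start();
    }

    // Returns false when the task should be shed. A full queue makes the
    // caller wait up to 50 ms (back pressure) before giving up (shedding).
    public boolean submit(Runnable task) throws InterruptedException {
        return work.offer(task, 50, TimeUnit.MILLISECONDS);
    }
}
```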

Remember This

You can’t out-scale the world.

No matter how large your infrastructure or how fast you can scale it, the world has more people and devices than you can support. If your service is exposed to uncontrolled demand, then you need to be able to shed load when the world goes crazy on you.

Avoid slow responses using Shed Load.

Creating slow responses is being a bad citizen. Keep your response times under control rather than getting so slow that callers time out.

Use load balancers as shock absorbers.

Individual instances can report HTTP 503 to get some breathing room. Load balancers are good at recycling connections very quickly.
