Throttling

The load on a service can vary over time (time of day or year) as user behavior and the number of active users change. Sometimes there are unexpected bursts or ramps in traffic. Every service is built and deployed with a specific load capacity in mind; if incoming requests exceed this capacity, the system will fail.

There are two options for handling high load:

  • When the load is genuine, we increase capacity (the number of servers, service instances, network capacity, DB nodes, and so on) to meet the increased traffic.
  • When the load is not genuine or not business-critical, analyze and control the requests, that is, throttle them.

Some throttling strategies include the following:

  • Rejecting requests from an individual user who has crossed the assigned quota (say, making more than n requests/second to a specific API). This requires the system to meter resource usage per tenant, per resource. A common way to implement throttling is at the load-balancer level. For example, Nginx uses the leaky bucket algorithm for rate-limiting requests. Rate limiting is configured with two main directives: limit_req_zone and limit_req. The first defines which resource is being limited and at what rate; the second is used in location blocks to actually apply the limit. See https://www.nginx.com/blog/rate-limiting-nginx/ for more details.
  • The objective of a leaky bucket algorithm is to smooth out a variable or bursty input rate into a steady output rate, so that the capacity of the target resource is not exceeded. At a high level, the implementation can be thought of as a FIFO queue in which incoming requests are stored. At a set clock tick, n requests are dequeued and sent to the service for processing, where n is the target output rate we are aiming for. We can add more intelligence to this basic concept by factoring in things such as an effort estimate for each request, rather than blindly taking a set number of requests at each clock tick. The algorithm is described in the following diagram:

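The FIFO-queue behavior described above can be captured in a minimal Python sketch. The class name, capacity, and leak rate here are illustrative assumptions, not Nginx's internal implementation:

```python
from collections import deque

class LeakyBucket:
    """A leaky bucket modeled as a FIFO queue: requests arrive at a
    variable rate but are released at a fixed rate per clock tick."""

    def __init__(self, capacity, leak_rate_per_tick):
        self.capacity = capacity              # max queued requests before rejecting
        self.leak_rate = leak_rate_per_tick   # steady output rate: n requests per tick
        self.queue = deque()

    def submit(self, request):
        """Enqueue a request; reject it if the bucket is full."""
        if len(self.queue) >= self.capacity:
            return False  # overflow: the request is throttled
        self.queue.append(request)
        return True

    def tick(self):
        """On each clock tick, dequeue up to n requests for processing."""
        released = []
        for _ in range(min(self.leak_rate, len(self.queue))):
            released.append(self.queue.popleft())
        return released

# A burst of 8 requests against a bucket that holds 5 and drains 2 per tick:
bucket = LeakyBucket(capacity=5, leak_rate_per_tick=2)
accepted = [bucket.submit(f"req-{i}") for i in range(8)]
```

In this sketch the first five requests are queued and the remaining three are rejected; each subsequent `tick()` then releases at most two requests, producing the steady output rate regardless of how bursty the input was.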
  • Disabling or degrading specific functionality so that, instead of the whole service going down, it degrades gracefully. For example, a video streaming service can switch to a lower resolution when the link is bad.
  • Using message-based queues to stagger the load and make computation asynchronous (delayed).
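Putting the two Nginx directives mentioned earlier together, a minimal rate-limiting configuration might look like the following. The zone name `api_limit`, the 10 requests/second rate, the burst size, and the `/api/` location are illustrative assumptions:

```nginx
# Define a shared-memory zone keyed on the client IP address:
# 10 MB of state, with requests limited to 10 per second.
limit_req_zone $binary_remote_addr zone=api_limit:10m rate=10r/s;

server {
    location /api/ {
        # Apply the limit in this location block; up to 20 excess
        # requests are queued before further requests are rejected.
        limit_req zone=api_limit burst=20;
    }
}
```

Requests that exceed both the rate and the burst allowance are rejected, which implements the quota-based rejection strategy from the first bullet at the load-balancer level.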
