Circuit breaker pattern

Systems, even the best-designed systems, fail. The larger and more distributed a system, the higher the probability of failure. Many large systems, such as those at Netflix or Google, have extensive built-in redundancies. The redundancies don't decrease the chance of a component failing, but they do provide a backup. Switching to the backup is frequently transparent to the end user.

The circuit breaker pattern is a common component of a system that provides this sort of redundancy. Let's say that your application queries an external data source every five seconds; perhaps you're polling for some data that you expect to change. What happens when this polling fails? In many cases, the failure is simply ignored and the polling continues. This is actually pretty good behavior on the client side, as data updates are not always crucial. In some cases, a failure will cause the application to retry the request immediately. Retrying server requests in a tight loop can be problematic for both the client and the server. The client may become unresponsive as it spends more and more time in a loop requesting data.

On the server side, a system that is attempting to recover from a failure is being slammed every five seconds by what could be thousands of clients. If the failure is due to the system being overloaded, then continuing to query it will only make matters worse.

The circuit breaker pattern stops attempting to communicate with a failing system once a certain number of failures has occurred. Basically, repeated failures result in the circuit being broken and the application ceasing to query. You can see the general pattern of a circuit breaker in this illustration:

[Figure: The general pattern of a circuit breaker]

For the server, having the number of clients drop off as failures pile up allows some breathing room to recover. The chance of a storm of requests coming in and keeping the system down is minimized.

Of course, we would like the circuit breaker to reset at some point so that service can be restored. There are two approaches to this: either the client polls the service periodically (less frequently than before) and resets the breaker when a request succeeds, or the external system communicates back to its clients that service has been restored.
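As a rough sketch of the first approach in JavaScript, the following breaker tracks consecutive failures, opens the circuit at a threshold, and lets a single trial request through once a reset timeout has passed (a half-open state). The names CircuitBreaker, failureThreshold, and resetTimeout are illustrative assumptions rather than part of any particular library:

// A minimal circuit breaker sketch; names and thresholds are illustrative.
function CircuitBreaker(failureThreshold, resetTimeout) {
  this.failureThreshold = failureThreshold; // failures before the circuit opens
  this.resetTimeout = resetTimeout;         // milliseconds to wait before a trial request
  this.failureCount = 0;
  this.openedAt = 0;
  this.state = 'closed';                    // 'closed', 'open' or 'half-open'
}

CircuitBreaker.prototype.canRequest = function () {
  if (this.state === 'open') {
    if (Date.now() - this.openedAt > this.resetTimeout) {
      this.state = 'half-open';             // let one trial request through
      return true;
    }
    return false;                           // circuit is open: skip the request
  }
  return true;
};

CircuitBreaker.prototype.recordSuccess = function () {
  this.failureCount = 0;
  this.state = 'closed';
};

CircuitBreaker.prototype.recordFailure = function () {
  this.failureCount++;
  if (this.state === 'half-open' || this.failureCount >= this.failureThreshold) {
    this.state = 'open';
    this.openedAt = Date.now();
  }
};

The polling code would call canRequest() before each query and recordSuccess() or recordFailure() afterwards; while the circuit is open, the poll simply does nothing.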

Back-off

A variation on the circuit breaker pattern is to use some form of back-off instead of cutting off communication with the server completely. This is an approach suggested by many database vendors and cloud providers. If our original polling was at five-second intervals, then when a failure is detected, change the interval to every 10 seconds. Repeat this process using longer and longer intervals.

When requests start to succeed again, the pattern of changing the time interval is reversed: requests are sent closer and closer together until the original polling interval is resumed.
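A sketch of this kind of back-off polling might look like the following. It uses setTimeout rather than a fixed setInterval so the delay can change between polls; the URL, the five-second base interval, the one-minute cap, and the doubling and halving factors are all illustrative choices:

// A sketch of back-off polling; the interval lengthens on failure and
// shrinks back towards the original five seconds on success.
var baseInterval = 5000;      // the original five-second polling interval
var maxInterval = 60000;      // never back off further than one minute
var currentInterval = baseInterval;

function poll() {
  $.ajax({ url: 'someurl' })
    .done(function (data) {
      // success: move back towards the original interval
      currentInterval = Math.max(baseInterval, currentInterval / 2);
      // ...do something with the data...
    })
    .fail(function () {
      // failure: back off by lengthening the interval
      currentInterval = Math.min(maxInterval, currentInterval * 2);
    })
    .always(function () {
      setTimeout(poll, currentInterval);
    });
}

poll();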

Monitoring the availability of the external resource is a perfect task for a background worker. The work is not complex, but it is totally detached from the main event loop.

Again, this reduces the load on the external resource, giving it more breathing room. It also keeps the clients from being burdened by too much polling.
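As a sketch of pushing that monitoring off the main thread into a Web Worker, the following shows a hypothetical worker script and the page that listens to it. The file name availability-worker.js, the health-check endpoint, the intervals, and the message format are all assumptions made for illustration:

// availability-worker.js -- a hypothetical separate worker script
var interval = 5000;                            // start at the normal polling rate
function check() {
  fetch('https://example.com/health')           // hypothetical health-check endpoint
    .then(function (res) {
      if (!res.ok) { throw new Error('unhealthy'); }
      postMessage('up');
      interval = 5000;                          // service is back: resume the normal rate
    })
    .catch(function () {
      postMessage('down');
      interval = Math.min(interval * 2, 60000); // back off, capped at one minute
    })
    .then(function () { setTimeout(check, interval); });
}
check();

// main page -- react to availability changes
var worker = new Worker('availability-worker.js');
worker.onmessage = function (e) {
  // e.data is 'up' or 'down'; open or reset the circuit breaker accordingly
};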

An example using jQuery's ajax function looks like the following:

$.ajax({
  url: 'someurl',
  type: 'POST',
  data: {},  // the request payload goes here
  tryCount: 0,
  retryLimit: 3,
  success: function(json) {
    // do something with the response
  },
  error: function(xhr, textStatus, errorThrown) {
    if (textStatus === 'timeout') {
      this.tryCount++;
      if (this.tryCount <= this.retryLimit) {
        // try again; `this` is the settings object, so the counters persist
        $.ajax(this);
        return;
      }
      // retry limit reached; give up
      return;
    }
    if (xhr.status === 500) {
      // handle a server error
    } else {
      // handle any other error
    }
  }
});

You can see that the error handler retries the query when a timeout occurs, up to retryLimit times before giving up.

This style of back-off is actually used in Ethernet to avoid repeated packet collisions.

Degraded application behavior

There is likely a very good reason that your application is calling out to external resources. Backing off and not querying the data source is perfectly reasonable, but it is still desirable that users have some ability to interact with the site. One solution to this problem is to degrade the behavior of the application.

For instance, if your application shows real-time stock quote information, but the system for delivering stock information is broken, then a less-than-real-time service could be swapped in. Modern browsers have a whole raft of technologies that allow for storing small quantities of data on the client computer. This storage space is ideal for caching old versions of data should the latest versions be unavailable.
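As a sketch of that degradation, assuming jQuery, a hypothetical /api/quotes endpoint, and a render callback supplied by the page, the last successful response could be kept in localStorage and served up when the live request fails:

// A sketch of degrading to cached data; the endpoint, storage keys and
// render callback are hypothetical.
function loadQuotes(render) {
  $.ajax({ url: '/api/quotes' })
    .done(function (quotes) {
      // cache the latest data and remember when it was fetched
      localStorage.setItem('quotes', JSON.stringify(quotes));
      localStorage.setItem('quotesFetchedAt', new Date().toISOString());
      render(quotes, false);                 // false: the data is fresh
    })
    .fail(function () {
      var cached = localStorage.getItem('quotes');
      if (cached) {
        render(JSON.parse(cached), true);    // true: the data is stale
      }
    });
}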

Even in cases where the application is sending data to the server, it is possible to degrade behavior. Saving the data updates locally and then sending them all together when the service is restored is generally acceptable. Of course, once a user leaves a page, any background workers will terminate. If the user never returns to the site, then whatever updates they had queued to send to the server will be lost.
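A sketch of queuing updates while the service is down, using localStorage so the queue at least survives page reloads, could look like this; the storage key and the /api/updates endpoint are hypothetical:

// A sketch of queuing updates locally and flushing them once the service
// is reachable again; the storage key and endpoint are hypothetical.
function queueUpdate(update) {
  var queue = JSON.parse(localStorage.getItem('pendingUpdates') || '[]');
  queue.push(update);
  localStorage.setItem('pendingUpdates', JSON.stringify(queue));
}

function flushUpdates() {
  var queue = JSON.parse(localStorage.getItem('pendingUpdates') || '[]');
  if (queue.length === 0) { return; }
  $.ajax({
    url: '/api/updates',
    type: 'POST',
    contentType: 'application/json',
    data: JSON.stringify(queue)
  }).done(function () {
    // only clear the queue once the server has accepted the batch
    localStorage.removeItem('pendingUpdates');
  });
}

flushUpdates() would be called when the circuit breaker resets; because the queue lives in localStorage rather than in memory, it survives a reload, but it is still only sent if the user comes back to the site.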

Note

A word of warning: if this is an approach you take, it might be best to warn users that their data is old, especially if your application is a stock trading application.
