Controlling cluster rebalancing

By default, Elasticsearch tries to keep the shards and their replicas evenly balanced across the cluster. This behavior is good in most cases, but there are times when we want to control it, for example, during rolling restarts, when we don't want to rebalance the entire cluster just because one or two nodes are restarted. In this section, we will look at how to avoid cluster rebalancing and how to control this process in depth.

Imagine a situation where you know that your network can handle very high amounts of traffic or, on the contrary, your network is used extensively and you want to avoid putting too much load on it. Another example is that you may want to decrease the pressure put on your I/O subsystem after a full-cluster restart by having fewer shards and replicas initialized at the same time. These are only two examples of where rebalance control may be handy.

Understanding rebalance

Rebalancing is the process of moving shards between different nodes in our cluster. As we have already mentioned, it is fine in most situations, but sometimes you may want to avoid it completely. For example, if we define how our shards are placed and we want to keep it that way, we may want to avoid rebalancing. However, by default, Elasticsearch will try to rebalance the cluster whenever the cluster state changes and Elasticsearch decides that a rebalance is needed (and the delayed timeout has passed, as discussed in The gateway and recovery modules section of Chapter 9, Elasticsearch Cluster in Detail).

Cluster being ready

We already know that our indices are built from shards and replicas. Primary shards, or just shards, are the ones that get the data first. The replicas are physical copies of the primaries and get the data from them. You can think of the cluster as being ready to be used when all the primary shards are assigned to nodes in your cluster, that is, as soon as the yellow health state is achieved. At that point, Elasticsearch may still be initializing other shards (the replicas), but you can already use your cluster: you can search your entire data set and send index change commands, and they will be processed properly.
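The readiness rule described above can be sketched as a small check on a cluster health response. This is only an illustration: the helper names is_cluster_usable and replicas_still_starting are hypothetical, the sample dictionary is hand-written rather than taken from a real cluster, and the field names (status, initializing_shards, unassigned_shards) are assumed to match what the cluster health API returns.

```python
def is_cluster_usable(health: dict) -> bool:
    """Return True once all primary shards are assigned (yellow or green)."""
    return health.get("status") in ("yellow", "green")


def replicas_still_starting(health: dict) -> bool:
    """Return True while some shard copies are initializing or unassigned."""
    return (
        health.get("initializing_shards", 0) > 0
        or health.get("unassigned_shards", 0) > 0
    )


# Hand-written sample shaped like a cluster health response (made-up values).
sample_health = {
    "status": "yellow",
    "active_primary_shards": 5,
    "initializing_shards": 2,
    "unassigned_shards": 3,
}

print("usable:", is_cluster_usable(sample_health))
print("replicas still starting:", replicas_still_starting(sample_health))
```

With a yellow status, the first check passes even though the second one reports that replicas are still being brought up, which is exactly the situation described in this section.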

The cluster rebalance settings

Elasticsearch lets us control the rebalance process with the use of a few properties that can be set in the elasticsearch.yml file or by using the Elasticsearch REST API (as described in The update settings API section of Chapter 9, Elasticsearch Cluster in Detail).

Controlling when rebalancing will be allowed

The cluster.routing.allocation.allow_rebalance property allows us to specify when rebalancing is allowed. This property can take the following values:

  • always: Rebalancing will be allowed as soon as it's needed
  • indices_primaries_active: Rebalancing will be allowed when all the primary shards are initialized
  • indices_all_active: The default value, which means that rebalancing will be allowed only when all the primary shards and replicas are initialized

The cluster.routing.allocation.allow_rebalance property can be set in the elasticsearch.yml configuration file and updated dynamically as well.
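As a minimal sketch of the dynamic update, the following Python snippet builds the JSON body that could be sent to the cluster update settings API (assumed here to be PUT /_cluster/settings). The helper name allow_rebalance_body is hypothetical, and the choice between the transient and persistent scopes is an assumption about how you want the change to survive restarts.

```python
import json

# The three values described above.
ALLOW_REBALANCE_VALUES = (
    "always",
    "indices_primaries_active",
    "indices_all_active",
)


def allow_rebalance_body(value: str, persistent: bool = False) -> str:
    """Build a settings body for cluster.routing.allocation.allow_rebalance."""
    if value not in ALLOW_REBALANCE_VALUES:
        raise ValueError(f"unsupported allow_rebalance value: {value!r}")
    # Assumption: transient settings are lost on a full-cluster restart,
    # persistent ones are kept.
    scope = "persistent" if persistent else "transient"
    return json.dumps(
        {scope: {"cluster.routing.allocation.allow_rebalance": value}}
    )


body = allow_rebalance_body("indices_primaries_active")
print(body)
```

Validating the value before sending it keeps a typo from reaching the cluster, where it would be rejected anyway.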

Controlling the number of shards being moved between nodes concurrently

The cluster.routing.allocation.cluster_concurrent_rebalance property allows us to specify how many shards can be moved between nodes at once in the entire cluster. This value defaults to 2. If your cluster is built from many nodes, or if you would like rebalancing to be performed faster, you can increase it, but this will put more pressure on your cluster resources and will affect indexing and querying. The cluster.routing.allocation.cluster_concurrent_rebalance property can be set in the elasticsearch.yml configuration file and updated dynamically as well.
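To illustrate a dynamic change of this value, the sketch below prepares (but does not send) an HTTP request using only the Python standard library. The helper name concurrent_rebalance_request, the http://localhost:9200 address, and the value 4 are assumptions made for the example; the endpoint is assumed to be the cluster update settings API at /_cluster/settings.

```python
import json
import urllib.request

SETTING = "cluster.routing.allocation.cluster_concurrent_rebalance"


def concurrent_rebalance_request(base_url: str, shards: int) -> urllib.request.Request:
    """Prepare a PUT request that changes the concurrent rebalance limit."""
    body = json.dumps({"transient": {SETTING: shards}}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/_cluster/settings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Assumed local node address; adjust it to your own cluster.
request = concurrent_rebalance_request("http://localhost:9200", 4)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# Sending is left out on purpose; against a running cluster it would be:
# urllib.request.urlopen(request)
```

Keeping the request construction separate from the actual call makes it easy to inspect what would be sent before touching a production cluster.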

Controlling which shards may be rebalanced

The cluster.routing.rebalance.enable property allows us to specify which shards Elasticsearch is allowed to rebalance. (The similarly named cluster.routing.allocation.enable property controls shard allocation, not rebalancing.) This property can take the following values:

  • all: The default behavior, which tells Elasticsearch to rebalance all the shards in the cluster
  • primaries: This value allows the rebalancing of the primary shards only
  • replicas: This value allows the rebalancing of the replica shards only
  • none: This value disables the rebalancing of all types of shards for all indices in the cluster

The cluster.routing.rebalance.enable property can be set in the elasticsearch.yml configuration file and updated dynamically as well.
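For the rolling restart scenario mentioned at the beginning of this section, a rough sketch might prepare two settings bodies: one that disables rebalancing before the nodes are restarted and one that restores the default afterwards. The helper names are hypothetical, and the sketch assumes the rebalance-specific setting name cluster.routing.rebalance.enable, which is the name the Elasticsearch reference documentation uses for the all, primaries, replicas, and none values.

```python
import json

# Assumed setting name for rebalance control (see the note above).
REBALANCE_SETTING = "cluster.routing.rebalance.enable"
REBALANCE_VALUES = ("all", "primaries", "replicas", "none")


def rebalance_enable_body(value: str) -> str:
    """Build a transient settings body that limits which shards may rebalance."""
    if value not in REBALANCE_VALUES:
        raise ValueError(f"unsupported rebalance value: {value!r}")
    return json.dumps({"transient": {REBALANCE_SETTING: value}})


def rolling_restart_bodies() -> tuple:
    """Return the body to send before the restart and the one to send after."""
    return rebalance_enable_body("none"), rebalance_enable_body("all")


before_restart, after_restart = rolling_restart_bodies()
print("before restart:", before_restart)
print("after restart: ", after_restart)
```

Both bodies would be sent to the cluster update settings API; restoring all explicitly, rather than relying on the transient setting expiring, keeps the intent visible.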
