Controlling shard and replica allocation

As we already discussed, indices that live inside your cluster can be built of many shards and each shard can have many replicas. With the ability to have multiple shards of a single index, we can deal with indices that are too large to fit on a single machine. The reasons may be different—from memory to storage ones. With the ability to have multiple replicas of each shard, we can handle a higher query load by spreading replicas over multiple servers. In order to shard and replicate, ElasticSearch has to figure out where in the cluster it should place the shards and replicas. It needs to figure out which server/node each shard or replica should be placed on.

Explicitly controlling allocation

Imagine that we have our cluster divided into two sections. We want one index, named shop, to be placed on some nodes and the second index called users to be placed on other nodes, and the last index called promotions to be placed on all the nodes that the users and shop indices were placed on. We do that because the third index is much smaller than the other ones and thus we can afford having it along with other indices. However, with the default ElasticSearch behavior we can't be sure where the shards and replicas will be placed. But luckily, ElasticSearch allows us to control that.

Specifying nodes' parameters

So let's divide our cluster into two zones. We say zones, but it can be any name you want; we just like "the zone". So, our nodes numbered 1 and 2 will be placed in a zone called zone_one and the nodes numbered 3 and 4 will be placed in a zone called zone_two.

Configuration

In order to do that, we add the following property to the elasticsearch.yml configuration file on nodes 1 and 2:

node.zone: zone_one

We add the following property to the elasticsearch.yml configuration file on nodes 3 and 4:

node.zone: zone_two

Index creation

Now let's create our indices. First let's create the shop index. We do that by running the following commands:

curl -XPOST 'localhost:9200/shop'
curl -XPUT 'localhost:9200/shop/_settings' -d '{
 "index.routing.allocation.include.zone" : "zone_one"
}'

You should be familiar with the first command, which creates our index. The second command is sent to the _settings REST endpoint to specify additional settings for that index. We set the index.routing.allocation.include.zone property to the zone_one value, which means that we want to place the shop index on the nodes that have the node.zone property set to zone_one.

We perform similar steps for the users' index:

curl -XPOST 'localhost:9200/users'
curl -XPUT 'localhost:9200/users/_settings' -d '{
 "index.routing.allocation.include.zone" : "zone_two"
}'

However, this time we've specified that we want the users index to be placed on the nodes with the node.zone property set to zone_two.

Finally, the promotions index should be placed on all the above nodes; so we use the following command to create and configure that index:

curl -XPOST 'localhost:9200/pictures'
curl -XPOST 'localhost:9200/promotions'
curl -XPUT 'localhost:9200/promotions/_settings' -d '{
 "index.routing.allocation.include.zone" : "zone_one,zone_two"
}'

Excluding nodes from allocation

In the same manner with which we specified on which nodes the index should be placed, we can also exclude nodes from index allocation. Referring to the previously shown example, if we would like the index called pictures not to be placed on nodes with the node.zone property set to zone_one, we would run the following command:

curl -XPUT 'localhost:9200/pictures/_settings' -d '{
 "index.routing.allocation.exclude.zone" : "zone_one"
}'

Notice that instead of the index.routing.allocation.include.zone property, we've used the index.routing.allocation.exclude.zone property.

Using IP addresses for shard allocation

Instead of adding a special parameter to the node's configuration, we are allowed to use IP addresses to specify which nodes we want to include or exclude from shard and replica allocation. In order to do that, instead of using the zone part of the index.routing.allocation.include.zone or index.routing.allocation.exclude.zone property, we should use the _ip part. For example, if we would like our shop index to be placed only on nodes with IP addresses 10.1.2.10 and 10.1.2.11, we would run the following command:

curl -XPUT 'localhost:9200/shop/_settings' -d '{
 "index.routing.allocation.include._ip" : "10.1.2.10,10.1.2.11"
}'

Cluster-wide allocation

Instead of specifying allocation inclusion and exclusion on the index level (which we did till now), we can do that for all the indices in our cluster. For example, if we would like to place all the new indices on the nodes with IP addresses 10.1.2.10 and 10.1.2.11, we would run the following command:

curl -XPUT 'localhost:9200/_cluster/settings' -d '{
 "transient" : {
  "cluster.routing.allocation.include._ip" : "10.1.2.10,10.1.2.11"
 }
}'

Notice that the command was sent to the _cluster/settings REST endpoint instead of the INDEX_NAME/_settings endpoint. Note that ElasticSearch will just process the command and will not return any response.

Note

Please note that the transient and persistent cluster properties are going to be discussed in Controlling cluster rebalancing in Chapter 8, Dealing with Problems.

Number of shards and replicas per node

In addition to specifying shard and replica allocation, we are also allowed to specify the maximum number of shards that should be placed on a single node for a single index. For example, if we would like our shop index to have only a single shard per node, we would run the following command:

curl -XPUT 'localhost:9200/shop/_settings' -d '{
 "index.routing.allocation.total_shards_per_node" : 1
}'

This property can be placed in a elasticsearch.yml file or can be updated on live indices using the preceding command. Please remember that your cluster can stay in the red state if ElasticSearch is unable to allocate all the primary shards.

Manually moving shards and replicas

The last thing we wanted to discuss is the ability to manually move shards between nodes. ElasticSearch exposes the _cluster/reroute REST endpoint, which allows us to control that. The following operations are available:

  • Moving a shard from node to node
  • Canceling shard allocation
  • Forcing shard allocation

Now let's take a closer look at each of the previously mentioned operations.

Moving shards

Let's say we have two nodes called es_node_one and es_node_two and we have two shards of the shop index placed by ElasticSearch on the first node, and we would like to move the second shard to the second node. In order to do that we can run the following command:

curl -XPOST 'localhost:9200/_cluster/reroute' -d '{
 "commands" : [ { 
  "move" : {
   "index" : "shop", 
   "shard" : 1, 
   "from_node" : "es_node_one", 
   "to_node" : "es_node_two" 
  }
 } ]
}'

We've specified the move command that allows us to move shards (and replicas) of the index specified by the index property. The shard property is the number of shards we want to move, and finally the from_node property specifies the name of the node we want to move the shard from, and the to_node property specifies the name of the node we want the shard to be placed on.

Canceling allocation

If we would like to cancel an on-going allocation process, we can run the cancel command and specify the index, node, and shard for which we want to cancel the allocation. For example:

curl -XPOST 'localhost:9200/_cluster/reroute' -d '{
 "commands" : [ {
  "cancel" : {
   "index" : "shop", 
   "shard" : 0, 
   "node" : "es_node_one"
  }
 } ]
}'

The preceding command would cancel the allocation of shard 0 of the shop index on the es_node_one node.

Allocating shards

In addition to canceling and moving shards and replicas, we are also allowed to allocate an unallocated shard to a specific node. For example, if we have an unallocated shard numbered 0 for the users index and we would like it to be allocated to es_node_two by ElasticSearch, we would run the following command:

curl -XPOST 'localhost:9200/_cluster/reroute' -d '{
 "commands" : [ {
  "allocate" : {
   "index" : "users", 
   "shard" : 0, 
   "node" : "es_node_two"
  }
 } ]
}'

Multiple commands per HTTP request

We can, of course, include multiple commands in a single HTTP request. For example:

curl -XPOST 'localhost:9200/_cluster/reroute' -d '{
 "commands" : [
  {"move" : {"index" : "shop", "shard" : 1, "from_node" : "es_node_one", "to_node" : "es_node_two"}},
  {"cancel" : {"index" : "shop", "shard" : 0, "node" : "es_node_one"}}
 ]
}'
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset