Controlling the shard and replica allocation

The indices that live inside your Elasticsearch cluster can be built from many shards, and each shard can have many replicas. The ability to divide a single index into multiple shards lets us spread the data across multiple physical instances. The reasons for doing this vary: we may want to parallelize indexing to get more throughput, we may want smaller shards so that our queries are faster, or we may simply have too many documents to fit on a single machine. With replicas, we can parallelize the query load by having multiple physical copies of each shard. We can say that, using shards and replicas, we can scale out Elasticsearch. However, Elasticsearch has to figure out where in the cluster to place each shard and replica, that is, on which server or node each of them should live.

Explicitly controlling allocation

One of the most common use cases for explicitly controlling shard and replica allocation in Elasticsearch is time-based data, that is, logs. Each log event has a timestamp associated with it, and the volume of logs in most organizations is enormous. The catch is that you need a lot of processing power to index them, but you don't usually search historical data. Of course, you may want to do that from time to time, but such queries are far less frequent than queries for the most recent data.

Because of this, we can divide the cluster into two so-called tiers: the hot tier and the cold tier. The hot tier contains the more powerful nodes, ones that have very fast disks, lots of CPU processing power, and plenty of memory. These nodes will handle both heavy indexing and the queries for recent data. The cold tier, on the other hand, will contain nodes that have very large disks, but are not very fast. We won't index into the cold tier; we will only store our historical indices there and search them from time to time. With the default Elasticsearch behavior, we can't be sure where the shards and replicas will be placed, but luckily Elasticsearch allows us to control this.

Note

The main assumption when it comes to time series data is that, once indexed, the documents are not updated. This holds for log indexing use cases, and we assume such a use case for the Elasticsearch deployment described here.

The idea is to create the index that holds today's data on the hot nodes and, when we stop writing to it (when another day starts), update the index settings so that it is moved to the cold tier. Let's now see how we can do this.

Specifying node parameters

So let's divide our cluster into two tiers. We say tiers, but you can use any name you want; we just like the term "tier" and it is commonly used. We assume that we have six nodes. We want our more powerful nodes, numbered 1 and 2, to be placed in the tier called hot, and the nodes numbered 3, 4, 5, and 6, which are smaller in terms of CPU and memory but very large in terms of disk space, to be placed in the tier called cold.

Configuration

To configure, we add the following property to the elasticsearch.yml configuration file on nodes 1 and 2 (the ones that are more powerful):

node.tier: hot

Of course, we will add a similar property to the elasticsearch.yml configuration file on nodes 3, 4, 5, and 6 (the less powerful ones):

node.tier: cold
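
After restarting the nodes so that the new settings are picked up, we can verify that the attributes are visible to the cluster. A quick check, assuming a version that ships the _cat/nodeattrs endpoint (Elasticsearch 2.0 and later), looks like this:

curl -XGET 'localhost:9200/_cat/nodeattrs?v'

The response lists each node together with its custom attributes, so both tiers should be visible at a glance.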

Index creation

Now let's create our daily index for today's data, one called logs_2015-12-10. As we said earlier, we want it to be placed on the nodes in the hot tier. We do this by running the following command:

curl -XPUT 'http://localhost:9200/logs_2015-12-10' -d '{
 "settings" : {
  "index" : {
   "routing.allocation.include.tier" : "hot"
  }
 }
}'

The preceding command will create the logs_2015-12-10 index and set the index.routing.allocation.include.tier property on it. We set this property to the hot value, which means that we want the logs_2015-12-10 index to be placed on the nodes that have the node.tier property set to hot.
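
If we want to double-check that the property was applied, we can read the index settings back (the ?pretty parameter only formats the response for readability):

curl -XGET 'localhost:9200/logs_2015-12-10/_settings?pretty'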

Now, when the day ends and we need to create a new index, we again put it on the hot nodes. We do this by running the following command:

curl -XPUT 'http://localhost:9200/logs_2015-12-11' -d '{
 "settings" : {
  "index" : {
   "routing.allocation.include.tier" : "hot"
  }
 }
}'

Finally, we need to tell Elasticsearch to move the index holding the data for the previous day to the cold tier. We do this by updating the index settings and setting the index.routing.allocation.include.tier property to cold. This is done using the following command:

curl -XPUT 'http://localhost:9200/logs_2015-12-10/_settings' -d '{
 "index.routing.allocation.include.tier" : "cold"
}'

After running the preceding command, Elasticsearch will start relocating the index called logs_2015-12-10 to the nodes that have the node.tier property set to cold in the elasticsearch.yml file without any manual work needed from us.
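
Relocation happens in the background and can take a while for large indices. One way of watching its progress is to list the shards of the index with the _cat API and look at the state column, where shards being moved are reported as RELOCATING:

curl -XGET 'localhost:9200/_cat/shards/logs_2015-12-10?v'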

Excluding nodes from allocation

In the same manner as we specified on which nodes the index should be placed, we can also exclude nodes from index allocation. Referring to the previously shown example, if we don't want the index called logs_2015-12-10 to be placed on the nodes with the node.tier property set to cold, we would run the following command:

curl -XPUT 'localhost:9200/logs_2015-12-10/_settings' -d '{
 "index.routing.allocation.exclude.tier" : "cold"
}'

Notice that instead of the index.routing.allocation.include.tier property, we've used the index.routing.allocation.exclude.tier property.

Requiring node attributes

In addition to inclusion and exclusion rules, we can also specify the rules that must match in order for a shard to be allocated to a given node. The difference is that when using the index.routing.allocation.include property, the index will be placed on any node that matches at least one of the provided property values. Using index.routing.allocation.require, Elasticsearch will place the index on a node that has all the defined values. For example, let's assume that we've set the following settings for the logs_2015-12-10 index:

curl -XPUT 'localhost:9200/logs_2015-12-10/_settings' -d '{
 "index.routing.allocation.require.tier" : "hot",
 "index.routing.allocation.require.disk_type" : "ssd"
}'

After running the preceding command, Elasticsearch would only place the shards of the logs_2015-12-10 index on a node with the node.tier property set to hot and the node.disk_type property set to ssd.

Using the IP address for shard allocation

Instead of adding a special parameter to the nodes' configuration, we can use IP addresses to specify which nodes to include in or exclude from shard and replica allocation. In order to do this, instead of the tier part of the index.routing.allocation.include.tier or index.routing.allocation.exclude.tier properties, we use _ip. For example, if we would like our logs_2015-12-10 index to be placed only on the nodes with the 10.1.2.10 and 10.1.2.11 IP addresses, we would run the following command:

curl -XPUT 'localhost:9200/logs_2015-12-10/_settings' -d '{
 "index.routing.allocation.include._ip" : "10.1.2.10,10.1.2.11"
}'

Note

In addition to _ip, Elasticsearch also allows us to use _name to specify allocation rules using node names and _host to specify allocation rules using host names.
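
For example, a sketch that keeps the logs_2015-12-10 index away from a single node identified by its name (es_node_one is just an illustrative name here, the same one we use later in this chapter) would look as follows:

curl -XPUT 'localhost:9200/logs_2015-12-10/_settings' -d '{
 "index.routing.allocation.exclude._name" : "es_node_one"
}'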

Disk-based shard allocation

In addition to the already described allocation filtering methods, Elasticsearch gives us disk-based shard allocation rules, which let us control allocation based on the nodes' disk usage.

Configuring disk-based shard allocation

There are four properties that control the behavior of a disk-based shard allocation. All of them can be updated dynamically or set in the elasticsearch.yml configuration file.

The first of these is cluster.info.update.interval, which is by default set to 30 seconds and defines how often Elasticsearch updates information about disk usage on nodes.

The second property is the cluster.routing.allocation.disk.watermark.low, which is by default set to 0.85. This means that Elasticsearch will not allocate new shards to a node that uses more than 85% of its disk space.

The third property is the cluster.routing.allocation.disk.watermark.high, which controls when Elasticsearch will start relocating shards from a given node. It defaults to 0.90 and means that Elasticsearch will start reallocating shards when the disk usage on a given node is equal to or more than 90%.

Both the cluster.routing.allocation.disk.watermark.low and cluster.routing.allocation.disk.watermark.high properties can be set either to a percentage value (such as 0.60, meaning 60%) or to an absolute value (such as 600mb, meaning 600 megabytes).

Finally, the last property is cluster.routing.allocation.disk.include_relocations, which is set to true by default. It tells Elasticsearch to also count the disk space of shards that are currently being relocated to a node. Having this behavior turned on by default means that the disk-based allocation mechanism is more pessimistic about available disk space while shards are relocating, but we won't run into situations where shards can't be relocated because the assumptions about disk space were wrong.
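
Since all four properties can be updated dynamically, a minimal sketch of tightening the watermarks at runtime (keeping both values as ratios, the same format the defaults use) could look like this:

curl -XPUT 'localhost:9200/_cluster/settings' -d '{
 "transient" : {
  "cluster.routing.allocation.disk.watermark.low" : "0.75",
  "cluster.routing.allocation.disk.watermark.high" : "0.85"
 }
}'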

Disabling disk-based shard allocation

Disk-based shard allocation is enabled by default. We can disable it by setting the cluster.routing.allocation.disk.threshold_enabled property to false, either in the elasticsearch.yml file or dynamically using the cluster settings API:

curl -XPUT localhost:9200/_cluster/settings -d '{
 "transient" : {
  "cluster.routing.allocation.disk.threshold_enabled" : false
 }
}'

The number of shards and replicas per node

In addition to controlling where shards and replicas are allocated, we can also specify the maximum number of shards of a single index that can be placed on each node. For example, if we would like our logs_2015-12-10 index to have only a single shard per node, we would run the following command:

curl -XPUT 'localhost:9200/logs_2015-12-10/_settings' -d '{
 "index.routing.allocation.total_shards_per_node" : 1
}'

This property can be set in the elasticsearch.yml file or updated on live indices using the preceding command. Please remember that your cluster can stay in the red state if Elasticsearch is not able to allocate all the primary shards because of this setting.

Allocation throttling

The Elasticsearch allocation mechanism can be throttled, which means that we can control how many resources Elasticsearch uses during the shard allocation and recovery process. We are given five properties to control this, described in the following list; a short example of tuning them at runtime follows the list:

  • cluster.routing.allocation.node_concurrent_recoveries: This property defines how many concurrent shard recoveries may happen at the same time on a node. It defaults to 2 and should be increased if you would like more shards to be recovered at the same time on a single node. However, increasing this value results in more resource consumption during recovery. Also, please remember that during replica recovery, data is copied from other nodes over the network, which can be slow.
  • cluster.routing.allocation.node_initial_primaries_recoveries: This property defaults to 4 and defines how many primary shards may be recovered at the same time on a given node. Because primary shard recovery uses data from local disks, this process should be very fast.
  • cluster.routing.allocation.same_shard.host: A Boolean property that defaults to false and applies only when multiple Elasticsearch nodes are started on the same machine. When set to true, it forces Elasticsearch to check, based on the host name and host address, whether copies of the same shard would end up on a single physical machine, and to prevent such placement. The default false value means no such check is done.
  • indices.recovery.concurrent_streams: This is the number of network streams that can be used concurrently on a single node to copy data from other nodes. The more streams, the faster the data is copied, but at the cost of more resource consumption. This property defaults to 3.
  • indices.recovery.concurrent_small_file_streams: This is similar to the indices.recovery.concurrent_streams property, but defines how many concurrent streams Elasticsearch uses to copy small files (ones under 5mb in size). This property defaults to 2.
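
These properties can be set in the elasticsearch.yml file or, assuming your Elasticsearch version supports it (the 1.x and 2.x lines do), updated dynamically through the cluster settings API. A sketch of trading more resource consumption for faster recoveries could look like this:

curl -XPUT 'localhost:9200/_cluster/settings' -d '{
 "transient" : {
  "cluster.routing.allocation.node_concurrent_recoveries" : 4,
  "indices.recovery.concurrent_streams" : 6
 }
}'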


Cluster-wide allocation

In addition to the per-index allocation settings, Elasticsearch also allows us to control shard and index allocation on a cluster-wide basis, using so-called shard allocation awareness. This is especially useful when we have nodes in different physical racks and we would like a shard and its replicas to end up in different ones.

Let's start with a simple example. We assume that we have a cluster built of four nodes, each in a different physical rack. A simple graphic that illustrates this is as follows:

(Figure: Cluster-wide allocation. Four nodes, each in a separate physical rack, each bound to its own IP address and carrying node.tag and node.group properties.)

As you can see, our cluster is built from four nodes. Each node is bound to a specific IP address, and each node was given a tag property and a group property (added to elasticsearch.yml as the node.tag and node.group properties). This cluster will serve the purpose of showing how shard allocation filtering works. The group and tag properties can be given whatever names you want; you just need to prefix your desired property name with node. For example, if you would like to use a property name of party, you would just add node.party: party1 to your elasticsearch.yml.

Allocation awareness

Allocation awareness allows us to configure shard and replica allocation with the use of generic parameters. In order to illustrate how allocation awareness works, we will use our example cluster. For the example to work, we should add the following property to the elasticsearch.yml file:

cluster.routing.allocation.awareness.attributes: group

This will tell Elasticsearch to use the node.group property as the awareness parameter.

Note

You can specify multiple attributes when setting the cluster.routing.allocation.awareness.attributes property. For example:

cluster.routing.allocation.awareness.attributes: group, node

After this, let's start the first two nodes, the ones with the node.group parameter equal to groupA, and let's create an index by running the following command:

curl -XPOST 'localhost:9200/awareness' -d '{
 "settings" : {
  "index" : {
   "number_of_shards" : 1,
   "number_of_replicas" : 1
  }
 }
}'

After this command, our two-node cluster will look more or less like this:

(Figure: Allocation awareness. The primary shard and its replica of the awareness index are spread evenly across the two groupA nodes.)

As you can see, the index was divided between the two nodes evenly. Now let's see what happens when we launch the rest of the nodes (the ones with node.group set to groupB):

(Figure: Allocation awareness. After the groupB nodes join the cluster, the replica is relocated to a node with a different node.group value.)

Notice the difference: the primary shard was not moved from its original node, but the replica was moved to a node with a different node.group value. This is exactly how it should work; when using shard allocation awareness, Elasticsearch won't allocate a primary shard and its replicas to nodes with the same value of the property used to determine the allocation awareness (which in our case is node.group).

Note

Please remember that when using allocation awareness, shards will not be allocated to the node that doesn't have the expected attributes set. So in our example, a node without the node.group property set will not be taken into consideration by the allocation mechanism.

Forcing allocation awareness

Forcing allocation awareness can come in handy when we know, in advance, how many values our awareness attributes can take and we don't want more replicas than needed to be allocated in our cluster, for example, so as not to overload the cluster with too many replicas. For this, we can force allocation awareness to be active only for certain attribute values. We specify these values using the cluster.routing.allocation.awareness.force.group.values property (the attribute name, group in our case, is part of the property name) and provide a comma-separated list of values. For example, if we would like allocation awareness to use only the groupA and groupB values of the node.group property, we would add the following to the elasticsearch.yml file:

cluster.routing.allocation.awareness.attributes: group
cluster.routing.allocation.awareness.force.group.values: groupA, groupB

Filtering

Elasticsearch allows us to configure allocation filtering for the entire cluster or for a single index. In the case of cluster-level allocation, we can use the following property prefixes:

  • cluster.routing.allocation.include
  • cluster.routing.allocation.require
  • cluster.routing.allocation.exclude

When it comes to index-specific allocation, we can use the following property prefixes:

  • index.routing.allocation.include
  • index.routing.allocation.require
  • index.routing.allocation.exclude

The previously mentioned prefixes can be used with the properties that we've defined in the elasticsearch.yml file (our tag and group properties) and with a special property called _ip that allows us to include or exclude nodes by their IP addresses, for example, like this:

cluster.routing.allocation.include._ip: 192.168.2.1

If we would like to include nodes with a group property matching the groupA value, we would set the following property:

cluster.routing.allocation.include.group: groupA

Notice that we've used the cluster.routing.allocation.include prefix and we've concatenated it with the name of the property, which is group in our case.
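
Just like their index-level counterparts, the cluster-level filters can be changed at runtime with the cluster settings API. A common sketch is draining a node before decommissioning it; here we use the 192.168.2.1 address from our example cluster:

curl -XPUT 'localhost:9200/_cluster/settings' -d '{
 "transient" : {
  "cluster.routing.allocation.exclude._ip" : "192.168.2.1"
 }
}'

After running this command, Elasticsearch relocates all the shards away from the excluded node, and the node can be safely shut down once it no longer holds any data.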

What do include, exclude, and require mean

If you look closely at the preceding parameters, you will notice that there are three kinds:

  • include: This type will result in including all the nodes with this parameter defined. If multiple include conditions are present, then all the nodes that match at least one of these conditions will be taken into consideration when allocating shards. For example, if we add two cluster.routing.allocation.include.tag parameters to our configuration, one with the value node1 and the second with the value node2, we would end up with the index (actually its shards) being allocated to the first and second nodes (counting from left to right). To sum up, the nodes that match an include allocation parameter will be taken into consideration by Elasticsearch when choosing the nodes to place shards on, but this doesn't mean that Elasticsearch will necessarily put shards on them.
  • require: This type of allocation filter, which was introduced in Elasticsearch 0.90, requires a node to match all the defined properties in order to have shards allocated to it. For example, if we add one cluster.routing.allocation.require.tag parameter with the value node1 and a cluster.routing.allocation.require.group parameter with the value groupA, we would end up with shards allocated only to the first node (the one with the IP address of 192.168.2.1).
  • exclude: This parameter allows us to exclude nodes with given properties from the allocation process. For example, if we set cluster.routing.allocation.exclude.group to groupA, we would end up with indices being allocated only to the nodes with the IP addresses 192.168.3.1 and 192.168.3.2 (the third and fourth nodes in our example).

    Note

    The property value can use simple wildcard characters. For example, if we want to include all the nodes that have the group parameter value beginning with group, we could set the cluster.routing.allocation.include.group property to group*. In the example cluster case, this would result in matching nodes with the groupA and groupB group parameter values.
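
    In the elasticsearch.yml file, such a wildcard rule would simply look like this:

    cluster.routing.allocation.include.group: group*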

Manually moving shards and replicas

The last thing we would like to discuss is the ability to manually move shards between nodes. Elasticsearch exposes the _cluster/reroute REST endpoint, which allows us to do exactly that. The following operations are available:

  • Moving a shard from node to node
  • Cancelling shard allocation
  • Forcing shard allocation

Now let's look closely at all of the preceding operations.

Moving shards

Let's say we have two nodes called es_node_one and es_node_two, Elasticsearch has placed both shards of the shop index on the first node, and we would like to move the second shard to the second node. In order to do this, we can run the following command:

curl -XPOST 'localhost:9200/_cluster/reroute' -d '{
 "commands" : [ { 
  "move" : {
   "index" : "shop", 
   "shard" : 1, 
   "from_node" : "es_node_one", 
   "to_node" : "es_node_two" 
  }
 } ]
}'

We've specified the move command, which allows us to move shards (and replicas) of the index specified by the index property. The shard property is the number of the shard we want to move. Finally, the from_node property specifies the name of the node we want to move the shard from, and the to_node property specifies the name of the node the shard should be placed on.

Canceling shard allocation

If we would like to cancel an ongoing allocation process, we can run the cancel command and specify the index, node, and shard we want to cancel the allocation for. For example:

curl -XPOST 'localhost:9200/_cluster/reroute' -d '{
 "commands" : [ {
  "cancel" : {
   "index" : "shop", 
   "shard" : 0, 
   "node" : "es_node_one"
  }
 } ]
}'

The preceding command would cancel the allocation of shard 0 of the shop index on the es_node_one node.

Forcing shard allocation

In addition to canceling and moving shards and replicas, we can also allocate an unallocated shard to a specific node. For example, if we have an unallocated shard numbered 0 for the users index and we would like it to be allocated to es_node_two by Elasticsearch, we would run the following command:

curl -XPOST 'localhost:9200/_cluster/reroute' -d '{
 "commands" : [ {
  "allocate" : {
   "index" : "users", 
   "shard" : 0, 
   "node" : "es_node_two"
  }
 } ]
}'

Multiple commands per HTTP request

We can, of course, include multiple commands in a single HTTP request. For example:

curl -XPOST 'localhost:9200/_cluster/reroute' -d '{
 "commands" : [
  {"move" : {"index" : "shop", "shard" : 1, "from_node" : "es_node_one", "to_node" : "es_node_two"}},
  {"cancel" : {"index" : "shop", "shard" : 0, "node" : "es_node_one"}}
 ]
}'

Allowing operations on primary shards

The cancel and allocate commands accept an additional allow_primary parameter. If set to true, it tells Elasticsearch that the operation can be performed on the primary shard. Please be advised that operations with the allow_primary parameter set to true may result in data loss.
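
As a sketch, building on the allocation command shown earlier, forcing the allocation of an unassigned primary shard of the users index would look as follows (if no valid copy of the shard exists, an empty primary is created, which is exactly where the data loss can occur):

curl -XPOST 'localhost:9200/_cluster/reroute' -d '{
 "commands" : [ {
  "allocate" : {
   "index" : "users", 
   "shard" : 0, 
   "node" : "es_node_two",
   "allow_primary" : true
  }
 } ]
}'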

Handling rolling restarts

There is one more thing we would like to discuss when it comes to shard and replica allocation: handling rolling restarts. When an Elasticsearch node is restarted, it may take some time for it to come back into the cluster, and during this time the rest of the cluster may decide to rebalance and move shards around. When we know we are doing a rolling restart, for example, to update Elasticsearch to a new version or to install a plugin, we may want to tell Elasticsearch about it. The procedure for restarting each node should be as follows:

First, before you do any maintenance, you should stop the allocation by sending the following command:

curl -XPUT 'localhost:9200/_cluster/settings' -d '{
 "transient" : {
  "cluster.routing.allocation.enable" : "none"
 }
}'

This will tell Elasticsearch to stop allocation. After this, we stop the node we want to do maintenance on and start it again. Once it rejoins the cluster, we can enable allocation again by running the following:

curl -XPUT 'localhost:9200/_cluster/settings' -d '{
 "transient" : {
  "cluster.routing.allocation.enable" : "all"
 }
}'

This will enable the allocation again. This procedure should be repeated for each node we want to perform maintenance on.
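
Between restarting a node and re-enabling allocation, it is worth confirming that the node has actually rejoined the cluster. A quick check using the _cat APIs could look like this:

curl -XGET 'localhost:9200/_cat/nodes?v'
curl -XGET 'localhost:9200/_cat/health?v'

The first command should list the restarted node again, and the second one shows the overall cluster state.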
