In Elasticsearch Server Second Edition, published by Packt Publishing, we talked about a number of things related to the shard allocation functionality provided by Elasticsearch. We discussed the Cluster Reroute API, shard rebalancing, and shard awareness. Although now very commonly used, these topics are very important if you want to be in full control of your Elasticsearch cluster. Because of that, we decided to extend the examples provided in Elasticsearch Server Second Edition and provide you with guidance on how to use Elasticsearch shards awareness and alter the default shard allocation mechanism.
Let's start with a simple example. We assume that we have a cluster built of four nodes that looks as follows:
As you can see, our cluster is built of four nodes. Each node was bound to a specific IP address, and each node was given the tag
property and a group
property (added to elasticsearch.yml
as node.tag
and node.group
properties). This cluster will serve the purpose of showing you how shard allocation filtering works. The group
and tag
properties can be given whatever names you want; you just need to prefix your desired property name with the node
name; for example, if you would like to use a party
property name, you would just add node.party: party1
to your elasticsearch.yml
file.
Allocation awareness allows us to configure shards and their replicas' allocation with the use of generic parameters. In order to illustrate how allocation awareness works, we will use our example cluster. For the example to work, we should add the following property to the elasticsearch.yml
file:
cluster.routing.allocation.awareness.attributes: group
This will tell Elasticsearch to use the node.group
property as the awareness parameter.
After this, let's start the first two nodes, the ones with the node.group
parameter equal to groupA
, and let's create an index by running the following command:
curl -XPOST 'localhost:9200/mastering' -d '{ "settings" : { "index" : { "number_of_shards" : 2, "number_of_replicas" : 1 } } }'
After this command, our two nodes' cluster will look more or less like this:
As you can see, the index was divided evenly between two nodes. Now let's see what happens when we launch the rest of the nodes (the ones with node.group
set to groupB
):
Notice the difference: the primary shards were not moved from their original allocation nodes, but the replica shards were moved to the nodes with a different node.group
value. That's exactly right—when using shard allocation awareness, Elasticsearch won't allocate shards and replicas to the nodes with the same value of the property used to determine the allocation awareness (which, in our case, is node.group
). One of the example usages of this functionality is to divide the cluster topology between virtual machines or physical locations in order to be sure that you don't have a single point of failure.
Forcing allocation awareness can come in handy when we know, in advance, how many values our awareness attributes can take, and we don't want more replicas than needed to be allocated in our cluster, for example, not to overload our cluster with too many replicas. To do this, we can force allocation awareness to be active only for certain attributes. We can specify these values using the cluster.routing.allocation.awareness.force.zone.values
property and providing a list of comma-separated values to it. For example, if we would like allocation awareness to only use the groupA
and groupB
values of the node.group
property, we would add the following to the elasticsearch.yml
file:
cluster.routing.allocation.awareness.attributes: group cluster.routing.allocation.awareness.force.zone.values: groupA, groupB
Elasticsearch allows us to configure the allocation for the whole cluster or for the index level. In the case of cluster allocation, we can use the properties prefixes:
cluster.routing.allocation.include
cluster.routing.allocation.require
cluster.routing.allocation.exclude
When it comes to index-specific allocation, we can use the following properties prefixes:
index.routing.allocation.include
index.routing.allocation.require
index.routing.allocation.exclude
The previously mentioned prefixes can be used with the properties that we've defined in the elasticsearch.yml
file (our tag
and group
properties) and with a special property called _ip
that allows us to match or exclude IPs using nodes' IP address, for example, like this:
cluster.routing.allocation.include._ip: 192.168.2.1
If we would like to include nodes with a group
property matching the groupA
value, we would set the following property:
cluster.routing.allocation.include.group: groupA
Notice that we've used the cluster.routing.allocation.include
prefix, and we've concatenated it with the name of the property, which is group
in our case.
If you look closely at the parameters mentioned previously, you would notice that there are three kinds:
include
: This type will result in the inclusion of all the nodes with this parameter defined. If multiple include
conditions are visible, then all the nodes that match at least one of these conditions will be taken into consideration when allocating shards. For example, if we would add two cluster.routing.allocation.include.tag
parameters to our configuration, one with a property to the value of node1
and the second with the node2
value, we would end up with indices (actually, their shards) being allocated to the first and second node (counting from left to right). To sum up, the nodes that have the include
allocation parameter type will be taken into consideration by Elasticsearch when choosing the nodes to place shards on, but that doesn't mean that Elasticsearch will put shards on them.require
: This was introduced in the Elasticsearch 0.90 type of allocation filter, and it requires all the nodes to have the value that matches the value of this property. For example, if we would add one cluster.routing.allocation.require.tag
parameter to our configuration with the value of node1
and a cluster.routing.allocation.require.group
parameter, the value of groupA
would end up with shards allocated only to the first node (the one with the IP address of 192.168.2.1
).exclude
: This allows us to exclude nodes with given properties from the allocation process. For example, if we set cluster.routing.allocation.include.tag
to groupA
, we would end up with indices being allocated only to nodes with IP addresses 192.168.3.1
and 192.168.3.2
(the third and fourth node in our example).Property values can use simple wildcard characters. For example, if we would like to include all the nodes that have the group
parameter value beginning with group
, we could set the cluster.routing.allocation.include.group
property to group*
. In the example cluster case, it would result in matching nodes with the groupA
and groupB
group
parameter values.
In addition to setting all discussed properties in the elasticsearch.yml
file, we can also use the update API to update these settings in real-time when the cluster is already running.
In order to update settings for a given index (for example, our mastering
index), we could run the following command:
curl -XPUT 'localhost:9200/mastering/_settings' -d '{ "index.routing.allocation.require.group": "groupA" }'
As you can see, the command was sent to the _settings
end-point for a given index. You can include multiple properties in a single call.
In order to update settings for the whole cluster, we could run the following command:
curl -XPUT 'localhost:9200/_cluster/settings' -d '{ "transient" : { "cluster.routing.allocation.require.group": "groupA" } }'
As you can see, the command was sent to the cluster/_settings
end-point. You can include multiple properties in a single call. Please remember that the transient
name in the preceding command means that the property will be forgotten after the cluster restart. If you want to avoid this and set this property as a permanent one, use persistent
instead of the transient
one. An example command, which will keep the settings between restarts, could look like this:
curl -XPUT 'localhost:9200/_cluster/settings' -d '{ "persistent" : { "cluster.routing.allocation.require.group": "groupA" } }'
In addition to the previously mentioned properties, we are also allowed to define how many shards (primaries and replicas) for an index can by allocated per node. In order to do that, one should set the index.routing.allocation.total_shards_per_node
property to a desired value. For example, in elasticsearch.yml
we could set this:
index.routing.allocation.total_shards_per_node: 4
This would result in a maximum of four shards per index being allocated to a single node.
This property can also be updated on a live cluster using the Update API, for example, like this:
curl -XPUT 'localhost:9200/mastering/_settings' -d '{ "index.routing.allocation.total_shards_per_node": "4" }'
Now, let's see a few examples of what the cluster would look like when creating a single index and having the allocation properties used in the elasticsearch.yml
file.
One of the properties that can be useful when having multiple nodes on a single physical server is cluster.routing.allocation.same_shard.host
. When set to true
, it prevents Elasticsearch from placing a primary shard and its replica (or replicas) on the same physical host. We really advise that you set this property to true
if you have very powerful servers and that you go for multiple Elasticsearch nodes per physical server.
Now, let's use our example cluster to see how the allocation inclusion works. Let's start by deleting and recreating the mastering
index by using the following commands:
curl -XDELETE 'localhost:9200/mastering' curl -XPOST 'localhost:9200/mastering' -d '{ "settings" : { "index" : { "number_of_shards" : 2, "number_of_replicas" : 0 } } }'
After this, let's try to run the following command:
curl -XPUT 'localhost:9200/mastering/_settings' -d '{ "index.routing.allocation.include.tag": "node1", "index.routing.allocation.include.group": "groupA", "index.routing.allocation.total_shards_per_node": 1 }'
If we visualize the response of the index status, we would see that the cluster looks like the one in the following image:
As you can see, the mastering
index shards are allocated to nodes with the tag
property set to node1
or the group
property set to groupA
.
Now, let's reuse our example cluster and try running the following command:
curl -XPUT 'localhost:9200/mastering/_settings' -d '{ "index.routing.allocation.require.tag": "node1", "index.routing.allocation.require.group": "groupA" }'
If we visualize the response of the index status command, we would see that the cluster looks like this:
As you can see, the view is different than the one when using include
. This is because we tell Elasticsearch to allocate shards of the mastering
index only to the nodes that match both the require
parameters, and in our case, the only node that matches both is the first node.
Let's now look at exclusions. To test it, we try to run the following command:
curl -XPUT 'localhost:9200/mastering/_settings' -d '{ "index.routing.allocation.exclude.tag": "node1", "index.routing.allocation.require.group": "groupA" }'
Again, let's look at our cluster now:
As you can see, we said that we require the group
property to be equal to groupA
, and we want to exclude the node with a tag
equal to node1
. This resulted in the shard of the mastering
index being allocated to the node with the 192.168.2.2
IP address, which is what we wanted.
Of course, the mentioned properties are not the only ones that can be used. With the release of Elasticsearch 1.3.0 we got the ability to configure awareness on the basis of the disk usage. By default, disk-based allocation is turned on, and if we want, we can turn it off by setting the cluster.routing.allocation.disk.threshold_enabled
property to false
.
There are three additional properties that can help us configure disk-based allocation. The cluster.routing.allocation.disk.watermark.low
cluster controls when Elasticsearch does not allow you to allocate new shards on the node. By default, it is set to 85 percent and it means that when the disk usage is equal or higher than 85 percent, no new shards will be allocated on that node. The second property is cluster.routing.allocation.disk.watermark.high
, which controls when Elasticsearch will try to move the shards out of the node and is set to 90 percent by default. This means that Elasticsearch will try to move the shard out of the node if the disk usage is 90
percent or higher.
Both cluster.routing.allocation.disk.watermark.low
and cluster.routing.allocation.disk.watermark.high
can be set to absolute values, for example, 1024mb
.