Query execution preference

Let's forget about the shard placement and how to configure it—at least for a moment. In addition to all the fancy stuff that Elasticsearch allows us to set for shards and replicas, we also have the possibility to specify where our queries (and other operations, for example, the real-time GET) should be executed.

Before we get into the details, let's look at our example cluster:

[Figure: Query execution preference (diagram of the example cluster)]

As you can see, we have three nodes and a single index called mastering. Our index is divided into two primary shards, and there is one replica for each primary shard.
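
The same layout can be written down as a small Python structure, which may help when following the examples in this section. This is only a sketch: the placement of each shard copy is inferred from how the nodes are described in this section (primaries on node1 and node2, both replicas on node3), and the names CLUSTER, nodes_holding, and shards_covered_by are hypothetical helpers, not part of any Elasticsearch API.

```python
# Toy model of the example cluster. Placement is an assumption inferred
# from the text: node1 holds primary 0, node2 holds primary 1, and
# node3 holds the replicas of both shards.
CLUSTER = {
    "node1": [(0, "primary")],
    "node2": [(1, "primary")],
    "node3": [(0, "replica"), (1, "replica")],
}


def nodes_holding(shard, kind=None):
    """Return the sorted names of nodes holding a copy of the given shard.

    If kind is "primary" or "replica", only copies of that kind count.
    """
    return sorted(
        node
        for node, copies in CLUSTER.items()
        for number, copy_kind in copies
        if number == shard and (kind is None or copy_kind == kind)
    )


def shards_covered_by(node):
    """Return the sorted shard numbers that have a copy on the given node."""
    return sorted({number for number, _ in CLUSTER[node]})


if __name__ == "__main__":
    print(nodes_holding(0))             # every node with a copy of shard 0
    print(nodes_holding(1, "primary"))  # the node with the primary of shard 1
    print(shards_covered_by("node3"))   # shards a query to node3 can reach locally
```

With this model, a node covers the whole index only when shards_covered_by returns both shard numbers, which is the case for node3 alone.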

Introducing the preference parameter

In order to control where the query (and other operations) we are sending will be executed, we can use the preference parameter, which can be set to one of the following values:

  • _primary: With this value, the operations we send will be executed only on primary shards. So, if we send a query against the mastering index with the preference parameter set to _primary, it will be executed on the nodes named node1 and node2. For example, if you know that your primary shards are in one rack and the replicas are in other racks, you may want to execute the operation on primary shards to avoid network traffic.
  • _primary_first: This value behaves like _primary, but with a failover mechanism. If we run a query against the mastering index with the preference parameter set to _primary_first, it will be executed on the nodes named node1 and node2; however, if one (or more) of the primary shards is unavailable, that part of the query will be executed against the replica of the shard instead, which in our case is allocated to the node named node3.
  • _local: Elasticsearch will prefer to execute the operation on the local node, if possible. For example, if we send a query to node3 with the preference parameter set to _local, the whole query will be executed on that node. However, if we send the same query to node2, one part of the query will be executed against primary shard 1 (which is located on that node) and the other part will be executed on node1 or node3, where shard 0 resides. This is especially useful when trying to minimize network latency: with the _local preference, our queries are executed locally whenever possible (for example, when running a client connection from a local node or sending a query to a node).
  • _only_node:wJq0kPSHTHCovjuCsVK0-A: The operation will be executed only on the node with the provided identifier (wJq0kPSHTHCovjuCsVK0-A in this case). In our cluster, this means the query would be executed against the two replicas located on node3. Please remember that if the specified node doesn't hold enough shards to cover all the index data, the query will be executed only against the shards available on that node. For example, if we set the preference parameter to _only_node:6GVd-ktcS2um4uM4AAJQhQ, the identifier of a node that holds a single shard, the query would be executed against that single shard. This can be useful when we know that one of our nodes is more powerful than the others and we want some of the queries to be executed only on that node.
  • _prefer_node:wJq0kPSHTHCovjuCsVK0-A: Setting the preference parameter to _prefer_node: followed by a node identifier (wJq0kPSHTHCovjuCsVK0-A in our case) makes Elasticsearch prefer that node while executing the query. If some shards are not available on the preferred node, Elasticsearch will send the appropriate parts of the query to nodes where those shards are available. Like _only_node, _prefer_node can be used to choose a particular node, but with a fallback to other nodes.
  • _shards:0,1: This preference value lets us specify which shards the operation should be executed against (in our case, it will be all the shards, because we only have shards 0 and 1 in the mastering index). This is the only preference parameter value that can be combined with the other mentioned values. For example, in order to locally execute our query against shards 0 and 1, we should concatenate the _shards:0,1 value with _local using the ; character, so the final value of the preference parameter should look like this: _shards:0,1;_local. Being able to execute the operation against a single shard can be useful for diagnostic purposes.
  • custom string value: Setting the preference parameter to a custom value guarantees that queries sent with the same custom value will be executed against the same shards. For example, if we send a query with the preference parameter set to the mastering_elasticsearch value, it might be executed against the primary shards located on the nodes named node1 and node2. If we send another query with the same preference parameter value, the second query will again be executed against the shards located on node1 and node2. This functionality can help us in cases where we have different refresh rates and we don't want our users to see different results while repeating requests.

One thing is still missing, which is the default behavior. By default, Elasticsearch randomizes the operation between shards and replicas. If we send many queries, the same (or almost the same) number of queries will run against each of the shards and replicas.
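
To make the values above concrete, here is a minimal Python sketch that builds search URLs carrying the preference parameter in the query string. It only constructs the URLs and does not contact a cluster. The address http://localhost:9200 is an assumption, and search_url and shards_preference are hypothetical helper names introduced for illustration. The ; separator follows the description in this section.

```python
from urllib.parse import urlencode

# Assumed address of a node in the cluster; adjust to your environment.
BASE_URL = "http://localhost:9200"


def search_url(index, preference=None, base_url=BASE_URL):
    """Build a _search URL, adding the preference parameter when given."""
    url = f"{base_url}/{index}/_search"
    if preference is not None:
        # urlencode percent-encodes characters such as ':' ',' and ';'
        url += "?" + urlencode({"preference": preference})
    return url


def shards_preference(shards, then=None):
    """Build a _shards value, optionally combined with another preference.

    The _shards part comes first and is joined to the other value
    with ';', as described in the text.
    """
    value = "_shards:" + ",".join(str(shard) for shard in shards)
    return f"{value};{then}" if then else value


if __name__ == "__main__":
    # No preference: the default randomized behavior applies.
    print(search_url("mastering"))
    # Only primary shards.
    print(search_url("mastering", "_primary"))
    # Only the node with the identifier used in this section.
    print(search_url("mastering", "_only_node:wJq0kPSHTHCovjuCsVK0-A"))
    # Shards 0 and 1, executed locally where possible.
    print(search_url("mastering", shards_preference([0, 1], "_local")))
    # A custom string, so repeated requests go to the same shards.
    print(search_url("mastering", "mastering_elasticsearch"))
```

Any HTTP client can then send the query body to the resulting URL. Percent-encoding the value is a design choice made here so that characters such as ; are not misread as query-string separators along the way.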