When it is too much for I/O – throttling explained

In the Choosing the right directory implementation section, we discussed the store type, so we are now able to configure the store module to match our needs. However, we haven't covered everything about the store module yet: we still need to look at throttling.

Controlling I/O throttling

As you remember from the Segment merging under control section, Apache Lucene stores the data in immutable segment files that can be read many times but can be written only once. The merge process is asynchronous and, from the Lucene point of view, it should not in general interfere with indexing and searching. However, problems may occur because merging is expensive when it comes to I/O: it requires reading the segments that are going to be merged and writing new ones. If searching and indexing happen at the same time, this can be too much for the I/O subsystem, especially on systems with low I/O throughput. This is where throttling kicks in: we can control how much I/O Elasticsearch will use.

Configuration

Throttling can be configured both at the node level and at the index level, so you can either limit how much I/O a whole node will use or how much I/O will be used for a single index.

The throttling type

In order to configure the throttling type at the node level, one should use the indices.store.throttle.type property, which can take one of the following values: none, merge, or all. The none value tells Elasticsearch that no limiting should take place. The merge value tells Elasticsearch that we want to limit the I/O usage for the merging of segments (this is the default value), and the all value specifies that we want to limit all store module-based operations.

In order to configure the throttling type at the index level, one should use the index.store.throttle.type property, which can take the same values as the indices.store.throttle.type property plus an additional one: node. The node value tells Elasticsearch that instead of applying a per-index throttling limit, the index will use the node-level configuration. This is the default value.
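The allowed values for the two properties can be summed up in a few lines of shell. This is only a sketch: valid_throttle_type is a hypothetical helper written for this illustration, not something that ships with Elasticsearch, and its "node" and "index" level names are our own labels for the two properties described above.

```shell
# Hypothetical helper: succeeds when VALUE is an allowed throttle type for LEVEL.
# LEVEL "node"  stands for indices.store.throttle.type
# LEVEL "index" stands for index.store.throttle.type
valid_throttle_type() {
  case "$1:$2" in
    node:none|node:merge|node:all) return 0 ;;
    index:none|index:merge|index:all|index:node) return 0 ;;
    *) return 1 ;;
  esac
}

# The value "node" only makes sense at the index level.
if valid_throttle_type index node; then echo "index-level 'node': accepted"; fi
if ! valid_throttle_type node node; then echo "node-level 'node': rejected"; fi
```

A check like this can be useful in a deployment script, before any setting is sent to the cluster.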

Maximum throughput per second

In both cases, when using index-level or node-level throttling, we are able to set the maximum number of bytes per second that I/O can use. For the value of this property, we can use 10mb, 500mb, or anything else that we need. For the index-level configuration, we should use the index.store.throttle.max_bytes_per_sec property, and for the node-level configuration, we should use indices.store.throttle.max_bytes_per_sec.
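To get a feeling for what a given limit means, we can do some back-of-the-envelope arithmetic. The following sketch assumes that the throttled I/O proceeds at exactly the configured rate and that nothing else slows it down, so the result is a rough lower bound and not a prediction. The throttled_seconds helper and the 1000 MB figure are made up for this illustration.

```shell
# Hypothetical helper: whole seconds needed to move SIZE_MB megabytes
# at RATE_MB megabytes per second, rounded up.
throttled_seconds() {
  size_mb="$1"
  rate_mb="$2"
  echo $(( (size_mb + rate_mb - 1) / rate_mb ))
}

# Compare a few limits for the same amount of throttled I/O.
for rate in 10 20 50; do
  echo "1000 MB at ${rate}mb per second: at least $(throttled_seconds 1000 "$rate") s"
done
```

With these numbers, raising the limit from 10mb to 50mb shortens the estimate from 100 seconds to 20 seconds, at the price of leaving less I/O for searching during that time.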

Note

The previously mentioned properties can be set in the elasticsearch.yml file and can also be updated dynamically: by using the cluster update settings API for the node-level configuration and the index update settings API for the index-level configuration.
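For the static variant, the node-level properties could look as follows. This sketch writes the two lines to a scratch file so that it is safe to run anywhere; the file name and the 50mb value are assumptions made for this illustration. In a real installation, the same two lines would go into the node's elasticsearch.yml file.

```shell
# Scratch file used only for this illustration (an assumption, not a real config path).
CONF_FILE="${CONF_FILE:-./throttle-example.yml}"

# Node-level throttling: limit merge I/O to 50mb per second.
cat > "$CONF_FILE" <<'EOF'
indices.store.throttle.type: merge
indices.store.throttle.max_bytes_per_sec: 50mb
EOF

cat "$CONF_FILE"
```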

Node throttling defaults

On the node-level, since Elasticsearch 0.90.1, throttling is enabled by default. The indices.store.throttle.type property is set to merge and the indices.store.throttle.max_bytes_per_sec property is set to 20mb. Elasticsearch versions before 0.90.1 don't have throttling enabled by default.

Performance considerations

When using SSDs (solid state drives), or when query speed matters only a little (or you are not searching while you index your data), it is worth considering disabling throttling completely. We can do this by setting the indices.store.throttle.type property to none. This causes Elasticsearch to not use any store-level throttling and to use the full disk throughput for store-based operations.
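Disabling throttling dynamically could be sketched as follows. The throttle_type_body helper and the ES_URL variable are assumptions made for this illustration: the helper only builds the request body for the cluster update settings API, and the request itself is sent only when ES_URL is set (for example, to localhost:9200).

```shell
# Hypothetical helper: prints the JSON body that sets the node-level throttle type.
throttle_type_body() {
  printf '{ "persistent" : { "indices.store.throttle.type" : "%s" } }\n' "$1"
}

# Show the body that would be sent.
throttle_type_body none

# Send the request only when ES_URL is set (an assumption of this sketch).
if [ -n "${ES_URL:-}" ]; then
  curl -XPUT "$ES_URL/_cluster/settings" -d "$(throttle_type_body none)"
fi
```

Calling the same helper with merge instead of none gives the body that turns merge throttling back on.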

The configuration example

Now, let's imagine that we have a cluster that consists of four Elasticsearch nodes and we want to configure throttling for the whole cluster. By default, we want the merge operation not to process more than 50 megabytes per second for a node. We know that we can handle such operations without affecting the search performance, and this is what we are aiming at. In order to achieve this, we would run the following request:

curl -XPUT 'localhost:9200/_cluster/settings' -d '{
 "persistent" : {
  "indices.store.throttle.type" : "merge",
  "indices.store.throttle.max_bytes_per_sec" : "50mb"
 }
}'

In addition to this, we have a single index called payments that is very rarely used, and we've placed it on the smallest machine in the cluster. This index doesn't have replicas and is built of a single shard. What we would like to do for this index is limit the merges to process a maximum of 10 megabytes per second. So, in addition to the preceding command, we would run one like this:

curl -XPUT 'localhost:9200/payments/_settings' -d '{
 "index.store.throttle.type" : "merge",
 "index.store.throttle.max_bytes_per_sec" : "10mb"
}'

After running the preceding commands, we can check our index settings by running the following command:

curl -XGET 'localhost:9200/payments/_settings?pretty'

In response, we should get the following JSON:

{
  "payments" : {
    "settings" : {
      "index" : {
        "creation_date" : "1414072648520",
        "store" : {
          "throttle" : {
            "type" : "merge",
            "max_bytes_per_sec" : "10mb"
          }
        },
        "number_of_shards" : "5",
        "number_of_replicas" : "1",
        "version" : {
          "created" : "1040001"
        },
        "uuid" : "M3lePTOvSN2jnDz1J0t4Uw"
      }
    }
  }
}

As you can see, after updating the index settings, closing the index, and opening it again, we've finally got our settings working.
