Sometimes you need to prepare Elasticsearch before it starts handling your queries. Perhaps you rely heavily on the field data cache and want it loaded before your production queries arrive, or perhaps you want to warm up the operating system's I/O cache so that index files are read from the cache rather than from disk. Whatever the reason, Elasticsearch allows us to use so-called warming queries for our types and indices.
A warming query is nothing more than a regular query stored in a special type called _warmer in Elasticsearch. Let's assume that we have the following query that we want to use for warming up:
curl -XGET 'localhost:9200/library/_search?pretty' -d '{ "query" : { "match_all" : {} }, "aggs" : { "warming_aggs" : { "terms" : { "field" : "tags" } } } }'
To store the preceding query as a warming query for our library index, we will run the following command:
curl -XPUT 'localhost:9200/library/_warmer/tags_warming_query' -d '{ "query" : { "match_all" : {} }, "aggs" : { "warming_aggs" : { "terms" : { "field" : "tags" } } } }'
The preceding command will register our query as a warming query with the tags_warming_query name. You can have multiple warming queries for your index, but each of these queries needs to have a unique name.
We can define warming queries not only for the entire index, but also for a specific type in it. For example, to store our previously shown query as a warming query only for the book type in the library index, send the preceding command to the /library/book/_warmer URI instead of /library/_warmer. The entire command will be as follows:
curl -XPUT 'localhost:9200/library/book/_warmer/tags_warming_query' -d '{ "query" : { "match_all" : {} }, "aggs" : { "warming_aggs" : { "terms" : { "field" : "tags" } } } }'
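The two registration commands differ only in the URI, which a small helper makes explicit. The following is a minimal Python sketch, not an official client: the function name build_warmer_put and the BASE_URL constant are hypothetical, and the request is only constructed, not sent.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:9200"  # assumption: local node, as in the curl examples


def build_warmer_put(index, name, body, doc_type=None):
    """Build (but do not send) the PUT request that registers a warming query."""
    # A type-scoped warmer places the type between the index and _warmer.
    scope = f"{index}/{doc_type}" if doc_type else index
    url = f"{BASE_URL}/{scope}/_warmer/{name}"
    return Request(url, data=json.dumps(body).encode("utf-8"), method="PUT")


warming_query = {
    "query": {"match_all": {}},
    "aggs": {"warming_aggs": {"terms": {"field": "tags"}}},
}

index_wide = build_warmer_put("library", "tags_warming_query", warming_query)
book_only = build_warmer_put(
    "library", "tags_warming_query", warming_query, doc_type="book"
)

print(index_wide.get_method(), index_wide.full_url)
print(book_only.get_method(), book_only.full_url)
```

Passing either object to urllib.request.urlopen would send it to a running node; keeping construction separate from sending makes the URI difference easy to inspect.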
Once a warming query has been added, Elasticsearch runs the defined warming queries against every new segment before that segment is made available for searching. This allows Elasticsearch and the operating system to cache data and, thus, speed up searching.
Just as we read in the Full text searching section of Chapter 1, Getting Started with Elasticsearch Cluster, Lucene divides the index into parts called segments, which once written can't be changed. Every new commit operation creates a new segment (which is eventually merged if the number of segments is too high), which Lucene uses for searching.
In order to get a specific warming query for our index, we just need to know its name. For example, if we want to get the warming query named tags_warming_query for our library index, we will run the following command:
curl -XGET 'localhost:9200/library/_warmer/tags_warming_query?pretty'
The result returned by Elasticsearch will be as follows:
{ "library" : { "warmers" : { "tags_warming_query" : { "types" : [ "book" ], "source" : { "query" : { "match_all" : { } }, "aggs" : { "warming_aggs" : { "terms" : { "field" : "tags" } } } } } } } }
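The response nests each warming query under its index name and a warmers key, so listing what is registered takes two loops. Here is a minimal Python sketch that works on a response of the shape shown above; the list_warmers helper is a hypothetical name used only for illustration.

```python
import json

# Response body in the shape returned by the GET example above.
response_text = """
{ "library" : { "warmers" : { "tags_warming_query" : {
    "types" : [ "book" ],
    "source" : { "query" : { "match_all" : { } },
                 "aggs" : { "warming_aggs" : { "terms" : { "field" : "tags" } } } }
} } } }
"""


def list_warmers(response):
    """Return (index, warmer name, types) tuples from a _warmer GET response."""
    rows = []
    for index, entry in response.items():
        for name, warmer in entry.get("warmers", {}).items():
            rows.append((index, name, warmer.get("types", [])))
    return rows


for index, name, types in list_warmers(json.loads(response_text)):
    # An empty types list means the warmer is not restricted to a type.
    print(index, name, types or "all types")
```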
We can also get all the warming queries for the index using the following command:
curl -XGET 'localhost:9200/library/_warmer?pretty'
And finally, we can also get all the warming queries that start with a given prefix. For example, if we want to get all the warming queries for the library index that start with the tags prefix, we will run the following command:
curl -XGET 'localhost:9200/library/_warmer/tags*?pretty'
Deleting a warming query is very similar to getting one; we just need to use the DELETE HTTP method. To delete a specific warming query from our index, we just need to know its name. For example, if we want to delete the warming query named tags_warming_query for our library index, we will run the following command:
curl -XDELETE 'localhost:9200/library/_warmer/tags_warming_query'
We can also delete all the warming queries for the index using the following command:
curl -XDELETE 'localhost:9200/library/_warmer/_all'
And finally, we can also remove all the warming queries that start with a given prefix. For example, if we want to remove all the warming queries for the library index that start with the tags prefix, we will run the following command:
curl -XDELETE 'localhost:9200/library/_warmer/tags*'
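Getting and deleting use the same URIs and differ only in the HTTP method, with the last path segment being a name, a prefix wildcard, or _all. The Python sketch below illustrates that symmetry; build_warmer_request and BASE_URL are hypothetical names, and the requests are only constructed, not sent.

```python
from urllib.request import Request

BASE_URL = "http://localhost:9200"  # assumption: local node, as in the curl examples


def build_warmer_request(method, index, selector=None):
    """Build a GET or DELETE request for warming queries of an index.

    selector may be an exact name, a wildcard such as 'tags*', or '_all';
    when omitted, the request addresses the index's _warmer endpoint itself.
    """
    if method not in ("GET", "DELETE"):
        raise ValueError("method must be GET or DELETE")
    url = f"{BASE_URL}/{index}/_warmer"
    if selector:
        url += f"/{selector}"
    return Request(url, method=method)


for method, selector in [
    ("GET", "tags_warming_query"),
    ("GET", "tags*"),
    ("DELETE", "tags*"),
    ("DELETE", "_all"),
]:
    request = build_warmer_request(method, "library", selector)
    print(request.get_method(), request.full_url)
```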
To disable warming queries entirely while keeping them stored under _warmer, set the index.warmer.enabled configuration property to false (setting this property to true enables the warming functionality again). This setting can either be put in the elasticsearch.yml file or be set using the REST API on a live cluster.
For example, if we want to disable the warming up functionality for the library index, we will run the following command:
curl -XPUT 'localhost:9200/library/_settings' -d '{ "index.warmer.enabled" : false }'
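Because the same property both disables and re-enables warming, the settings request can be parameterized by a single boolean. A minimal Python sketch, assuming a local node; build_warmer_toggle and BASE_URL are hypothetical names, and the request is built but not sent.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:9200"  # assumption: local node, as in the curl examples


def build_warmer_toggle(index, enabled):
    """Build the settings update that enables or disables warming for an index."""
    body = {"index.warmer.enabled": bool(enabled)}
    return Request(
        f"{BASE_URL}/{index}/_settings",
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
    )


disable = build_warmer_toggle("library", False)
print(disable.get_method(), disable.full_url, disable.data.decode("utf-8"))
```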
Finally, we should ask ourselves one question: which queries should be considered as candidates for warming? Typically, you'll want to choose ones that are expensive to execute and ones that require caches to be populated. So you'll probably want to choose queries that include aggregations and sorting based on the fields in your index. This will force the operating system to load the parts of the indices that hold the data related to such queries and improve the performance of subsequent queries. In addition to this, parent-child queries and nested queries are also potential candidates for warming. You may also choose other queries by looking at the logs and finding where performance is not as good as you want it to be. Such queries may also be perfect candidates for warming up.
For example, let's say that we have the following logging configuration set in the elasticsearch.yml file:
index.search.slowlog.threshold.query.warn: 10s
index.search.slowlog.threshold.query.info: 5s
index.search.slowlog.threshold.query.debug: 2s
index.search.slowlog.threshold.query.trace: 1s
And we have the following logging level set in the logging.yml configuration file:
logger:
  index.search.slowlog: TRACE, index_search_slow_log_file
Notice that the index.search.slowlog.threshold.query.trace property is set to 1s and the index.search.slowlog logging level is set to TRACE. This means that whenever a query takes longer than one second to execute (on a shard, not in total), it will be logged into the slow log file (the name of which is specified by the index_search_slow_log_file configuration section of the logging.yml configuration file). For example, the following can be found in a slow log file:
[2015-11-25 19:53:00,248][TRACE][index.search.slowlog.query] took[3.4s], took_millis[3400], types[], stats[], search_type[QUERY_THEN_FETCH], total_shards[5], source[{"query":{"match_all":{}},"aggs":{"warming_aggs":{"terms":{"field":"tags"}}}}], extra_source[],
As you can see, in the preceding log line, we have the query time, search type, and the query source, which shows us the executed query.
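To turn slow log entries into warming candidates, the query time and source can be pulled out of each line with a regular expression. The Python sketch below assumes lines in the same shape as the one shown above; parse_slowlog_line and SLOWLOG_PATTERN are hypothetical names, and the pattern is an illustration rather than a complete parser for every slow log variant.

```python
import json
import re

# Matches the fields of interest in a slow log line of the shape shown above.
SLOWLOG_PATTERN = re.compile(
    r"took_millis\[(?P<millis>\d+)\].*?"
    r"search_type\[(?P<search_type>[^\]]*)\].*?"
    r"source\[(?P<source>.*)\], extra_source\["
)


def parse_slowlog_line(line):
    """Return took_millis, search_type and the query source, or None if no match."""
    match = SLOWLOG_PATTERN.search(line)
    if match is None:
        return None
    return {
        "took_millis": int(match.group("millis")),
        "search_type": match.group("search_type"),
        "source": json.loads(match.group("source")),
    }


sample = (
    "[2015-11-25 19:53:00,248][TRACE][index.search.slowlog.query] "
    "took[3.4s], took_millis[3400], types[], stats[], "
    "search_type[QUERY_THEN_FETCH], total_shards[5], "
    'source[{"query":{"match_all":{}},"aggs":{"warming_aggs":'
    '{"terms":{"field":"tags"}}}}], extra_source[], '
)

parsed = parse_slowlog_line(sample)
# Queries slower than a chosen threshold (1000 ms here) are warming candidates.
if parsed and parsed["took_millis"] > 1000:
    print(json.dumps(parsed["source"]))
```

The extracted source is already a valid request body, so it could be registered as a warming query unchanged.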
Of course, the values can be different in your configuration, but the slow log can be a valuable source of queries that have been running too long and may need a warming query defined. Maybe these are parent-child queries that need some identifiers to be fetched to perform better, or maybe you are using a filter that is expensive when you execute it for the first time.
There is one thing you should remember: don't overload your Elasticsearch cluster with too many warming queries because you may end up spending too much time in warming up instead of processing your production queries.