Sometimes, there may be a need to prepare ElasticSearch to handle your queries. Maybe it's because you rely heavily on the field data cache and you want it to be loaded before your production queries arrive or maybe you want to warm up your operating system's I/O cache. Whatever the reason is, ElasticSearch allows us to define the warming queries for our types and indices.
A warming query is nothing more than a usual query registered under a special ElasticSearch endpoint called _warmer. Let's assume we have the following query that we want to use for warming up:
{ "query" : { "match_all" : {} }, "facets" : { "warming_facet" : { "terms" : { "field" : "tags" } } } }
In order to store the preceding query as a warming query for our library index, we will run the following command:
curl -XPUT 'localhost:9200/library/_warmer/tags_warming_query' -d '{ "query" : { "match_all" : {} }, "facets" : { "warming_facet" : { "terms" : { "field" : "tags" } } } }'
The preceding command will register our query as a warming query with the name tags_warming_query. You can have multiple warming queries for your index, but each of those queries needs to have a unique name.
We can define warming queries not only for the whole index, but also for specific types in it. For example, if we want to store our previously shown query as a warming query only for the book type in the library index, we send the preceding command to the /library/book/_warmer URI instead of the /library/_warmer one, so the whole command will be as follows:
curl -XPUT 'localhost:9200/library/book/_warmer/tags_warming_query' -d '{ "query" : { "match_all" : {} }, "facets" : { "warming_facet" : { "terms" : { "field" : "tags" } } } }'
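The two URI forms shown above differ only in the optional type segment, so they can be wrapped in one small helper. The following is a minimal sketch, not part of ElasticSearch itself: the function names warmer_url and register_warmer are our own, and we assume that ElasticSearch listens on localhost:9200 as in the preceding commands.

```shell
# Assumption: ElasticSearch listens on localhost:9200, as in the commands above.
ES_HOST="localhost:9200"

# Build the _warmer URI for an index, a warmer name, and an optional type.
# (warmer_url is a hypothetical helper name used only for illustration.)
warmer_url() {
  local index="$1" name="$2" type="${3:-}"
  if [ -n "$type" ]; then
    printf '%s/%s/%s/_warmer/%s\n' "$ES_HOST" "$index" "$type" "$name"
  else
    printf '%s/%s/_warmer/%s\n' "$ES_HOST" "$index" "$name"
  fi
}

# Register a query body as a warming query for the index (and optional type).
register_warmer() {
  local index="$1" name="$2" body="$3" type="${4:-}"
  curl -XPUT "$(warmer_url "$index" "$name" "$type")" -d "$body"
}

# Example call (not executed here), registering the query only for the book type:
# register_warmer library tags_warming_query \
#   '{ "query" : { "match_all" : {} }, "facets" : { "warming_facet" : { "terms" : { "field" : "tags" } } } }' \
#   book
```

Leaving out the last argument of register_warmer registers the warming query for the whole index.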
Once a warming query has been added, ElasticSearch warms up every new segment by running the defined warming queries on it before that segment is made available for searching. This allows ElasticSearch and the operating system to cache data and thus speed up searching.
If you are not familiar with the Apache Lucene library, you may not know what a segment is. Lucene divides the index into parts called segments, which can't be changed once written. Every new commit operation creates a new segment, which Lucene uses for search and which is eventually merged with others if the number of segments grows too high.
In order to get a specific warming query for our index, we just need to know its name. For example, if we want to get the warming query named tags_warming_query for our library index, we will run the following command:
curl -XGET 'localhost:9200/library/_warmer/tags_warming_query?pretty=true'
And the result returned by ElasticSearch will be as follows (note that we've used the pretty=true parameter to make the response easier to read):
{
  "library" : {
    "warmers" : {
      "tags_warming_query" : {
        "types" : [ ],
        "source" : {
          "query" : {
            "match_all" : { }
          },
          "facets" : {
            "warming_facet" : {
              "terms" : {
                "field" : "tags"
              }
            }
          }
        }
      }
    }
  }
}
We can also get all the warming queries for the index by using the following command:
curl -XGET 'localhost:9200/library/_warmer'
We can also get all the warming queries for a specific type. For example, if we want to get all the warming queries for the library index and the book type, we will run the following command:
curl -XGET 'localhost:9200/library/book/_warmer'
And finally, we can also get all the warming queries that start with a given prefix. For example, if we want to get all the warming queries for the library index that start with the tags prefix, we will run the following command:
curl -XGET 'localhost:9200/library/_warmer/tags*'
Deleting a warming query is very similar to getting one, but we just need to use the DELETE HTTP method. Let's look at how to delete a warming query.
In order to delete a specific warming query from our index, we just need to know its name. For example, if we want to delete the warming query named tags_warming_query for our library index, we will run the following command:
curl -XDELETE 'localhost:9200/library/_warmer/tags_warming_query'
We can also delete all the warming queries for the index by using the following command:
curl -XDELETE 'localhost:9200/library/_warmer'
And finally, we can also remove all the warming queries that start with a given prefix. For example, if we want to remove all the warming queries for the library index that start with the tags prefix, we will run the following command:
curl -XDELETE 'localhost:9200/library/_warmer/tags*'
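Because the GET and DELETE calls share the same URI, it is easy to keep a copy of the warming queries we are about to remove. The following is a minimal sketch of that idea under a few assumptions: ElasticSearch runs on localhost:9200, the function name backup_and_delete_warmers is our own invention, and the backup file path is whatever the caller chooses.

```shell
# Assumption: ElasticSearch listens on localhost:9200, as in the commands above.
ES_HOST="localhost:9200"

# Save the definitions of the matching warming queries to a file, then delete them.
#   $1 = index, $2 = warmer name or prefix pattern such as 'tags*', $3 = backup file
# (backup_and_delete_warmers is a hypothetical helper name.)
backup_and_delete_warmers() {
  local index="$1" pattern="$2" backup="$3"
  local url="$ES_HOST/$index/_warmer/$pattern"
  # -f makes curl return a non-zero status on HTTP errors,
  # so a failed fetch stops the function before anything is deleted.
  curl -sf -XGET "$url?pretty=true" > "$backup" || return 1
  curl -s -XDELETE "$url"
}

# Example call (not executed here):
# backup_and_delete_warmers library 'tags*' tags_warmers_backup.json
```

Note that the pattern is quoted in the example call so that the shell passes tags* to ElasticSearch instead of expanding it against local file names.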
In order to disable the warming queries entirely while keeping their definitions registered under _warmer, set the index.warmer.enabled configuration property to false (setting this property to true enables the warming up functionality again). This setting can either be put into the elasticsearch.yml file or be set using the REST API on a live cluster.
For example, if we want to disable the warming up functionality for the library index, we will run the following command:
curl -XPUT 'http://localhost:9200/library/_settings' -d '{ "index.warmer.enabled" : false }'
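Since the same setting both disables and enables warming, the two directions can share one helper, which may be handy when warming should be switched off only temporarily, for instance around a large import. This is a minimal sketch: set_warmers_enabled is a hypothetical function name, and we assume ElasticSearch is reachable on localhost:9200.

```shell
# Assumption: ElasticSearch listens on localhost:9200, as in the command above.
ES_HOST="localhost:9200"

# Turn the warming up functionality on or off for a single index.
#   $1 = index, $2 = true or false
# (set_warmers_enabled is a hypothetical helper name.)
set_warmers_enabled() {
  local index="$1" enabled="$2"
  case "$enabled" in
    true|false) ;;
    *) echo "usage: set_warmers_enabled <index> true|false" >&2; return 1 ;;
  esac
  curl -XPUT "http://$ES_HOST/$index/_settings" -d "{ \"index.warmer.enabled\" : $enabled }"
}

# Example calls (not executed here):
# set_warmers_enabled library false   # before the import
# set_warmers_enabled library true    # after the import
```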
You may ask which queries should be used as warming queries. Typically, you'll want to choose the ones that are expensive to execute and the ones that require caches to be populated, so queries that include faceting and sorting based on the fields in your index are the usual candidates. However, you may also choose other queries by looking at the logs and finding where your performance is not as great as you want it to be. Such queries may also be perfect candidates for warming up.
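We have already seen a warming query built around a facet, so here is a sketch of one built around sorting. The helper only prints the query body; the function name sort_warmer_body, the year field, and the warmer name in the example call are assumptions made for illustration and should be replaced with a field you actually sort on.

```shell
# Print a warming query body that sorts all documents on a single field.
#   $1 = field to sort on, $2 = sort order (asc by default)
# (sort_warmer_body is a hypothetical helper name.)
sort_warmer_body() {
  local field="$1" order="${2:-asc}"
  printf '{ "query" : { "match_all" : {} }, "sort" : [ { "%s" : "%s" } ] }\n' "$field" "$order"
}

# Example registration (not executed here), assuming a hypothetical year field:
# curl -XPUT 'localhost:9200/library/_warmer/year_sort_warming_query' \
#   -d "$(sort_warmer_body year desc)"
```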
For example, let's say that we have the following logging configuration set in the elasticsearch.yml file:
index.search.slowlog.threshold.query.warn: 10s
index.search.slowlog.threshold.query.info: 5s
index.search.slowlog.threshold.query.debug: 2s
index.search.slowlog.threshold.query.trace: 1s
And we have the following logging level set in the logging.yml configuration file:
logger:
  index.search.slowlog: TRACE, index_search_slow_log_file
Notice that the index.search.slowlog.threshold.query.trace property is set to 1s and the index.search.slowlog logging level is set to TRACE. That means that whenever a query takes longer than one second to execute (on a single shard, not in total), it will be written to the slow log file (the name of which is specified by the index_search_slow_log_file configuration section of the logging.yml configuration file). For example, the following can be found in a slow log file:
[2013-01-24 13:33:05,518][TRACE][index.search.slowlog.query] [Local test] [library][1] took[1400.7ms], took_millis[1400], search_type[QUERY_THEN_FETCH], total_shards[32], source[{"query":{"match_all":{}}}], extra_source[]
As you can see, in the preceding log line, we have the query time, search type, and the query source itself, which shows us the executed query.
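To turn such log lines into a list of warming query candidates, we can extract the execution time and the query source from each of them. The following is a minimal sketch that assumes the slow log lines have exactly the layout shown above (took_millis[...] followed later by source[...] and extra_source[...]); the function name slowlog_queries is our own, and the pattern would need adjusting for a different log format.

```shell
# Read slow log lines on standard input and print "<took_millis> <query source>"
# for every line that matches the layout shown above.
# (slowlog_queries is a hypothetical helper name.)
slowlog_queries() {
  sed -n 's/.*took_millis\[\([0-9][0-9]*\)].*source\[\(.*\)], extra_source\[.*/\1 \2/p'
}

# Example (not executed here): show the ten slowest logged queries,
# where slowlog.log stands for the path of your slow log file.
# slowlog_queries < slowlog.log | sort -rn | head -n 10
```

Because the time comes first on every output line, a numeric sort is enough to bring the slowest queries to the top.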
Of course, the values can be different in your configuration, but the slow log can be a valuable source of queries that run too long and may need a warm up defined. Maybe those are parent-child queries that need some identifiers fetched to perform better, or maybe you are using a filter that is expensive when executed for the first time.