As we have already discussed, in addition to simple queries, ElasticSearch exposes a few compound queries that either connect multiple queries together or control the behavior of another query. You may wonder whether you need such functionality. In fact, if you are interested in making your search better, you will probably use the following queries at some point in your journey with ElasticSearch. A simple example is combining a term query with a phrase query in order to get better search results. But for now, let's stick to describing the queries.
A bool query allows us to wrap a virtually unbounded number of queries and combine them with Boolean logic by using one of the following sections:
should: The query wrapped into this section may or may not match (the number of queries in the should section that need to match is controlled by the minimum_should_match parameter).
must: The query wrapped into this section must match in order for the document to be returned.
must_not: The query wrapped into this section must not match in order for the document to be returned.
Each of these sections can be present multiple times. Also, please remember that the score of the resulting document is calculated as the sum of the scores of all the wrapped queries that the document matched. In addition to the preceding sections, we can add the following parameters to the query body:
boost: This specifies the boost used with the query; it defaults to 1.0.
minimum_should_match: This integer value describes the minimum number of should clauses that have to match in order for the checked document to be counted as a match.
disable_coord: This parameter defaults to false and allows us to enable or disable the score factor computation that is based on the fraction of all query terms that a document contains.
Imagine that we would like to find all the documents that have the term crime in the title field. In addition, they may or may not have a value between 1900 and 2000 in the year field, and they must not have the term nothing in the otitle field. Such a query made with the bool query may look like the following code:
{
  "query" : {
    "bool" : {
      "must" : {
        "term" : { "title" : "crime" }
      },
      "should" : {
        "range" : { "year" : { "from" : 1900, "to" : 2000 } }
      },
      "must_not" : {
        "term" : { "otitle" : "nothing" }
      }
    }
  }
}
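To make the matching rules concrete, here is a minimal Python sketch that mimics how the three sections decide whether a document is returned. It is not how ElasticSearch is implemented: the function name bool_matches, the predicate-style clauses, and the sample documents are all assumptions made for illustration. It models matching only (not scoring), and it expects minimum_should_match to be passed explicitly rather than modeling the server's defaults.

```python
def bool_matches(doc, must=(), should=(), must_not=(), minimum_should_match=0):
    """Toy model of bool query matching; each clause is a predicate on a dict."""
    # Every must clause has to match.
    if not all(clause(doc) for clause in must):
        return False
    # No must_not clause may match.
    if any(clause(doc) for clause in must_not):
        return False
    # Enough should clauses have to match.
    matched_should = sum(1 for clause in should if clause(doc))
    return matched_should >= minimum_should_match


# Hypothetical clauses mirroring the query above.
def title_has_crime(doc):
    return "crime" in doc["title"].lower().split()


def year_in_range(doc):
    return 1900 <= doc["year"] <= 2000


def otitle_has_nothing(doc):
    return "nothing" in doc["otitle"].lower().split()


# Hypothetical documents, not taken from any real index.
old_book = {"title": "Crime and Punishment", "otitle": "Prestuplenie i nakazanie", "year": 1886}
excluded = {"title": "Crime Stories", "otitle": "Much Ado About Nothing", "year": 1950}

clauses = dict(must=[title_has_crime], should=[year_in_range], must_not=[otitle_has_nothing])
print(bool_matches(old_book, **clauses))   # should clause is optional here
print(bool_matches(excluded, **clauses))   # rejected by must_not
print(bool_matches(old_book, minimum_should_match=1, **clauses))
```

In this sketch, raising minimum_should_match to 1 turns the optional year range into a requirement, so the first document stops matching.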
The boosting query is designed to wrap around two queries and lower the score of the documents that match one of them. Three sections of the boosting query need to be defined: the positive section, which holds the query whose document scores will be left unchanged; the negative section, whose matching documents will have their scores lowered; and the negative_boost section, which holds the boost value used to lower the scores of documents matching the negative query. The advantage of the boosting query is that documents matching the negative query still appear in the results (as long as they also match the positive query), only with a lowered score. By contrast, if we used the bool query with the must_not section, those documents would be excluded from the results altogether.
Let's see an example. Say that we would like to have the results of a simple term query for the term crime in the title field and that we would like the score of such documents to not be changed. But say also that, among those results, we would like the documents whose year field falls between 1800 and 1900 to have their scores multiplied by 0.5. Combining these specifications, we arrive at a query that looks like this:
{
  "query" : {
    "boosting" : {
      "positive" : {
        "term" : { "title" : "crime" }
      },
      "negative" : {
        "range" : { "year" : { "from" : 1800, "to" : 1900 } }
      },
      "negative_boost" : 0.5
    }
  }
}
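The effect of negative_boost can be illustrated with a small Python sketch. It is a simplified model rather than ElasticSearch's actual scoring code, and it assumes the document already matched the positive query; the function name boosting_score and the sample scores are hypothetical.

```python
def boosting_score(positive_score, matches_negative, negative_boost):
    """Toy model: demote (but keep) a document that matches the negative query."""
    if matches_negative:
        return positive_score * negative_boost
    return positive_score


# Two hypothetical hits for the term query, both scored 2.0 by it.
print(boosting_score(2.0, matches_negative=True, negative_boost=0.5))   # year in 1800-1900
print(boosting_score(2.0, matches_negative=False, negative_boost=0.5))  # year outside the range
```

Both documents stay in the result list; only their relative order changes, which is the difference from excluding them with must_not.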
A constant score query is used to wrap another query (or filter) and return a constant score for each document returned by the wrapped query (or filter). It allows us to strictly control the score value assigned to a document matched by a query or filter. For example, if we want to have a score of 2.0 for all the documents that have the term crime in the title field, we can send the following query:
{
  "query" : {
    "constant_score" : {
      "query" : {
        "term" : { "title" : "crime" }
      },
      "boost" : 2.0
    }
  }
}
The indices query is useful when executing a query against multiple indices. It allows us to provide an array of indices (the indices property) and two queries: one that will be executed if we query an index from the list (the query property) and one that will be executed on all the other indices (the no_match_query property). For example, let's assume that we have an alias named books that holds two indices, library and a new one called users, and that we want to use that alias but run a different query against each index. To do that, we will send the following query:
{
  "query" : {
    "indices" : {
      "indices" : [ "library" ],
      "query" : {
        "term" : { "title" : "crime" }
      },
      "no_match_query" : {
        "term" : { "user" : "crime" }
      }
    }
  }
}
In the preceding query, the query described in the query property would be run against the library index, and no_match_query would be run against every other index that the request targets (here, users).
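The per-index choice described above can be sketched in a few lines of Python. This is only an illustration of the selection rule, not ElasticSearch code; the function name pick_query is hypothetical, and the index names are the ones from the example.

```python
def pick_query(index_name, indices, query, no_match_query):
    """Toy model: choose which wrapped query runs against a given index."""
    return query if index_name in indices else no_match_query


listed = ["library"]
query = {"term": {"title": "crime"}}
no_match_query = {"term": {"user": "crime"}}

# The books alias is assumed to resolve to these two indices.
for index_name in ["library", "users"]:
    print(index_name, pick_query(index_name, listed, query, no_match_query))
```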
The custom filters score query allows us to wrap a query and filters. It works in such a way that if a document from the wrapped query matches a filter, we can influence the score of such a document with either a boost or a defined script. For example, if we run the match all query and want to use a boost value of 10 for the documents that have crime in the title field, and in addition to that, want the score of the documents that have values between 1900 and 1950 in the year field to be computed from that field by a script, we will send the following query:
{
  "query" : {
    "custom_filters_score" : {
      "query" : { "match_all" : {} },
      "filters" : [
        {
          "filter" : { "term" : { "title" : "crime" } },
          "boost" : 10.0
        },
        {
          "filter" : { "range" : { "year" : { "from" : 1900, "to" : 1950 } } },
          "script" : "_source.year"
        }
      ],
      "score_mode" : "first"
    }
  }
}
Let's stop for a bit and discuss the query structure. At the main level of the custom_filters_score query, we have three sections: the query section, which holds the actual query we run; the filters section, which is an array of ordered filters that will be used to match the documents from the query and modify their score; and the score_mode section, which we will discuss shortly. Each entry in the filters array consists of a filter object plus either a boost value or a script used to modify the score of the documents that match the filter. In our case, we used a script (the script parameter) to calculate the score for the range filter: the document will have a score equal to the value of its year field.
The score_mode section allows us to control how the defined filters affect the score of the matched documents. By default, it is set to first, which means that only the first matching filter will modify the score. The other values are aggregation-based and are as follows:
min: The score of the document will be influenced by the minimum scoring filter
max: The score of the document will be influenced by the maximum scoring filter
total: The score of the document will be influenced by the sum of the scores of the matching filters
avg: The score of the document will be influenced by the average of the scores of the matching filters
multiply: The score of the document will be influenced by the multiplication of the scores of the matching filters
There is also another parameter in addition to the ones mentioned, the max_boost parameter, which allows us to set the maximum boost value a document can have.
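The score_mode values can be compared side by side with a short Python sketch. It is a simplified model under stated assumptions, not the server's implementation: the function name combine_filter_boosts is hypothetical, the input is the list of values produced by the filters a document matched (in filter order), a document matching no filter is assumed to get a neutral factor of 1.0, and how the combined value is then applied to the query score is not modeled.

```python
import math  # math.prod requires Python 3.8 or newer


def combine_filter_boosts(boosts, score_mode="first", max_boost=None):
    """Toy model: combine the values of the matching filters per score_mode."""
    if not boosts:
        return 1.0  # assumption: no matching filter leaves the score untouched
    reducers = {
        "first": lambda values: values[0],
        "min": min,
        "max": max,
        "total": sum,
        "avg": lambda values: sum(values) / len(values),
        "multiply": math.prod,
    }
    combined = reducers[score_mode](boosts)
    # max_boost caps the combined value when it is set.
    if max_boost is not None:
        combined = min(combined, max_boost)
    return combined


# A hypothetical document with crime in the title and year 1925 matches both filters.
matched = [10.0, 1925.0]
for mode in ["first", "min", "max", "total", "avg", "multiply"]:
    print(mode, combine_filter_boosts(matched, mode))
```

With the default first mode, only the boost of 10 applies to this document, so the order of the filters array matters.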
The custom boost factor query allows us to wrap another query and multiply the score of the documents returned by that query by a provided factor. The difference between this and the boost given to queries is that the factor given to a custom boost factor query is not normalized, which is sometimes desirable. So, if we would like to multiply the score of a simple term query by 10, we could run a query like this one:
{
  "query" : {
    "custom_boost_factor" : {
      "query" : {
        "term" : { "title" : "crime" }
      },
      "boost_factor" : 10.0
    }
  }
}
As you can see, in the query body, we have a new section, custom_boost_factor, which contains the query property (holding the actual query) and the boost_factor property (holding the boost multiplier).
A custom score query can be used to customize scoring for another query with the use of a script. For example, if we want to multiply the score calculated by our simple term query by the value of the year field and then by 2 (which, of course, doesn't make much practical sense), we could send the following query:
{
  "query" : {
    "custom_score" : {
      "query" : {
        "term" : { "title" : "crime" }
      },
      "params" : { "multiply" : 2 },
      "script" : "_score * _source.year * multiply"
    }
  }
}
We wrapped our term query with a custom_score query. In addition to that, we provided two additional sections: the params section, which holds additional parameters used in the score calculation script, and the script section, which holds the actual score calculation script. The value calculated by the script (the product of the score, the year field, and the multiply parameter) will be assigned as the score of each document that matches the query. As you may have noticed, because we don't store the year field, we get it from _source.
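The arithmetic performed by the script can be reproduced in plain Python. This is only a stand-in for the script shown above, not the scripting engine itself; the function name custom_score and the sample score of 0.5 are hypothetical.

```python
def custom_score(score, source, params):
    """Toy stand-in for the script: _score * _source.year * multiply."""
    return score * source["year"] * params["multiply"]


# A hypothetical hit: the term query scored it 0.5 and its year is 1886.
print(custom_score(0.5, {"year": 1886}, {"multiply": 2}))  # 1886.0
```

Keeping the multiplier in params rather than hard-coding it in the script means the same script text can be reused with different values.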