So far we've run our queries and got the results in the order determined by the score of each document. However, that is not enough for every use case. It is really handy to be able to sort our results on the basis of field values. For example, when you are searching logs or time-based data in general, you probably want to have the most recent data first. In addition to that, Elasticsearch allows us to control how the documents should be sorted not only using field values, but also using more sophisticated sorting, such as sorting that uses scripts or sorting on fields that have multiple values. We will cover all that in this section.
Let's look at the following query that returns all the books with at least one of the specified words:
curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{ "query" : { "terms" : { "title" : [ "crime", "front", "punishment" ] } } }'
Under the hood, we can imagine that Elasticsearch sees the preceding query as follows:
curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{ "query" : { "terms" : { "title" : [ "crime", "front", "punishment" ] } }, "sort" : { "_score" : "desc" } }'
Look at the sort section at the end of the preceding query. This is the default sorting used by Elasticsearch. For better visibility, we can change the formatting slightly and show that fragment as follows:
"sort" : [ { "_score" : "desc" } ]
The preceding section defines how the documents should be sorted in the results list. In this case, Elasticsearch will show the documents with the highest score on top of the results list. The simplest modification is to reverse the ordering by changing the sort section to the following one:
"sort" : [ { "_score" : "asc" } ]
Default sorting is boring, isn't it? So, let's change it to sort on the basis of the values of the fields present in the documents. Let's choose the title field, which means that the sort section of our query will look as follows:
"sort" : [ { "title" : "asc" } ]
Unfortunately, this doesn't work as expected. Although Elasticsearch sorted the documents, the ordering is somewhat strange. Look closely at the response. With every document, Elasticsearch returns information about the sorting; for example, for the Crime and Punishment book, the returned document looks like the following code:
{ "_index" : "library", "_type" : "book", "_id" : "4", "_score" : null, "_source" : { "title" : "Crime and Punishment", "otitle" : "Преступлéние и наказáние", "author" : "Fyodor Dostoevsky", "year" : 1886, "characters" : [ "Raskolnikov", "Sofia Semyonovna Marmeladova" ], "tags" : [ ], "copies" : 0, "available" : true }, "sort" : [ "punishment" ] }
If you compare the title field and the returned sorting information, everything should be clear. During the analysis process, Elasticsearch splits the field into several tokens. Since sorting is done using a single token, Elasticsearch has to choose one of the produced tokens. It does the best that it can by sorting these tokens alphabetically and choosing the first one. This is the reason why, in the sorting value, we find only a single word instead of the whole content of the title field. If you would like to see how Elasticsearch behaves when using different fields for sorting, you can try fields such as copies:
curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{ "query" : { "terms" : { "title" : [ "crime", "front", "punishment" ] } }, "sort" : [ { "copies" : "asc" } ] }'
In general, it is a good idea to sort on a field that is not analyzed. We can use fields with multiple values for sorting, but in most cases, it doesn't make much sense and has limited usage.
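The difference between sorting on an analyzed and a not analyzed field can be modeled locally. The following is only a rough sketch: the analyze function is a simplified stand-in for a real analyzer (lowercase, split on non-word characters), the function names are our own, and "Winter of a Bear" is a made-up title added so that the two orderings differ.

```python
import re

def analyze(text):
    """Simplified stand-in for an analyzer: lowercase, split on non-word characters."""
    return [token for token in re.split(r"\W+", text.lower()) if token]

def analyzed_sort_key(text):
    # Ascending sort on an analyzed field uses the alphabetically first token.
    return min(analyze(text))

def not_analyzed_sort_key(text):
    # A not analyzed field is indexed as one token: the whole, unmodified string.
    return text

# The second title is invented for illustration only.
titles = ["All Quiet on the Western Front", "Winter of a Bear"]

by_analyzed = sorted(titles, key=analyzed_sort_key)
by_not_analyzed = sorted(titles, key=not_analyzed_sort_key)

print(by_analyzed)      # ordered by a single token of each title
print(by_not_analyzed)  # ordered by the full title
```

With the analyzed key, the made-up title sorts first because of its token "a", even though its full text starts with "W". This is the kind of surprise a not analyzed field avoids.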
As an example of using two different fields, one for sorting and another for searching, let's change our title field. The changed title field definition will look as follows:
"title" : { "type": "string", "fields": { "sort": { "type" : "string", "index": "not_analyzed" } } }
After changing the title field in the mappings (we've used the same mappings as in Chapter 3, Searching Your Data) and re-indexing the data, we can try sorting on the title.sort field and see whether it works. To do this, we will need to send the following query:
{ "query" : { "match_all" : { } }, "sort" : [ {"title.sort" : "asc" } ] }
Now, it works properly. As you can see, we used the new field, title.sort. We set it to be not analyzed, so there is a single token for that field in the Elasticsearch index.
In the response from Elasticsearch, every document contains information about the value used for sorting. For example, let's look at one of the documents returned by the query in which we used the title field for sorting:
{ "_index" : "library", "_type" : "book", "_id" : "1", "_score" : null, "_source" : { "title" : "All Quiet on the Western Front", "otitle" : "Im Westen nichts Neues", "author" : "Erich Maria Remarque", "year" : 1929, "characters" : [ "Paul Bäumer", "Albert Kropp", "Haie Westhus", "Fredrich Müller", "Stanislaus Katczinsky", "Tjaden" ], "tags" : [ "novel" ], "copies" : 1, "available" : true, "section" : 3 }, "sort" : [ "all" ] }
The sorting used in the query that returned the preceding document was as follows:
"sort" : [ { "title" : "asc" } ]
However, because we are sorting on an analyzed field, which contains more than a single value, the sorting definition is in fact equivalent to the longer form, which looks as follows:
"sort" : [ { "title" : { "order" : "asc", "mode" : "min" } } ]
The mode parameter defines which token should be used for comparison when sorting on a field which has more than one value. The available values we can choose from are:
min: Sorting will use the lowest value (or the first alphabetical value on the text based fields)
max: Sorting will use the highest value (or the last alphabetical value on the text based fields)
avg: Sorting will use the average value
median: Sorting will use the median value
sum: Sorting will use the sum of all the values in the field
Note that sort, in request and response, is given as an array. This suggests that we can use several different orderings. Elasticsearch will use the next element in the sorting definition list to determine ordering between the documents that have the same value of the previous sorting clause. So, if we have the same value in the title field, the documents will be sorted by the next field that we specify. For example, if we would like to get the documents that have the most copies and then sort by the title, we will run the following query:
curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{ "query" : { "terms" : { "title" : [ "crime", "front", "punishment" ] } }, "sort" : [ { "copies" : "desc" }, { "title" : "asc" } ] }'
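To make the two ideas above more concrete, here is a small local model of them. It is only a sketch: reduce_values is a hypothetical helper that mirrors the min, max, avg, median, and sum modes, the tie-breaking step compares whole titles for simplicity, and the "Example Book" document is invented so that two documents share the same number of copies.

```python
from statistics import median

def reduce_values(values, mode):
    """Collapse a multi-valued field to the single value used for comparison."""
    reducers = {
        "min": min,
        "max": max,
        "sum": sum,
        "avg": lambda v: sum(v) / len(v),
        "median": median,
    }
    return reducers[mode](values)

# Sample data; "Example Book" is made up to create a tie on copies.
books = [
    {"title": "Crime and Punishment", "copies": 0},
    {"title": "Example Book", "copies": 1},
    {"title": "All Quiet on the Western Front", "copies": 1},
]

# First clause: copies descending. Second clause: title ascending,
# consulted only when two documents have the same number of copies.
ordered = sorted(books, key=lambda b: (-b["copies"], b["title"]))

for book in ordered:
    print(book["copies"], book["title"])
```

The tuple key plays the role of the sort array: later elements matter only when all earlier ones are equal.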
What about when some of the documents that match the query don't have the field we want to sort on? By default, documents without the given field are returned first in the case of ascending order and last in the case of descending order. However, sometimes this is not exactly what we want to achieve.
When we use sorting on numeric fields, we can change the default Elasticsearch behavior for documents with missing fields. For example, let's take a look at the following query:
curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{ "query" : { "match_all" : { } }, "sort" : [ { "section" : { "order" : "asc", "missing" : "_last" } } ] }'
Note the extended form of the sort section of our query. We've added the missing parameter to it. By setting the missing parameter to _last, Elasticsearch will place the documents without the given field at the bottom of the results list. Setting the missing parameter to _first will result in Elasticsearch placing documents without the given field at the top of the results list. It is worth mentioning that besides the _last and _first values, Elasticsearch also allows us to use any number. In such a case, a document without a defined field will be treated as the document with this given value.
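The three kinds of missing values described above can be modeled as follows. This is a sketch under our own assumptions: sort_with_missing is a hypothetical helper, not an Elasticsearch API, and the sample documents (identifiers and section values) are invented.

```python
def sort_with_missing(docs, field, order="asc", missing="_last"):
    """Model of the missing parameter: _last, _first, or a substitute number."""
    reverse = order == "desc"
    if missing in ("_first", "_last"):
        present = [d for d in docs if field in d]
        absent = [d for d in docs if field not in d]
        ranked = sorted(present, key=lambda d: d[field], reverse=reverse)
        # Documents without the field go to the top or the bottom of the list.
        return absent + ranked if missing == "_first" else ranked + absent
    # Any number: documents without the field are treated as having that value.
    return sorted(docs, key=lambda d: d.get(field, missing), reverse=reverse)

# Invented sample documents; document 2 has no section field.
docs = [
    {"id": 1, "section": 3},
    {"id": 2},
    {"id": 3, "section": 1},
]

print([d["id"] for d in sort_with_missing(docs, "section", missing="_last")])
print([d["id"] for d in sort_with_missing(docs, "section", missing="_first")])
print([d["id"] for d in sort_with_missing(docs, "section", missing=2)])
```

In the last call, the document without a section is compared as if its section were 2, so it lands between the other two.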
As we mentioned in the previous section, Elasticsearch allows us to sort using fields that have multiple values. We can control how the comparison is made using scripts. We do that by showing Elasticsearch how to calculate the value that should be used for sorting. Let's assume that we want to sort by the first value indexed in the tags field. Let's take a look at the following example query (note that running the following query requires the script.inline property set to on in the elasticsearch.yml file):
curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{ "query" : { "match_all" : { } }, "sort" : { "_script" : { "script" : "doc[\"tags\"].values.size() > 0 ? doc[\"tags\"].values[0] : \"\u19999\"", "type" : "string", "order" : "asc" } } }'
In the preceding example, we replaced every nonexistent value with a Unicode character that should sort low enough in the list. The main idea of this code is to check if our array contains at least a single element. If it does, then the first value from the array is returned. If the array is empty, we return the Unicode character that should be placed at the bottom of the results list. Besides the script parameter, this option of sorting requires us to specify the order (ascending, in our case) and type parameters that will be used for the comparison (we return string from our script).
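The logic of the sort script can be mirrored in a few lines of Python. This is only an illustration of the idea, not how Elasticsearch evaluates scripts: script_sort_value and SENTINEL are names we made up, and the sentinel stands for a high Unicode character used in place of a missing value.

```python
# A high code point followed by "9": any ordinary lowercase tag sorts before it.
SENTINEL = "\u1999" + "9"

def script_sort_value(doc):
    """Return the first tag, or the sentinel when the document has no tags."""
    tags = doc.get("tags", [])
    return tags[0] if len(tags) > 0 else SENTINEL

books = [
    {"title": "Crime and Punishment", "tags": []},
    {"title": "All Quiet on the Western Front", "tags": ["novel"]},
]

# Ascending order: documents without tags end up at the bottom.
ordered = sorted(books, key=script_sort_value)
print([b["title"] for b in ordered])
```

Because the sentinel compares greater than typical tag values, documents with an empty tags array are pushed to the end in ascending order.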
By default, Elasticsearch assumes that when you use sorting, the score is completely unimportant. Usually it is a good assumption; why do additional computations when the importance of the documents is given by the sorting formula? Sometimes, however, you want to know how good the document is in relation to the current query, even if the documents are presented in a different order. This is when the track_scores parameter should be used and set to true. An example query using it looks as follows:
curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{ "query" : { "match_all" : { } }, "track_scores" : true, "sort" : [ { "title" : { "order" : "asc" }} ] }'
The preceding query calculates the score for every document. In fact, in our example, the score is boring and is always equal to 1.0 because of the match_all query which treats all the documents as equal.
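If you build such requests from code rather than by hand, a small helper can keep the sort and track_scores parts together. The following is a minimal sketch: build_search_body is a hypothetical function of our own, and it only produces the JSON body; sending it to Elasticsearch is left out.

```python
import json

def build_search_body(query, sort, track_scores=False):
    """Assemble a search request body with a sort section."""
    body = {"query": query, "sort": sort}
    if track_scores:
        # Ask for scores to be computed even though results are sorted by a field.
        body["track_scores"] = True
    return body

body = build_search_body(
    {"match_all": {}},
    [{"title": {"order": "asc"}}],
    track_scores=True,
)
print(json.dumps(body, indent=2))
```

The printed JSON can be passed to curl with the -d option, in the same way as the queries shown in this section.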