By default, Elasticsearch sorts search results by a relevance score, which is computed using Lucene's TF/IDF-based scoring formula. This relevance score is a floating-point value that is returned with each search result in the _score field. By default, results are sorted in descending order of this score.
See the following query for an example:
{ "query": { "match": { "text": "data analytics" } } }
This query searches for tweets that contain the term data or analytics in their text field. In some cases, however, we do not want the results to be sorted by _score. Elasticsearch provides several other ways to sort documents. Let's explore how this can be done.
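As a rough illustration of the default ordering, the following Python sketch ranks a few hits by _score in descending order. The hits and their scores are invented for illustration and are not real Elasticsearch output.

```python
# Hypothetical hits, shaped like the entries of hits.hits in a search response.
# The _score values are invented for illustration only.
hits = [
    {"_id": "1", "_score": 0.42, "_source": {"text": "big data"}},
    {"_id": "2", "_score": 1.37, "_source": {"text": "data analytics"}},
    {"_id": "3", "_score": 0.85, "_source": {"text": "analytics tools"}},
]

# Default behaviour: the hit with the highest relevance score comes first.
by_relevance = sorted(hits, key=lambda hit: hit["_score"], reverse=True)
ranked_ids = [hit["_id"] for hit in by_relevance]
print(ranked_ids)
```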
This section covers sorting documents on fields that contain a single value, such as created_at or followers_count. Please note that we are not talking about sorting string-based fields here.
Suppose we want to sort tweets that contain data or analytics in their text field by creation time, in ascending order:
{ "query":{ "match":{"text":"data analytics"} }, "sort":[ {"created_at":{"order":"asc"}} ] }
In the response to the preceding query, max_score and _score have null as their values. They are not calculated because _score is not used for sorting. You will also see an additional field, sort, in each hit. This field contains the value that was used for sorting; for a date field, this is the date in long format (milliseconds since the epoch).
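To make the long format concrete, here is a minimal Python sketch that converts a date into milliseconds since the epoch and places it in a hit shaped like the response described above. The date, the document ID, and the helper name to_epoch_millis are assumptions made for this example.

```python
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def to_epoch_millis(moment):
    """Return whole milliseconds between the Unix epoch and a timezone-aware datetime."""
    return int((moment - EPOCH).total_seconds() * 1000)

# Hypothetical creation time of a tweet.
created_at = datetime(2015, 9, 1, 12, 30, 0, tzinfo=timezone.utc)
sort_value = to_epoch_millis(created_at)

# Rough shape of one hit when sorting on created_at:
# _score is null (None) and the sort array carries the long value.
hit = {"_id": "1", "_score": None, "sort": [sort_value]}
print(hit)
```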
When documents need to be sorted on more than one field, list each field as a separate object in the sort array, in order of priority:
"sort": [ {"created_at":{"order":"asc"}}, {"followers_count":{"order":"asc"}} ]
With this sort clause, the results are sorted first by tweet creation time; if two tweets have the same creation time, they are then sorted by followers count.
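The tie-breaking behaviour can be sketched in plain Python with a tuple sort key. The sample tweets are invented, and the sorting here is done by Python rather than Elasticsearch; it only mirrors the ordering rule described above.

```python
# Hypothetical tweets; created_at is given in milliseconds since the epoch.
tweets = [
    {"id": "a", "created_at": 1441110600000, "followers_count": 250},
    {"id": "b", "created_at": 1441110600000, "followers_count": 40},
    {"id": "c", "created_at": 1441024200000, "followers_count": 900},
]

# Primary key: creation time. Secondary key: followers count,
# which only matters when two creation times are equal.
ordered = sorted(tweets, key=lambda t: (t["created_at"], t["followers_count"]))
ordered_ids = [t["id"] for t in ordered]
print(ordered_ids)
```

Tweet c comes first because it is the oldest; a and b share a creation time, so the lower followers count puts b ahead of a.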
Multivalued fields, such as arrays of dates, contain more than one value, and you cannot specify which of those values to sort on. In this case, a single value must first be calculated using the mode parameter, which takes min, max, avg, median, or sum as its value. For example, in the following query, sorting is done on the maximum value inside the price field of each document:
"sort" : [ {"price" : {"order" : "asc", "mode" : "max"}} ]
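The effect of mode can be approximated in Python by reducing each document's values to one number before sorting. This is a simplified sketch with invented documents; the reduce_values helper is hypothetical, and Elasticsearch's own handling of details such as rounding may differ.

```python
from statistics import median

# One reducer per mode value named in the text.
REDUCERS = {
    "min": min,
    "max": max,
    "sum": sum,
    "avg": lambda values: sum(values) / len(values),
    "median": median,
}

def reduce_values(values, mode):
    """Collapse a multivalued field into the single value used for sorting."""
    return REDUCERS[mode](values)

# Hypothetical documents with a multivalued price field.
docs = [
    {"id": "x", "price": [10, 50]},
    {"id": "y", "price": [30, 35]},
    {"id": "z", "price": [5, 20]},
]

by_max = [d["id"] for d in sorted(docs, key=lambda d: reduce_values(d["price"], "max"))]
by_min = [d["id"] for d in sorted(docs, key=lambda d: reduce_values(d["price"], "min"))]
print(by_max, by_min)
```

The same documents come out in a different order depending on the mode, which is why the choice has to be made explicitly for multivalued fields.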
Analyzed string fields are also multivalued, since they contain multiple tokens. For this reason, and because of performance considerations, do not sort on analyzed fields.
A string field that is used for sorting must be not_analyzed, or tokenized with the keyword tokenizer, so that it contains only a single token.
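A minimal sketch of such a setup, written as Python dictionaries that could be serialized to JSON, might look as follows. The type name tweet and the field name screen_name are hypothetical, and the mapping assumes an Elasticsearch version that uses the string type with the not_analyzed index setting, as this text does.

```python
import json

# Hypothetical mapping fragment: screen_name is indexed as a single,
# unmodified token, so it can be used for sorting.
mapping = {
    "tweet": {
        "properties": {
            "screen_name": {"type": "string", "index": "not_analyzed"},
            "text": {"type": "string"},  # analyzed by default: search on it, do not sort on it
        }
    }
}

# Search on the analyzed field, sort on the not_analyzed one.
search_body = {
    "query": {"match": {"text": "data analytics"}},
    "sort": [{"screen_name": {"order": "asc"}}],
}

print(json.dumps(mapping, indent=2))
print(json.dumps(search_body, indent=2))
```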
Sorting is an expensive process. All the values of the field on which sorting is performed are loaded into memory, so the node should have ample memory available for sorting. The data type of the field should also be chosen carefully when creating the mapping. For example, short can be used in place of integer or long if the values will always fit within its range.
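The difference between the numeric types can be estimated with a back-of-the-envelope calculation. The sketch below only multiplies the raw width of each type by a document count; it deliberately ignores any overhead or compression inside Elasticsearch and Lucene, so the figures are a rough lower bound rather than a measurement. The document count is an assumption.

```python
import struct

# Raw width in bytes of each numeric type (standard sizes: 2, 4, and 8).
BYTES_PER_VALUE = {
    "short": struct.calcsize("<h"),
    "integer": struct.calcsize("<i"),
    "long": struct.calcsize("<q"),
}

def raw_value_bytes(doc_count, field_type):
    """Bytes needed to hold one value of field_type for every document."""
    return doc_count * BYTES_PER_VALUE[field_type]

doc_count = 10_000_000  # hypothetical index size
for field_type in ("short", "integer", "long"):
    megabytes = raw_value_bytes(doc_count, field_type) / 1_000_000
    print(f"{field_type}: {megabytes:.0f} MB")
```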