Sorting data

Now we can build quite complex queries against the whole index, or against part of it by using filters. We can send these queries to ElasticSearch and analyze the returned data. Until now, this data was ordered by score, which is exactly what we want in most cases: search should give us the most relevant documents first. But what can we do if we want to use our search more like a database, or apply a more sophisticated algorithm for ordering the data? Let's check what ElasticSearch can do with sorting.

Default sorting

Let's look at the following query, which returns all the books with at least one of the specified words:

{
  "query" : {
    "terms" : {
      "title" : [ "crime", "front", "punishment" ],
      "minimum_match" : 1
    }
  }
}

Under the hood, ElasticSearch sees this as follows:

{
  "query" : {
    "terms" : {
      "title" : [ "crime", "front", "punishment" ],
      "minimum_match" : 1
    }
  },
  "sort" : [
    { "_score" : "desc" }
  ]
}

Note the sort section. This is the default sorting used by ElasticSearch: the matched documents are returned with the highest-scoring ones first. The simplest modification is to reverse the ordering, like this:

  "sort" : [
    { "_score" : "asc" }
  ]

Selecting fields used for sorting

Default sorting is boring, isn't it? Let's change this into something a bit more engaging:

  "sort" : [
    { "title" : "asc" }
  ]

Unfortunately, this doesn't work. In the server response, you can find JSON with the reason key, where ElasticSearch says:

[Can't sort on string types with more than one value per doc, or more than one token per field]

Of course, ElasticSearch allows adding documents with multiple values in one field, but such fields cannot be used for sorting because ElasticSearch doesn't know which of the values should determine the order. Another reason may be that the field is analyzed and divided into multiple tokens. This is what happened in the preceding case. To avoid this, we can add an additional, non-analyzed version of the title field. To do that, let's change our title field to multi_field, which we have already discussed. For example, the title field definition could look like this:

"title" : {
  "type": "multi_field",
  "fields": {
    "title": { "type" : "string" },
    "sort": { "type" : "string", "index": "not_analyzed" }
  }
}
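
To see why the analyzed title cannot serve as a sort key while the non-analyzed copy can, here is a rough Python sketch. The analyze function is only a simplified stand-in (lowercase, then split on whitespace) and is an assumption of this example; it is not how ElasticSearch's real analyzers are implemented.

```python
def analyze(text):
    # Simplified stand-in for an analyzer: lowercase, then split on whitespace.
    # Real analyzers do more (punctuation handling, stop words, and so on).
    return text.lower().split()


def not_analyzed(text):
    # A not_analyzed field is indexed as a single, unchanged term.
    return [text]


title = "All Quiet on the Western Front"
analyzed_terms = analyze(title)
sort_terms = not_analyzed(title)

print(analyzed_terms)  # several terms, so there is no single value to order by
print(sort_terms)      # exactly one term, usable as a sort key
```

With several terms per document there is no obvious single value to compare, which matches the error message shown earlier; the title.sort subfield always yields exactly one term.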

After changing the title field in the mappings that we showed at the beginning of the chapter, we can try sorting on the title.sort field and see whether it works. To do that, we need to send the following query:

{
  "query" : {
    "match_all" : { }
  },
  "sort" : [
    { "title.sort" : "asc" }
  ]
}

Now, it works properly. In the response from ElasticSearch, every document contains information about the value used for sorting, for example:

      "_index" : "library",
      "_type" : "book",
      "_id" : "1",
      "_score" : null, "_source" : { "title": "All Quiet on the Western Front","otitle": "Im Westen nichts Neues","author": "Erich Maria Remarque","year": 1929,"characters": ["Paul Bäumer", "Albert Kropp", "Haie Westhus", "Fredrich Müller", "Stanislaus Katczinsky", "Tjaden"],"tags": ["novel"],"copies": 1, "available": true, "section" : 3},
      "sort" : [ "All Quiet on the Western Front" ]

Note that sort, in both the request and the response, is given as an array. This suggests that we can use several different orderings, and that is indeed the case: when two documents have the same value for one sort element, ElasticSearch uses the next element in the list to decide their order.
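
The tie-breaking rule can be sketched locally in Python. The request body below combines two sort elements, and the three documents and their section values are made up for illustration. The ordering is simulated with a tuple key rather than by calling ElasticSearch, and Python's string comparison is only an approximation of how a not_analyzed field is ordered.

```python
import json

# Hypothetical request body: order by section first, then by title.sort.
request = {
    "query": {"match_all": {}},
    "sort": [
        {"section": "asc"},
        {"title.sort": "asc"},
    ],
}

# Made-up documents, used only to illustrate the tie-breaking rule.
books = [
    {"title": "The Complete Sherlock Holmes", "section": 12},
    {"title": "Catch-22", "section": 3},
    {"title": "All Quiet on the Western Front", "section": 3},
]

# Local simulation: the tuple compares section first and falls back
# to the title only when two sections are equal.
ordered = sorted(books, key=lambda b: (b["section"], b["title"]))
ordered_titles = [b["title"] for b in ordered]

print(json.dumps(request, indent=2))
print(ordered_titles)
```

The two books in section 3 are ordered by title, and the book in section 12 comes last regardless of its title.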

Specifying behavior for missing fields

What happens to the ordering when some of the documents that match the query don't have the field we want to sort on? By default, documents without the given field are returned first in the case of ascending order and last in the case of descending order. But sometimes this is not exactly what we want to achieve. When sorting on a numeric field, this can be changed easily. For example:

{
  "query" : {
    "match_all" : { }
  },
  "sort" : [
    { "section" : { "order" : "asc", "missing" : "_last" } }
  ]
}

Note the extended form of defining the field for sorting; it allows adding other parameters, such as missing. It is worth mentioning that, besides the _last and _first values, ElasticSearch allows us to use any number. In such a case, documents without a defined field will be treated as documents with this given value.
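
The three kinds of missing value described above can be mimicked with a small Python sketch. It is a local simulation for ascending order only; the function name sort_with_missing and the sample documents are assumptions made for this example and are not part of ElasticSearch.

```python
def sort_with_missing(docs, field, missing="_last"):
    """Ascending sort that mimics the 'missing' options locally."""
    def key(doc):
        value = doc.get(field)
        if value is not None:
            return (1, value)      # documents that have the field
        if missing == "_first":
            return (0, 0)          # placed before every real value
        if missing == "_last":
            return (2, 0)          # placed after every real value
        return (1, missing)        # a number: treated as the field's value
    return sorted(docs, key=key)


# Made-up documents; "b" has no section field.
docs = [
    {"id": "a", "section": 3},
    {"id": "b"},
    {"id": "c", "section": 1},
]

last_ids = [d["id"] for d in sort_with_missing(docs, "section", "_last")]
first_ids = [d["id"] for d in sort_with_missing(docs, "section", "_first")]
as_two_ids = [d["id"] for d in sort_with_missing(docs, "section", 2)]

print(last_ids, first_ids, as_two_ids)
```

With a numeric missing value of 2, the document without a section is slotted between the documents with sections 1 and 3, just as if it had been indexed with that value.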

You are probably wondering what we can do in the case of fields that aren't numbers. Don't worry, we will work around this problem, although in a less elegant way.

Dynamic criteria

We promised an example of how to force ElasticSearch to put documents without the defined field at the bottom of the result list. To achieve that, we will show how ElasticSearch lets us calculate the value that should be used for sorting. In our example, we are sorting on a field that is an array (as we mentioned before, we can't sort directly on multiple values), and we assume that we want to sort by comparing the first element of that array. So let's look at the request:

{
  "query" : {
    "match_all" : { }
  },
  "sort" : {
    "_script" : {
      "script" : "doc['tags'].values.length > 0 ? doc['tags'].values[0] : '\u19999'",
      "type" : "string",
      "order" : "asc"
    }
  }
}

In the preceding example, we've replaced every nonexistent value with a Unicode character that should sort low enough in the list. The script checks whether the array contains at least one element. If it does, the first value from the array is returned. If the array is empty, we return the Unicode character that should be placed at the bottom of the result list. Besides the script, this form of sorting requires us to specify the ordering (ascending in our case) and the type that will be used for comparison (our script returns a string).
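
The logic of the script can be mirrored in a few lines of Python. This is only a local sketch of the ternary expression, not the way ElasticSearch evaluates scripts; the placeholder constant and the sample documents are assumptions made for the example.

```python
# Assumed placeholder: Python reads "\u19999" as the character U+1999
# followed by the digit "9", so it compares after ordinary Latin text.
PLACEHOLDER = "\u19999"


def script_sort_value(tags):
    """Mirror of the ternary in the script: first value, or the placeholder."""
    return tags[0] if len(tags) > 0 else PLACEHOLDER


# Made-up documents; the tags lists are kept in the order written here.
docs = [
    {"id": "1", "tags": ["novel"]},
    {"id": "2", "tags": []},
    {"id": "3", "tags": ["classics", "crime"]},
]

ordered = sorted(docs, key=lambda d: script_sort_value(d["tags"]))
ordered_ids = [d["id"] for d in ordered]

print(ordered_ids)
```

The untagged document ends up last because its placeholder compares higher than the Latin-letter tags. A tag that starts with a character beyond U+1999 would still sort after the placeholder, so this trick only works when the real values stay below it.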

Collation and national characters

If you want to use languages other than English, you may run into the problem of characters being ordered incorrectly. This happens because many languages define their own alphabetical order. ElasticSearch supports many languages, but proper collation requires an additional plugin. It's easy to install and configure, but we will discuss it in the ElasticSearch plugins section in Chapter 7, Administrating Your Cluster.
