Choosing the right query for the job

In Elasticsearch Server Second Edition, we described the full query language, the so-called Query DSL provided by Elasticsearch: a JSON-structured query language that allows us to build virtually as complex a query as we can imagine. What we didn't talk about is when the queries can be used and when they should be used. For a person without much prior experience with a full text search engine, the number of queries exposed by Elasticsearch can be overwhelming and confusing. Because of that, we decided to extend what we wrote in the second edition of our first Elasticsearch book and show you, the reader, what you can do with Elasticsearch.

We decided to divide the following section into two distinct parts. The first part will try to categorize the queries and tell you what to expect from a query in each category. The second part will show you example usage of queries from each group and will discuss the differences. Please take into consideration that the following section is not a full reference for the Elasticsearch Query DSL; for such a reference, please see Elasticsearch Server Second Edition from Packt Publishing or the official Elasticsearch documentation available at http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl.html.

Query categorization

Of course, categorizing queries is a hard task and we don't claim that the following list of categories is the only correct one. If you asked other Elasticsearch users, they would probably provide their own categories or say that each query can be assigned to more than a single category. Funnily enough, they would be right. We also think that there is no single way of categorizing the queries; however, in our opinion, each Elasticsearch query can be assigned to one (or more) of the following categories:

Basic queries: A category that groups queries allowing us to search a part of the index, in either an analyzed or a non-analyzed manner. The key point in this category is that you can't nest other queries inside a basic query. An example of a basic query is the term query.
Compound queries: A category grouping queries that allow us to combine multiple queries or filters inside them, for example the bool or dis_max queries.
Not analyzed queries: A category for queries that don't analyze the input and send it as is to the Lucene index. An example of such a query is the term query.
Full text search queries: Quite a large group of queries supporting full text searching, analyzing their content, and possibly providing Lucene query syntax. An example of such a query is the match query.
Pattern queries: A group of queries providing support for various wildcards in queries. For example, a prefix query can be assigned to this particular group.
Similarity supporting queries: A group of queries sharing a common feature: support for matching similar words or documents. An example of such a query is the fuzzy_like_this or the more_like_this query.
Score altering queries: A very important group of queries, especially when combined with full text searching. This group includes queries that allow us to modify the score calculation during query execution. An example query that we can assign to this group is the function_score query, which we will talk about in detail in Chapter 3, Not Only Full Text Search.
Position aware queries: Queries that allow us to use term position information stored in the index. A very good example of such queries is the span_term query.
Structure aware queries: A group of queries that can work on structured data such as parent-child documents. An example query from this group is the nested one.

Of course, we didn't talk about the filters at all, but you can use the same logic as for queries, so let's put the filters aside for now. Before going into examples for each type of query, let's briefly describe the purpose of each query category.

Basic queries

These are queries that are not able to group any other queries; instead, they are used only for searching the index. Queries in this group are usually used as parts of more complex queries or as single queries sent to Elasticsearch. You can think of them as bricks for building structures, that is, more complex queries. For example, when you need to match a certain phrase in a document without any additional requirements, you should look at the basic queries. In such a case, the match query is a good fit for this requirement and it doesn't need to be accompanied by any other query.

Some examples of the queries from the basic category are as follows:

match: A query (actually multiple types of queries) used when you need a full text search query that will analyze the provided input. Usually, it is used when you need analysis of the provided text, but you don't need full Lucene syntax support. Because this query doesn't go through the query parsing process, it has a low chance of resulting in a parsing error, which makes it a good candidate for handling text entered by the user.
match_all: A simple query matching all documents, useful when we need the whole index contents returned, for example for aggregations.
term: A simple, not analyzed query that allows us to search for an exact word. An example use case for the term query is searching against non-analyzed fields, like the ones storing tags in our example data. The term query is also commonly combined with filtering, for example filtering on the category field from our example data.

The queries from the basic category are: match, multi_match, common, fuzzy_like_this, fuzzy_like_this_field, geoshape, ids, match_all, query_string, simple_query_string, range, prefix, regexp, span_term, term, terms, wildcard.

Compound queries

Compound queries are the ones that we can use for grouping other queries together, and this is their only purpose. If the basic queries are bricks for building houses, the compound queries are the joints for those bricks. Because we can nest compound queries to a virtually unlimited depth, we are able to produce very complex queries, and the only thing that limits us is performance.

Some examples of the compound queries and their usage are as follows:

bool: One of the most common compound queries. It is able to group multiple queries with Boolean logical operators, which allows us to control which parts of the query must match, which should match, and which must not match. For example, if we would like to find and group together queries matching different criteria, then the bool query is a good candidate. The bool query should also be used when we want the score of the documents to be a sum of all the scores calculated by the partial queries.
dis_max: A very useful query when we want the score of the document to be mostly associated with the highest scoring partial query, not the sum of all the partial queries (like in the bool query). The dis_max query generates the union of the documents returned by all the subqueries and scores the documents by the simple equation max(score of the matching clauses) + tie_breaker * (sum of scores of all the other clauses that are not max scoring ones). If you want the max scoring subquery to dominate the score of your documents, then the dis_max query is the way to go.

The queries from that category are: bool, boosting, constant_score, dis_max, filtered, function_score, has_child, has_parent, indices, nested, span_first, span_multi, span_near, span_not, span_or, span_term, top_children.
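The "bricks and joints" idea can be illustrated with a small Python sketch that assembles a request body from basic queries and nests one bool query inside another. The helper names (term, range_query, bool_query) are hypothetical and are not part of any Elasticsearch client; they only build plain dictionaries that mirror the JSON structure of the Query DSL.

```python
import json

def term(field, value):
    """Basic query: exact, not analyzed match on a single field."""
    return {"term": {field: value}}

def range_query(field, **bounds):
    """Basic query: value range, for example gte=1, lte=3."""
    return {"range": {field: bounds}}

def bool_query(must=(), should=(), must_not=()):
    """Compound query: groups other queries, including other bool queries."""
    body = {}
    if must:
        body["must"] = list(must)
    if should:
        body["should"] = list(should)
    if must_not:
        body["must_not"] = list(must_not)
    return {"bool": body}

# A bool query (joint) wrapping basic queries (bricks) and another bool query.
query = bool_query(
    must=[range_query("copies", gte=1)],
    should=[bool_query(must=[term("tags", "novel"), term("tags", "classics")])],
)

# The JSON text that could be sent as the body of a _search request.
request_body = json.dumps({"query": query}, indent=1)
print(request_body)
```

Because every helper returns an ordinary dictionary, compound queries can be nested to any depth simply by passing the result of one helper into another.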

Not analyzed queries

These are queries that are not analyzed; instead, the text we provide to them is sent directly to the Lucene index. This means that we either need to know exactly how the analysis process is done and provide a proper term, or we need to run the searches against non-analyzed fields. If you plan to use Elasticsearch as a NoSQL store, this is probably the group of queries you'll be using, because they search for the exact terms without analyzing them, for example with language analyzers.

The following examples should help you understand the purpose of not analyzed queries:

term: When talking about the not analyzed queries, the term query will be the one most commonly used. It provides us with the ability to match documents having a certain value in a field. For example, if we would like to match documents with a certain tag (tags field in our example data), we would use the term query.
prefix: Another type of query that is not analyzed. The prefix query is commonly used for autocomplete functionality, where the user provides a text and we need to find all the documents having terms that start with the given text. It is good to remember that even though the prefix query is not analyzed, it is rewritten by Elasticsearch so that its execution is fast.

The queries from that category are: common, ids, prefix, span_term, term, terms, wildcard.
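Why a not analyzed query requires us to know how the data was analyzed can be shown with a toy inverted index in Python. This is only a sketch under stated assumptions: the analyze function (lowercasing and splitting on whitespace) is a crude stand-in for a real analyzer, and the index is a plain dictionary, not how Lucene stores terms.

```python
def analyze(text):
    # Assumption: lowercasing plus whitespace splitting stands in for real analysis.
    return text.lower().split()

# Toy inverted index for an analyzed title field: term -> set of document ids.
titles = {"5": "The Sorrows of Young Werther", "6": "The Peasants"}
index = {}
for doc_id, title in titles.items():
    for token in analyze(title):
        index.setdefault(token, set()).add(doc_id)

def term_lookup(value):
    """Not analyzed: the value is looked up in the index exactly as given."""
    return index.get(value, set())

def analyzed_lookup(text):
    """Analyzed: the input goes through the same analysis as the indexed text."""
    matches = set()
    for token in analyze(text):
        matches |= index.get(token, set())
    return matches

print(term_lookup("Sorrows"))      # capitalized input is not an indexed term
print(term_lookup("sorrows"))      # the exact indexed term is found
print(analyzed_lookup("Sorrows"))  # analysis lowercases the input first
```

The first lookup finds nothing because the index only contains the lowercased term, which is exactly the situation to watch for when running a term query against an analyzed field.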

Full text search queries

This is a group of queries that can be used when you are building a Google-like search interface. These queries analyze the provided input using the information from the mappings, in some cases support Lucene query syntax, support scoring capabilities, and so on. In general, if some part of the query you are sending comes from a user entering some text, you'll want to use one of the full text search queries such as the query_string, match, or simple_query_string queries.

A simple example of a full text search query use case is as follows:

simple_query_string: A query built on top of Lucene SimpleQueryParser (http://lucene.apache.org/core/4_9_0/queryparser/org/apache/lucene/queryparser/simple/SimpleQueryParser.html) that was designed to parse human readable queries. In general, if you want your queries not to fail when a query parsing error occurs and instead figure out what the user wanted to achieve, this is a good query to consider.

The queries from that category are: match, multi_match, query_string, simple_query_string.

Pattern queries

Elasticsearch provides us with a few queries that can handle wildcards directly or indirectly, for example the wildcard query and the prefix query. In addition to that, we are allowed to use the regexp query that can find documents that have terms matching given patterns.

We've already discussed an example using the prefix query, so let's focus a bit on the regexp query. If you want a query that will find documents having terms matching a certain pattern, then the regexp query is probably the only solution for you. For example, if you store logs in your Elasticsearch indices and you would like to find all the logs that have terms starting with the err prefix, followed by any number of characters, and ending with memory, the regexp query will be the one to look for. However, remember that all the pattern queries whose expressions match a large number of terms will be expensive when it comes to performance.

The queries from that category are: prefix, regexp, wildcard.

Similarity supporting queries

We like to think of the similarity supporting queries as a family of queries that allow us to search for terms or documents similar to the ones we passed to the query. For example, if we would like to find documents that have terms similar to the crimea term, we could run a fuzzy query. Another use case for this group of queries is providing "did you mean"-like functionality. If we would like to find documents that have titles similar to the input we've provided, we would use the more_like_this query. In general, you would use a query from this group whenever you need to find documents having terms or fields similar to the provided input.

The queries from that category are: fuzzy_like_this, fuzzy_like_this_field, fuzzy, more_like_this, more_like_this_field.

Score altering queries

This is a group of queries used for improving search precision and relevance. They allow us to modify the score of the returned documents by providing not only a custom boost factor, but also some additional logic. A very good example of a query from this group is the function_score query, which lets us use functions that modify the document score based on mathematical equations. For example, if you would like the documents that are closer to a given geographical point to be scored higher, then the function_score query provides you with such a possibility.

The queries from that category are: boosting, constant_score, function_score, indices.

Position aware queries

This is a family of queries that allow us to match not only on certain terms but also on the terms' positions. The most significant queries from this group are all the span queries in Elasticsearch. We can also say that the match_phrase query can be assigned to this group as it also looks at the position of the indexed terms, at least to some extent. If you want to find groups of words that are a certain distance in the index from other words, like "find me the documents that have mastering and Elasticsearch terms near each other and are followed by second and edition terms no further than three positions away," then span queries are the way to go. However, you should remember that span queries will be removed in future versions of the Lucene library and thus from Elasticsearch as well. This is because those queries are resource-intensive and require a vast amount of CPU to be properly handled.

The queries from that category are: match_phrase, span_first, span_multi, span_near, span_not, span_or, span_term.

Structure aware queries

The last group of queries is the structure aware queries. The queries that can be assigned to this group are as follows:

nested
has_child
has_parent
top_children

Basically, all the queries that allow us to search inside structured documents and don't require us to flatten the data can be classified as structure aware queries. If you are looking for a query that will allow you to search inside child documents, nested documents, or for children having certain parents, then you need to use one of the queries mentioned in the preceding list. If you want to handle relationships in the data, this is the group of queries you should look for; however, remember that although Elasticsearch can handle relations, it is still not a relational database.

The use cases

As we already know which groups of queries can be responsible for which tasks and what we can achieve using queries from each group, let's have a look at example use cases for each of the groups so that we can have a better view of what the queries are useful for. Please note that this is not a full and comprehensive guide to all the queries available in Elasticsearch, but instead a simple example of what can be achieved.

Example data

For the purpose of the examples in this section, we've indexed two additional documents to our library index.

First, we need to alter the index structure a bit so that it contains nested documents (we will need them for some queries). To do that, we will run the following command:

curl -XPUT 'http://localhost:9200/library/_mapping/book' -d '{
 "book" : {
  "properties" : {
   "review" : {
    "type" : "nested",
    "properties": {
     "nickname" : { "type" : "string" },
     "text" : { "type" : "string" },
     "stars" : { "type" : "integer" }
    }
   }
  }
 }
}'

The commands used for indexing two additional documents are as follows:

curl -XPOST 'localhost:9200/library/book/5' -d '{
 "title" : "The Sorrows of Young Werther",
 "author" : "Johann Wolfgang von Goethe",
 "available" : true,
 "characters" : ["Werther", "Lotte", "Albert", "Fräulein von B"],
 "copies" : 1,
 "otitle" : "Die Leiden des jungen Werthers",
 "section" : 4,
 "tags" : ["novel", "classics"],
 "year" : 1774,
 "review" : [{"nickname" : "Anna","text" : "Could be good, but not my style","stars" : 3}]
}'

curl -XPOST 'localhost:9200/library/book/6' -d '{
 "title" : "The Peasants",
 "author" : "Władysław Reymont",
 "available" : true,
 "characters" : ["Maciej Boryna", "Jankiel", "Jagna Paczesiówna", "Antek Boryna"],
 "copies" : 4,
 "otitle" : "Chłopi",
 "section" : 4,
 "tags" : ["novel", "polish", "classics"],
 "year" : 1904,
 "review" : [{"nickname" : "anonymous","text" : "awsome book","stars" : 5},{"nickname" : "Jane","text" : "Great book, but too long","stars" : 4},{"nickname" : "Rick","text" : "Why bother, when you can find it on the internet","stars" : 3}]
}'

Basic queries use cases

Let's look at simple use cases for the basic queries group.

Searching for values in range

One of the simplest queries that can be run is a query matching documents in a given range of values. Usually, such queries are a part of a larger query or a filter. For example, a query that would return books with the number of copies from 1 to 3 inclusive would look as follows:

curl -XGET 'localhost:9200/library/_search?pretty' -d '{
 "query" : {
  "range" : {
   "copies" : {
    "gte" : 1,
    "lte" : 3
   }
  }
 }
}'

Simplified query for multiple terms

Imagine a situation where your users can provide a number of tags that the books returned by the query should contain. We require only 75 percent of the provided tags to be matched if the number of tags provided by the user is higher than three, and all the provided tags to be matched if the number of tags is three or less. We could run a bool query to allow that, but Elasticsearch provides us with the terms query that we can use to achieve the same requirement. The command that sends such a query looks as follows:

curl -XGET 'localhost:9200/library/_search?pretty' -d '{
 "query" : {
  "terms" : {
   "tags" : [ "novel", "polish", "classics", "criminal", "new" ],
   "minimum_should_match" : "3<75%"
  }
 }
}'
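To make the "3<75%" rule more tangible, here is a minimal Python sketch of how such a specification could be turned into a required number of matching tags. It is an illustration only, not Elasticsearch code: the required_matches function is hypothetical, it handles just the positive "N<P%" form used above, and it assumes that the percentage is rounded down.

```python
def required_matches(spec, clause_count):
    """Return how many of clause_count optional clauses must match.

    Handles only specifications of the form 'N<P%' (for example '3<75%').
    Assumption: the percentage result is rounded down.
    """
    threshold_text, percent_text = spec.split("<")
    threshold = int(threshold_text)
    percent = int(percent_text.rstrip("%"))
    if clause_count <= threshold:
        # At or below the threshold, every clause is required.
        return clause_count
    # Above the threshold, only the given percentage is required.
    return clause_count * percent // 100

tags = ["novel", "polish", "classics", "criminal", "new"]
print(required_matches("3<75%", len(tags)))  # five tags, 75 percent rounded down
print(required_matches("3<75%", 3))          # three tags, all of them required
```

With the five tags from the preceding command, 75 percent of five is 3.75, which under the rounding assumption means that three tags have to match.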

Compound queries use cases

Let's now see how we can use compound queries to group other queries together.

Boosting some of the matched documents

One of the simplest examples is using the bool query to boost some documents by including a non-mandatory query part that is used only for boosting. For example, if we would like to find all the books that have at least a single copy and boost the ones that were published after 1950, we could use the following query:

curl -XGET 'localhost:9200/library/_search?pretty' -d '{
 "query" : {
  "bool" : {
   "must" : [
    { 
     "range" : {
      "copies" : {
       "gte" : 1
      } 
     }
    }
   ],
   "should" : [
    {
     "range" : {
      "year" : {
       "gt" : 1950
      }
     }
    }
   ]
  }
 }
}'

Ignoring lower scoring partial queries

The dis_max query, as we have already covered, allows us to control how influential the lower scoring partial queries are. For example, if we only want to assign the score of the highest scoring partial query for the documents matching crime punishment in the title field or raskolnikov in the characters field, we would run the following query:

curl -XGET 'localhost:9200/library/_search?pretty' -d '{
 "fields" : [ "_id", "_score" ],
 "query" : {
  "dis_max" : {
   "tie_breaker" : 0.0,
   "queries" : [
    {
     "match" : {
      "title" : "crime punishment"
     } 
    },
    {
     "match" : {
      "characters" : "raskolnikov"
     }
    }
   ]
  }
 }
}'

The result for the preceding query should look as follows:

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.2169777,
    "hits" : [ {
      "_index" : "library",
      "_type" : "book",
      "_id" : "4",
      "_score" : 0.2169777,
      "fields" : {
        "_id" : "4"
      }
    } ]
  }
}

Now let's see the score of the partial queries alone. To do that we will run the partial queries using the following commands:

curl -XGET 'localhost:9200/library/_search?pretty' -d '{
 "fields" : [ "_id", "_score" ],
 "query" : {
  "match" : {
   "title" : "crime punishment"
  }
 }
}'

The response for the preceding query is as follows:

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.2169777,
    "hits" : [ {
      "_index" : "library",
      "_type" : "book",
      "_id" : "4",
      "_score" : 0.2169777,
      "fields" : {
        "_id" : "4"
      }
    } ]
  }
}

And the next command is as follows:

curl -XGET 'localhost:9200/library/_search?pretty' -d '{
 "fields" : [ "_id", "_score" ],
 "query" : {
  "match" : {
   "characters" : "raskolnikov"
  }
 }
}'

And the response is as follows:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.15342641,
    "hits" : [ {
      "_index" : "library",
      "_type" : "book",
      "_id" : "4",
      "_score" : 0.15342641,
      "fields" : {
        "_id" : "4"
      }
    } ]
  }
}

As you can see, the score of the document returned by our dis_max query is equal to the score of the highest scoring partial query (the first partial query). That is because we've set the tie_breaker property to 0.0.
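The same arithmetic can be checked with a few lines of Python. The dis_max_score function below is a hypothetical helper that follows the equation given earlier for the dis_max query (the maximum clause score plus tie_breaker times the sum of the remaining clause scores); the two scores are the ones returned by the partial queries above.

```python
def dis_max_score(clause_scores, tie_breaker=0.0):
    """Score of a document under dis_max, given the scores of matching clauses."""
    ordered = sorted(clause_scores, reverse=True)
    best = ordered[0]
    others = sum(ordered[1:])
    return best + tie_breaker * others

title_score = 0.2169777        # match on title: "crime punishment"
characters_score = 0.15342641  # match on characters: "raskolnikov"

# tie_breaker 0.0: only the highest scoring partial query counts.
print(dis_max_score([title_score, characters_score], tie_breaker=0.0))

# A non-zero tie_breaker lets the lower scoring clause contribute a fraction.
print(dis_max_score([title_score, characters_score], tie_breaker=0.3))
```

Setting tie_breaker to 1.0 would make the result equal to the plain sum of both partial scores, so the parameter effectively moves the score between "maximum only" and "sum of all".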

Not analyzed queries use cases

Let's look at two example use cases for queries that are not processed by any of the defined analyzers.

Limiting results to given tags

One of the simplest examples of the not analyzed query is the term query provided by Elasticsearch. You'll probably very rarely use the term query alone; however, it may be commonly used in compound queries. For example, let's assume that we would like to search for all the books with the novel value in the tags field. To do that, we would run the following command:

curl -XGET 'localhost:9200/library/_search?pretty' -d '{
 "query" : {
  "term" : {
   "tags" : "novel"
  }
 }
}'

Efficient query time stopwords handling

Elasticsearch provides the common terms query, which allows us to handle query time stopwords in an efficient way. It divides the query terms into two groups—more important terms and less important terms. The more important terms are the ones that have a lower frequency; the less important terms are the opposite. Elasticsearch first executes the query with important terms and calculates the score for those documents. Then, a second query with the less important terms is executed, but the score is not calculated and thus the query is faster.

For example, the following two queries should be similar in terms of results, but not in terms of score computation. Please also note that to see the differences in scoring we would have to use a larger data sample and not use index time stopwords:

curl -XGET 'localhost:9200/library/_search?pretty' -d '{
 "query" : {
  "common" : {
   "title" : {
    "query" : "the western front",
    "cutoff_frequency" : 0.1,
    "low_freq_operator": "and"
   }
  }
 }
}'

And the second query would be as follows:

curl -XGET 'localhost:9200/library/_search?pretty' -d '{
 "query" : {
  "bool" : {
   "must" : [
    {
     "term" : { "title" : "western" }
    },
    {
     "term" : { "title" : "front" }
    }
   ],
   "should" : [
    {
     "term" : { "title" : "the" }
    }
   ]
  }
 }
}'
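The way the common terms query divides the query terms into two groups can be sketched in Python as follows. This is a simplified illustration, not the actual implementation: the split_by_frequency function is hypothetical, the document frequencies are invented for the example, and we assume that cutoff_frequency is interpreted as a fraction of all documents.

```python
def split_by_frequency(terms, doc_freq, total_docs, cutoff_frequency):
    """Split terms into (low_frequency, high_frequency) lists.

    Assumption: a term is high frequency when the fraction of documents
    containing it is greater than cutoff_frequency.
    """
    low, high = [], []
    for term in terms:
        ratio = doc_freq.get(term, 0) / total_docs
        if ratio > cutoff_frequency:
            high.append(term)
        else:
            low.append(term)
    return low, high

# Invented document frequencies for a hypothetical index of 1,000 documents.
doc_freq = {"the": 900, "western": 12, "front": 20}

low, high = split_by_frequency(
    ["the", "western", "front"], doc_freq, total_docs=1000, cutoff_frequency=0.1
)
print(low)   # more important terms, required because low_freq_operator is "and"
print(high)  # less important terms, used only as optional clauses
```

Under these invented numbers, western and front end up in the more important group and the in the less important one, which is the split expressed by hand in the preceding bool query.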

Full text search queries use cases

Full text search is a broad topic and so are the use cases for the full text queries. However, let's look at two simple examples of queries from that group.

Using Lucene query syntax in queries

Sometimes, it is good to be able to use Lucene query syntax as it is. We talked about this syntax in the Lucene query language section in Chapter 1, Introduction to Elasticsearch. For example, if we would like to find books having the sorrows and young terms in their title, the von goethe phrase in the author field, and not having more than five copies, we could run the following query:

curl -XGET 'localhost:9200/library/_search?pretty' -d '{
 "query" : {
  "query_string" : {
   "query" : "+title:sorrows +title:young +author:\"von goethe\" -copies:{5 TO *]"
  }
 }
}'

As you can see, we've used the Lucene query syntax to pass all the matching requirements and we've let query parser construct the appropriate query.

Handling user queries without errors

Sometimes, queries coming from users can contain errors. For example, let's look at the following query:

curl -XGET 'localhost:9200/library/_search?pretty' -d '{
 "query" : {
  "query_string" : {
   "query" : "+sorrows +young \"",
   "default_field" : "title"
  }
 }
}'

The response would contain the following:

"error" : "SearchPhaseExecutionException[Failed to execute phase  [query]

This means that the query was not properly constructed and a parse error occurred. That's why the simple_query_string query was introduced. It uses a query parser that tries to handle user mistakes and tries to guess how the query should look. Our query using that parser would look as follows:

curl -XGET 'localhost:9200/library/_search?pretty' -d '{
 "query" : {
  "simple_query_string" : {
   "query" : "+sorrows +young \"",
   "fields" : [ "title" ]
  }
 }
}'

If you run the preceding query, you would see that the proper document has been returned by Elasticsearch, even though the query is not properly constructed.

Pattern queries use cases

There are multiple use cases for the wildcard queries; however, we wanted to show you the following two.

Autocomplete using prefixes

A very common use case is providing autocomplete functionality on the indexed data. As we know, the prefix query is not analyzed and works on the basis of terms indexed in the field. So the actual functionality depends on what tokens are produced during indexing. For example, let's assume that we would like to provide autocomplete functionality on any token in the title field and the user provided the wes prefix. A query that would match such a requirement looks as follows:

curl -XGET 'localhost:9200/library/_search?pretty' -d '{
 "query" : {
  "prefix" : {
   "title" : "wes"
  }
 }
}'
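The dependency of the prefix query on the indexed tokens can be illustrated with a short Python sketch. It assumes, for illustration only, that analysis of the title field amounts to lowercasing and splitting on whitespace; the prefix_matches function and the in-memory titles dictionary are hypothetical and merely mimic the behavior described above.

```python
def analyze(text):
    # Assumption: lowercasing plus whitespace splitting stands in for real analysis.
    return text.lower().split()

# Two titles from the example data, keyed by document id.
titles = {
    "1": "All Quiet on the Western Front",
    "5": "The Sorrows of Young Werther",
}

def prefix_matches(prefix):
    """Return {doc_id: matching tokens} for documents with a token starting with prefix.

    The prefix itself is not analyzed, so it is compared as given.
    """
    result = {}
    for doc_id, title in titles.items():
        hits = [token for token in analyze(title) if token.startswith(prefix)]
        if hits:
            result[doc_id] = hits
    return result

print(prefix_matches("wes"))  # matches the indexed token "western"
print(prefix_matches("Wes"))  # no match: indexed tokens are lowercase
```

This is why an autocomplete box built on the prefix query usually needs to lowercase the user input on the application side when the field is analyzed with a lowercasing analyzer.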

Pattern matching

If we need to match a certain pattern and our analysis chain is not producing tokens that allow us to do so, we can turn to the regexp query. One should remember, though, that this kind of query can be expensive during execution and thus should be avoided where possible. Of course, this is not always possible. One thing to remember is that the performance of the regexp query depends on the chosen regular expression. If you choose a regular expression that will be rewritten into a high number of terms, then performance will suffer.

Let's now see the example usage of the regexp query. Let's assume that we would like to find documents that have a term starting with wat, then followed by two characters and ending with the n character, and those terms should be in the characters field. To match this requirement, we could use a regexp query like the one used in the following command:

curl -XGET 'localhost:9200/library/_search?pretty' -d '{
 "query" : {
  "regexp" : {
   "characters" : "wat..n"
  }
 }
}'
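To get a feeling for which terms such an expression selects, we can try it in Python. This is only an approximation: Python's re module and the Lucene regular expression syntax differ in general, although for a simple pattern like wat..n they agree. The list of terms is invented for the example, and fullmatch is used because the regexp query has to match a whole term, not just a part of it.

```python
import re

# The same expression as in the regexp query above.
pattern = re.compile(r"wat..n")

# Invented sample of terms that could be present in the characters field.
terms = ["watson", "watsons", "waton", "watkin", "dr"]

# fullmatch mimics the requirement that the whole term matches the expression.
matching = [term for term in terms if pattern.fullmatch(term)]
print(matching)
```

Only terms that are exactly six characters long, start with wat, and end with n are selected; a longer term such as watsons is rejected even though it contains a matching fragment.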

Similarity supporting queries use cases

Let's look at a couple of simple use cases about how we can find similar documents and terms.

Finding terms similar to a given one

A very simple example is using the fuzzy query to find documents having a term similar to a given one. For example, if we would like to find all the documents having a value similar to crimea, we could run the following query:

curl -XGET 'localhost:9200/library/_search?pretty' -d '{
 "query" : {
  "fuzzy" : {
   "title" : {
    "value" : "crimea",
    "fuzziness" : 3,
    "max_expansions" : 50
   }
  }
 }
}'
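The notion of "similar" used by the fuzzy query is based on edit distance, that is, the number of single character changes needed to turn one term into another. The following Python sketch computes the plain Levenshtein distance; treat it as an illustration of the idea rather than of the exact algorithm Lucene uses, which may, for example, count a transposition of two adjacent characters as a single edit.

```python
def edit_distance(a, b):
    """Plain Levenshtein distance between strings a and b."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]

# "crimea" is one deletion away from the indexed term "crime".
print(edit_distance("crimea", "crime"))
```

A term such as crime is therefore within the allowed fuzziness of the crimea value used in the preceding query.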

Finding documents with similar field values

Another example of similarity queries is a use case when we want to find all the documents having field values similar to what we provided in a query. For example, if we would like to find books having a title similar to the western front battles name, we could run the following query:

curl -XGET 'localhost:9200/library/_search?pretty' -d '{
 "query" : {
  "fuzzy_like_this_field" : {
   "title" : {
    "like_text" : "western front battles",
    "max_query_terms" : 5
   }
  }
 }
}'

The result of the preceding query would be as follows:

{
  "took" : 10,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 1.0162667,
    "hits" : [ {
      "_index" : "library",
      "_type" : "book",
      "_id" : "1",
      "_score" : 1.0162667,
      "_source":{ "title": "All Quiet on the Western Front","otitle": "Im Westen nichts Neues","author": "Erich Maria Remarque","year": 1929,"characters": ["Paul Bäumer", "Albert Kropp", "Haie Westhus", "Fredrich Müller", "Stanislaus Katczinsky", "Tjaden"],"tags": ["novel"],"copies": 1, "available": true, "section" : 3}
    }, {
      "_index" : "library",
      "_type" : "book",
      "_id" : "5",
      "_score" : 0.4375,
      "_source":{"title" : "The Sorrows of Young Werther","author" : "Johann Wolfgang von Goethe","available" : true,"characters" : ["Werther","Lotte","Albert","Fraulein von B"],"copies" : 1, "otitle" : "Die Leiden des jungen Werthers","section" : 4,"tags" : ["novel", "classics"],"year" : 1774,"review" : [{"nickname" : "Anna","text" : "Could be good, but not my style","stars" : 3}]}
    } ]
  }
}

As you can see, sometimes the results are not as obvious as we would expect (look at the second book title). This is because of how Elasticsearch decides what is similar. In the case of the preceding query, Elasticsearch will take all the terms, run a fuzzy search on them, and choose a number of the best differentiating terms for matching documents.

Score altering queries use cases

When it comes to relevancy, Elasticsearch provides us with a few queries that we can use to alter the score as per our need. Of course, in addition to this, most queries allow us to provide boost, which gives us even more control. Let's now look at two example use cases of score altering queries.

Favoring newer books

Let's assume that we would like to favor books that are newer, so that a book from the year 1986 is higher in the results list than a book from 1870. The query that would match that requirement looks as follows:

curl -XGET 'localhost:9200/library/_search?pretty' -d '{ 
 "query" : {
  "function_score" : {
   "query" : {
    "match_all" : {}
   },
   "score_mode" : "multiply",
   "functions" : [
    {
     "gauss" : {
      "year" : {
       "origin" : 2014,
       "scale" : 2014,
       "offset" : 0,
       "decay": 0.5
      }
     }
    }
   ]
  }
 }
}'

We will discuss the function_score query in Chapter 3, Not Only Full Text Search. For now, if you look at the results returned by the preceding query, you can see that the newer the book, the higher in the results it will be.
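To see why newer books score higher, we can evaluate the Gaussian decay by hand. The Python sketch below follows the formula given in the Elasticsearch documentation for the gauss decay function; the gauss_decay function itself is a hypothetical helper written for this illustration, and it returns the multiplier applied to the score, not the final document score.

```python
import math

def gauss_decay(value, origin, scale, offset=0.0, decay=0.5):
    """Gaussian decay multiplier for a field value.

    At a distance of `scale` (beyond `offset`) from `origin`,
    the multiplier is equal to `decay`.
    """
    distance = max(0.0, abs(value - origin) - offset)
    sigma_squared = -(scale ** 2) / (2.0 * math.log(decay))
    return math.exp(-(distance ** 2) / (2.0 * sigma_squared))

# Parameters taken from the preceding query.
newer = gauss_decay(1986, origin=2014, scale=2014)
older = gauss_decay(1870, origin=2014, scale=2014)
print(newer, older)
```

Because the scale in the query is as large as the origin itself, the multipliers for the two years are both close to 1 and differ only slightly; a smaller scale would make the preference for newer books much more pronounced.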

Decreasing importance of books with certain value

Sometimes, it is good to be able to decrease the importance of certain documents, while still showing them in the results list. For example, we may want to show all books, but put the ones that are not available at the bottom of the results list by lowering their score. We don't want to sort on availability because sometimes the user may know what he or she is looking for, and the score of a full text search query should also be important. However, if our use case is that we want the books that are not available at the bottom of the results list, we could use the following command to get them:

curl -XGET 'localhost:9200/library/_search?pretty' -d '{ 
 "query" : {
  "boosting" : {
   "positive" : {
    "match_all" : {}
   },
   "negative" : {
    "term" : {
     "available" : false
    }
   },
   "negative_boost" : 0.2
  }
 }
}'
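The effect of the preceding boosting query can be modeled in a few lines of Python. This is a simplified sketch of the scoring rule, not the actual implementation: the boosting_score function and the three sample books are hypothetical, and we assume that the positive match_all query gives every document a score of 1.0.

```python
def boosting_score(positive_score, matches_negative, negative_boost=0.2):
    """Demote documents that match the negative query, leave the others untouched."""
    if matches_negative:
        return positive_score * negative_boost
    return positive_score


# Hypothetical sample documents; only the "available" field comes from our query.
books = [
    {"title": "Book A", "available": True},
    {"title": "Book B", "available": False},
    {"title": "Book C", "available": True},
]

# The negative part of our query matches documents with available set to false.
scored = [(boosting_score(1.0, not book["available"]), book["title"]) for book in books]
ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)

for score, title in ranked:
    print(title, score)
```

Note that the unavailable book is still returned; it just ends up at the bottom of the list. Once the positive part is a real full text query, a very good match that is unavailable can still outrank a poor match that is available, which is exactly why we lower the score instead of sorting.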

Position aware queries use cases

Not very commonly used because of how resource hungry they are, position aware queries allow us to match documents that have phrases and terms in the right order. Let's look at some examples.

Matching phrases

The phrase query is the simplest position aware query and the best performing one of the queries assigned to this group. For example, a query that would only match documents containing the leiden des jungen phrase in the otitle field would look as follows:

curl -XGET 'localhost:9200/library/_search?pretty' -d '{
 "query" : {
  "match_phrase" : {
   "otitle" : "leiden des jungen"
  }
 }
}'
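What "phrase" means in terms of positions can be illustrated with a small Python sketch. It is a deliberately naive model: we assume that the analyzed field is just the lowercased title split on whitespace, the contains_phrase function is ours, and the sample title is an assumption about what the otitle field of the matching book holds.

```python
def contains_phrase(field_tokens, phrase_tokens):
    """Return True if phrase_tokens occur in field_tokens at consecutive positions."""
    length = len(phrase_tokens)
    return any(
        field_tokens[start:start + length] == phrase_tokens
        for start in range(len(field_tokens) - length + 1)
    )


# Assumed field value, tokenized in a simplistic way (lowercase, split on whitespace).
otitle = "Die Leiden des jungen Werthers".lower().split()

print(contains_phrase(otitle, ["leiden", "des", "jungen"]))  # terms adjacent and in order
print(contains_phrase(otitle, ["des", "leiden", "jungen"]))  # same terms, wrong order
```

The second call does not match even though all three terms are present in the field, which is the difference between a phrase query and a plain match query on the same terms.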

Spans, spans everywhere

Of course, the phrase query is very simple when it comes to position handling. What if we would like to run a query to find documents that have the des jungen phrase not more than two positions away from the die term and just before the werthers term? Note that the span_near query combining die with des jungen uses in_order set to false, so it does not enforce which of the two comes first. This can be done with span queries, and the following command shows how such a query could look:

curl -XGET 'localhost:9200/library/_search?pretty' -d '{
 "query" : {
  "span_near" : {
   "clauses" : [
    {
     "span_near" : {
      "clauses" : [
       {
        "span_term" : {
         "otitle" : "die"
        }
       },
       {
        "span_near" : {
         "clauses" : [
          {
           "span_term" : {
            "otitle" : "des"
           }
          },
          {
           "span_term" : {
            "otitle" : "jungen"
           }
          }
         ],
         "slop" : 0,
         "in_order" : true
        }
       }
      ],
      "slop" : 2,
      "in_order" : false
     }
    },
    {
     "span_term" : {
      "otitle" : "werthers"
     }
    }
   ],
   "slop" : 0,
   "in_order" : true
  }
 }
}'

Please note that span queries are not analyzed, so the terms we provide have to match the terms stored in the index exactly. We can see that by looking at the response of the Explain API. To see that response, we should send the same request body (our query) to the /library/book/5/_explain REST endpoint. The interesting part of the output looks as follows:

"description" : "weight(spanNear([spanNear([otitle:die, spanNear([otitle:des, otitle:jungen], 0, true)], 2, false), otitle:werthers], 0, true) in 1) [PerFieldSimilarity], result of:",
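Deeply nested span queries such as the one we have just run are easy to get wrong when written by hand. One possible way to keep them readable is to assemble the request body with small helper functions, as in the following Python sketch. The span_term and span_near helpers are our own convenience functions, not part of any Elasticsearch client library; they only build the same JSON structure that we sent with curl.

```python
import json


def span_term(field, value):
    """Build a span_term clause for a single, non-analyzed term."""
    return {"span_term": {field: value}}


def span_near(clauses, slop, in_order):
    """Build a span_near clause from a list of already built span clauses."""
    return {"span_near": {"clauses": clauses, "slop": slop, "in_order": in_order}}


# "des" immediately followed by "jungen"
des_jungen = span_near(
    [span_term("otitle", "des"), span_term("otitle", "jungen")], slop=0, in_order=True
)
# "die" within two positions of the "des jungen" span, in any order
die_near_phrase = span_near(
    [span_term("otitle", "die"), des_jungen], slop=2, in_order=False
)
# ... directly followed by "werthers"
query = {
    "query": span_near(
        [die_near_phrase, span_term("otitle", "werthers")], slop=0, in_order=True
    )
}

print(json.dumps(query, indent=1))
```

The printed JSON can be used as the body of the search request or of the Explain API request. The structure of the helper calls also mirrors the spanNear(...) description returned by the Explain API, which makes the two easier to compare.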

Structure aware queries use cases

When it comes to nested documents or the parent–child relationship, structure aware queries are the ones that will be needed sooner or later. Let's look at two examples of where a structure aware query can be used.

Returning parent documents having a certain nested document

The first example will be a very simple one. Let's return all the books that have at least a single review that was given four stars or more. The query that does that looks as follows:

curl -XGET 'localhost:9200/library/_search?pretty' -d '{
 "query" : {
  "nested" : {
   "path" : "review",
   "query" : {
    "range" : {
     "stars" : {
      "gte" : 4
     }
    }
   }
  }
 }
}'
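The matching rule of the preceding nested query can be expressed as a short Python sketch: a book is returned if at least one of its nested review documents satisfies the inner query. The sample books and the has_matching_review function are hypothetical; only the review and stars names come from our query.

```python
def has_matching_review(book, min_stars=4):
    """Return True if any nested review has at least min_stars stars."""
    return any(review["stars"] >= min_stars for review in book.get("review", []))


# Hypothetical sample documents with nested reviews.
books = [
    {"title": "Book A", "review": [{"stars": 2}, {"stars": 5}]},
    {"title": "Book B", "review": [{"stars": 3}]},
    {"title": "Book C"},  # no reviews at all
]

matching = [book["title"] for book in books if has_matching_review(book)]
print(matching)
```

A single review with enough stars is sufficient, regardless of how many lower rated reviews the same book has, and a book without any reviews can never match.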

Affecting parent document score with the score of nested documents

Let's assume that we want to find all the books in our index that have reviews and sort them by the highest number of stars given in any of their reviews. The query that fulfills such a requirement looks as follows:

curl -XGET 'localhost:9200/library/_search?pretty' -d '{
 "query" : {
  "nested" : {
   "path" : "review",
   "score_mode" : "max",
   "query" : {
    "function_score" : {
     "query" : { "match_all" : {} },
     "score_mode" : "max",
     "boost_mode" : "replace",
     "field_value_factor" : {
      "field" : "stars",
      "factor" : 1,
      "modifier" : "none"
     }
    }
   }
  }
 }
}'
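How the score of the parent document is derived in the preceding query can be modeled with the following Python sketch. It assumes that the inner function_score query makes the score of each nested review equal to its stars value (which is what boost_mode set to replace together with field_value_factor is meant to do), and that score_mode set to max on the nested query picks the highest of those scores. The sample books and the parent_score function are hypothetical.

```python
def parent_score(book):
    """Score a book by the highest stars value among its nested reviews."""
    stars = [review["stars"] for review in book.get("review", [])]
    # A book without nested reviews does not match the nested query at all.
    return max(stars) if stars else None


# Hypothetical sample documents with nested reviews.
books = [
    {"title": "Book A", "review": [{"stars": 2}, {"stars": 3}]},
    {"title": "Book B", "review": [{"stars": 5}, {"stars": 1}]},
    {"title": "Book C"},  # no reviews, so it is not returned
]

scored = [(parent_score(book), book["title"]) for book in books]
ranked = sorted(
    (pair for pair in scored if pair[0] is not None),
    key=lambda pair: pair[0],
    reverse=True,
)

for score, title in ranked:
    print(title, score)
```

Changing score_mode on the nested query changes only how the review scores are combined. In this model that would mean replacing max with another way of combining the list, such as the average or the sum.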
