Validating your queries

There are times when you are not in total control of the queries that you send to Elasticsearch. A query can be assembled from multiple criteria, which turns it into a monster, or, even worse, it can be generated by some kind of wizard, which makes it hard to troubleshoot and to find the faulty part that makes the query fail. For such use cases, Elasticsearch exposes the Validate API, which helps us validate our queries and diagnose potential problems.

Using the Validate API

The usage of the Validate API is very simple. Instead of sending the query to the _search REST endpoint, we send it to the _validate/query one. And that's it. Let's look at the following command:

curl -XGET 'localhost:9200/library/_validate/query?pretty' --data-binary '{
 "query" : {
  "bool" : {
    "must" : {
      "term" : {
        "title" : "crime"
      }
    },
    "should" : {
      "range : {
        "year" : {
          "from" : 1900,
          "to" : 2000
        }
      }
    },
    "must_not" : {
      "term" : {
        "otitle" : "nothing"
      }
    }
  }
 }
}'

A similar query was already used in this book in Chapter 3, Searching Your Data. The preceding command will tell Elasticsearch to validate it and return the information about its validity. The response of Elasticsearch to the preceding command will be similar to the following one:

{
  "valid" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  }
}

Look at the valid attribute. It is set to false, so something went wrong. Let's execute the query validation once again, this time with the explain parameter added to the request:

curl -XGET 'localhost:9200/library/_validate/query?pretty&explain' --data-binary '{
 "query" : {
  "bool" : {
    "must" : {
      "term" : {
        "title" : "crime"
      }
    },
    "should" : {
      "range : {
        "year" : {
          "from" : 1900,
          "to" : 2000
        }
      }
    },
    "must_not" : {
      "term" : {
        "otitle" : "nothing"
      }
    }
  }
 }
}'

Now the result returned from Elasticsearch is more verbose:

{
  "valid" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "explanations" : [ {
    "index" : "library",
    "valid" : false,
    "error" : "[library] QueryParsingException[Failed to parse]; nested: JsonParseException[Illegal unquoted character ((CTRL-CHAR, code 10)): has to be escaped using backslash to be included in name
 at [Source: org.elasticsearch.transport.netty.ChannelBufferStreamInput@1110d090; line: 10, column: 18]];; com.fasterxml.jackson.core.JsonParseException: Illegal unquoted character ((CTRL-CHAR, code 10)): has to be escaped using backslash to be included in name
 at [Source: org.elasticsearch.transport.netty.ChannelBufferStreamInput@1110d090; line: 10, column: 18]"
  } ]
}

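Now everything is clear. In our example, we have improperly quoted the range attribute: the closing quotation mark after range is missing, so the parser runs into the end of line 10 of the request body while it is still reading the attribute name.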
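
For comparison, a corrected request might look like the following sketch. Only the quotation mark after range is restored; the rest of the body is unchanged. The query.json file name and the validate_query wrapper function are assumptions made for illustration, and the call itself needs a running node that holds the library index.

```shell
# Store the corrected request body in a file (query.json is an assumed name).
cat > query.json <<'EOF'
{
 "query" : {
  "bool" : {
    "must" : {
      "term" : {
        "title" : "crime"
      }
    },
    "should" : {
      "range" : {
        "year" : {
          "from" : 1900,
          "to" : 2000
        }
      }
    },
    "must_not" : {
      "term" : {
        "otitle" : "nothing"
      }
    }
  }
 }
}
EOF

# Hypothetical wrapper: validate the stored body against the library index.
# curl reads the request body from a file when the argument starts with @.
validate_query() {
  curl -XGET 'localhost:9200/library/_validate/query?pretty&explain' --data-binary @query.json
}
```

Calling validate_query should no longer fail in the JSON parser. Whether valid comes back as true still depends on the mappings of the index.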

Note

You may wonder why we used the --data-binary parameter in our curl command. This parameter preserves the newline characters when sending a query to Elasticsearch. This means that the line and column numbers reported in the error remain intact, which makes it easier to find errors. In other cases, the -d parameter is more convenient because it's shorter.
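
Because the line numbers survive, the reported position can be mapped back to the request body. The following is a minimal sketch, assuming that the body was saved to a file and that the error text contains a "line: N, column: M" pair as in the preceding response. The show_error_line function is a hypothetical helper and not part of Elasticsearch or curl.

```shell
# Hypothetical helper: print the line of a request body that a parse error points at.
# $1 = file holding the request body, $2 = the "error" text taken from the response.
show_error_line() {
  body_file=$1
  error_text=$2
  # Take the first "line: N, column: M" pair found in the error text.
  position=$(printf '%s\n' "$error_text" | grep -oE 'line: [0-9]+, column: [0-9]+' | head -n 1)
  if [ -z "$position" ]; then
    echo "no position found in error text" >&2
    return 1
  fi
  line=$(printf '%s\n' "$position" | sed -E 's/line: ([0-9]+), column: ([0-9]+)/\1/')
  column=$(printf '%s\n' "$position" | sed -E 's/line: ([0-9]+), column: ([0-9]+)/\2/')
  # Print the position followed by the offending line of the body.
  printf 'line %s, column %s: ' "$line" "$column"
  sed -n "${line}p" "$body_file"
}
```

With the request body from our example stored in a file and the error text passed as the second argument, the helper should point at the line containing the range attribute.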

The Validate API can also detect other errors, for example, an incorrect number format or other mapping-related issues. Unfortunately, it is not easy for our application to detect programmatically what the problem is, because the error messages lack structure.
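
One workaround is plain string matching on the error text. The following sketch lists the exception names that appear in a message. The error_types function is a hypothetical helper, and it assumes that exception names end with the word Exception, as they do in the preceding response. Such matching is brittle and may break if the message format changes.

```shell
# Hypothetical helper: list the distinct exception names mentioned in an error
# string, one per line, in order of first appearance.
# $1 = the "error" text taken from the response.
error_types() {
  # grep -o prints every match on its own line; awk drops repeated names.
  printf '%s\n' "$1" | grep -oE '[A-Za-z]+Exception' | awk '!seen[$0]++'
}
```

The first name in the list is the outermost exception and the following names are the nested causes, which an application could use to choose between, for example, a syntax error message and a mapping error message.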

The Validate API supports most of the parameters that are supported by standard Elasticsearch queries, which include: explain, ignore_unavailable, allow_no_indices, expand_wildcards, operation_threading, analyzer, analyze_wildcard, default_operator, df, lenient, lowercase_expanded_terms, and rewrite.
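
These parameters are passed in the query string of the request, just like explain in the earlier command. As a small sketch, the following function assembles such an address. The validate_url function is a hypothetical helper, and the host and the pretty parameter are taken from the commands used in this section.

```shell
# Hypothetical helper: build a Validate API address for an index.
# $1 = index name, remaining arguments = request parameters such as
# explain or lenient=true.
validate_url() {
  index=$1
  shift
  url="localhost:9200/$index/_validate/query?pretty"
  # Append every parameter, separated by an ampersand.
  for param in "$@"; do
    url="$url&$param"
  done
  printf '%s\n' "$url"
}
```

For example, curl -XGET "$(validate_url library explain lenient=true)" --data-binary @query.json would validate a request body stored in a file; query.json is an assumed file name.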
