Using nested objects

Nested objects can come in handy in certain situations. Basically, with nested objects Elasticsearch allows us to connect multiple documents together: one main document and multiple dependent ones. The main document and the nested ones are indexed together and placed in the same segment of the index (actually, in the same block inside the segment, next to each other), which gives us the best performance we can get for such a data structure. The same goes for changing the document: unless you are using the update API, you need to index the parent document and all of its nested documents at the same time.

Note

If you would like to read more about how nested objects work on the Apache Lucene level, there is a very good blog post written by Mike McCandless at http://blog.mikemccandless.com/2012/01/searching-relational-content-with.html.

Now let's get on with our example use case. Imagine that we have a shop with clothes and we store the size and color of each t-shirt. Our standard, non-nested mappings will look like this (stored in cloth.json):

{
 "cloth" : {
  "properties" : {
   "name" : {"type" : "string"},
   "size" : {"type" : "string", "index" : "not_analyzed"},
   "color" : {"type" : "string", "index" : "not_analyzed"}
  }
 }
}

To create the shop index with the cloth mapping, we run the following commands:

curl -XPOST 'localhost:9200/shop'
curl -XPUT 'localhost:9200/shop/cloth/_mapping' -d @cloth.json

Now imagine that we have a t-shirt in our shop that is available only in size XXL in red and in size XL in black. So our example indexing command will look as follows:

curl -XPOST 'localhost:9200/shop/cloth/1' -d '{
 "name" : "Test shirt",
 "size" : [ "XXL", "XL" ],
 "color" : [ "red", "black" ]
}'

However, there is a problem with such a data structure. What if one of our clients searches our shop in order to find the XXL t-shirt in black? Let's check that by running the following query (we assume that we've used our mappings to create the index and we've indexed our example document):

curl -XGET 'localhost:9200/shop/cloth/_search?pretty=true' -d '{
  "query" : {
    "bool" : {
      "must" : [
        { "term" : { "size" : "XXL" } },
        { "term" : { "color" : "black" } }
      ]
    }
  }
}'

We should get no results, right? But in fact Elasticsearch returned the following document:

{
  (…)
  "hits" : {
    "total" : 1,
    "max_score" : 0.4339554,
    "hits" : [ {
      "_index" : "shop",
      "_type" : "cloth",
      "_id" : "1",
      "_score" : 0.4339554,
      "_source" : {
        "name" : "Test shirt",
        "size" : [ "XXL", "XL" ],
        "color" : [ "red", "black" ]
      }
    } ]
  }
}

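This is because the document was matched: the size field contains XXL and the color field contains black. Elasticsearch treats the two arrays as independent sets of values and does not remember which size goes with which color. Of course, this is not what we would like to get.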
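
To see why the match happens, it can help to picture what is left of the document once the pairing between fields is gone. The following is only a simplified model in plain Python, not Elasticsearch code: the flatten and matches_all_terms helpers are hypothetical names introduced for illustration, and the second one mimics a bool query with term clauses in its must section.

```python
# Simplified model (an assumption for illustration, not Elasticsearch internals):
# every field of a document is reduced to the set of values it holds.
doc = {
    "name": "Test shirt",
    "size": ["XXL", "XL"],
    "color": ["red", "black"],
}

def flatten(document):
    """Turn each field into a set of values, losing any pairing between fields."""
    flat = {}
    for field, value in document.items():
        values = value if isinstance(value, list) else [value]
        flat[field] = set(values)
    return flat

def matches_all_terms(flat_doc, terms):
    """Mimic a bool/must of term queries: every field must contain its value."""
    return all(value in flat_doc.get(field, set()) for field, value in terms.items())

flat = flatten(doc)
# XXL is in "size" and black is in "color", so the document matches,
# even though no single t-shirt variation is both XXL and black.
print(matches_all_terms(flat, {"size": "XXL", "color": "black"}))
```

In this model the check can only ask whether a value appears somewhere in a field, which is exactly the information a flat document keeps.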

So, let's modify our mappings to use nested objects, so that each size and color combination is stored as a separate nested document. The final mapping looks as follows (we store these mappings in the cloth_nested.json file):

{
 "cloth" : {
  "properties" : {
   "name" : {"type" : "string", "index" : "analyzed"},
   "variation" : {
    "type" : "nested",
    "properties" : {
     "size" : {"type" : "string", "index" : "not_analyzed"},
     "color" : {"type" : "string", "index" : "not_analyzed"}
    }
   }
  }
 }
}

Now, we will create a second index called shop_nested using our modified mappings by running the following commands:

curl -XPOST 'localhost:9200/shop_nested'
curl -XPUT 'localhost:9200/shop_nested/cloth/_mapping' -d @cloth_nested.json

As you can see, we've introduced a new object inside our cloth type, variation, which is a nested one (its type property is set to nested). This tells Elasticsearch that we want to index its contents as nested documents. Now, let's modify our document. We will add the variation field to it, holding an array of objects with two properties each: size and color. So the index command for our modified example product will look like the following:

curl -XPOST 'localhost:9200/shop_nested/cloth/1' -d '{
  "name" : "Test shirt",
  "variation" : [
    { "size" : "XXL", "color" : "red" },
    { "size" : "XL", "color" : "black" }
  ]
}'

We've structured the document so that each size and its matching color form a separate nested document. However, if you run our previous query against the new index, it won't return any documents. This is because in order to query nested documents, we need to use a specialized query. So now our query looks as follows:

curl -XGET 'localhost:9200/shop_nested/cloth/_search?pretty=true' -d '{
  "query" : {
    "nested" : {
      "path" : "variation",
      "query" : {
        "bool" : {
          "must" : [
            { "term" : { "variation.size" : "XXL" } },
            { "term" : { "variation.color" : "black" } }
          ]
        }
      }
    }
  }
}'

This time, the preceding query does not return the indexed document, because no single nested document has both the size equal to XXL and the color equal to black.
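
The difference in behavior can be sketched with a small model in plain Python. This is an illustration under simplifying assumptions, not Elasticsearch code: nested_match is a hypothetical helper that checks every condition against one nested object at a time, which is the property the nested query relies on.

```python
# Simplified model (illustration only): each nested object is checked on its own.
variations = [
    {"size": "XXL", "color": "red"},
    {"size": "XL", "color": "black"},
]

def nested_match(nested_docs, terms):
    """Return True if a single nested object satisfies every term."""
    return any(
        all(obj.get(field) == value for field, value in terms.items())
        for obj in nested_docs
    )

# No single variation is both XXL and black, so there is no match.
print(nested_match(variations, {"size": "XXL", "color": "black"}))
# The XXL variation is red, so this combination does match.
print(nested_match(variations, {"size": "XXL", "color": "red"}))
```

Because all conditions must hold within the same object, values from two different variations can no longer be combined into a false match.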

Let's get back to the query for a second to discuss it briefly. As you can see, we used the nested query in order to search in the nested documents. The path property specifies the name of the nested object to search in (yes, a mapping can have more than one). Inside the nested query, we simply included a standard query section. Also note that we specified the full path for the field names in the nested objects. This is handy when you have multilevel nesting, which is also possible.
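
If you build such queries from application code, the full field paths can be generated from the path value. The following is a minimal sketch in Python; build_nested_query is a hypothetical helper introduced here, not part of any Elasticsearch client, and it only produces the JSON body that would be sent with the search request.

```python
import json

def build_nested_query(path, terms):
    """Build a nested query body; `terms` maps field names relative to `path`."""
    # Prefix every field with the nested path, e.g. "size" -> "variation.size".
    must = [{"term": {f"{path}.{field}": value}} for field, value in terms.items()]
    return {
        "query": {
            "nested": {
                "path": path,
                "query": {"bool": {"must": must}},
            }
        }
    }

body = build_nested_query("variation", {"size": "XXL", "color": "black"})
print(json.dumps(body, indent=2))
```

The printed body has the same structure as the query shown earlier, so it could be passed to curl with the -d option or sent by the HTTP client of your choice.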

Scoring and nested queries

There is one additional property when it comes to handling nested documents at query time. In addition to the path property, there is the score_mode property, which allows us to define how the score of the nested query is calculated. Elasticsearch allows us to set the score_mode property to one of the following values:

  • avg: This is the default value. Using it for the score_mode property will result in Elasticsearch taking the average of the scores of the matching nested documents and including it in the score of the main query.
  • sum: Using this value for the score_mode property will result in Elasticsearch taking the sum of the scores of the matching nested documents and including it in the score of the main query.
  • min: Using this value for the score_mode property will result in Elasticsearch taking the lowest score among the matching nested documents and including it in the score of the main query.
  • max: Using this value for the score_mode property will result in Elasticsearch taking the highest score among the matching nested documents and including it in the score of the main query.
  • none: Using this value for the score_mode property will result in no score being taken from the nested query.
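
The effect of these values can be illustrated with a short Python sketch. It is a simplified model rather than Elasticsearch code: combine_scores is a hypothetical helper, the input scores are made up, and we assume that the values being combined are the scores of the nested documents that matched for a single parent document.

```python
def combine_scores(nested_scores, score_mode="avg"):
    """Combine scores of matching nested documents into one value (simplified)."""
    if not nested_scores:
        raise ValueError("at least one matching nested document is required")
    if score_mode == "avg":
        return sum(nested_scores) / len(nested_scores)
    if score_mode == "sum":
        return sum(nested_scores)
    if score_mode == "min":
        return min(nested_scores)
    if score_mode == "max":
        return max(nested_scores)
    if score_mode == "none":
        # Assumption: "no score taken" is modelled as a contribution of 0.
        return 0.0
    raise ValueError(f"unknown score_mode: {score_mode}")

# Made-up scores of three matching nested documents.
scores = [0.5, 1.5, 1.0]
for mode in ("avg", "sum", "min", "max", "none"):
    print(mode, combine_scores(scores, mode))
```

In a real query, the chosen value is set with the score_mode property placed next to path inside the nested query.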