Using the parent-child relationship

In the previous section, we discussed using Elasticsearch to index nested documents along with their parent. However, even though nested documents are indexed as separate documents in the index, we can't change a single nested document on its own (unless we use the update API). Elasticsearch also allows us to have a real parent-child relationship, which we will look at in this section.

Index structure and data indexing

Let's use the same example that we used when discussing nested documents: the hypothetical clothing store. What we would like to have is the ability to update the sizes and colors without the need to reindex the whole parent document after each change. We will see how to achieve that using the Elasticsearch parent-child functionality.

Child mappings

First, we have to create the mapping for the child type. To create child mappings, we need to add the _parent property with the name of the parent type, which will be cloth in our case. In the child documents, we want to have the size and the color of the cloth. So, the commands that will create the shop index and the variation type look as follows:

curl -XPOST 'localhost:9200/shop'
curl -XPUT 'localhost:9200/shop/variation/_mapping' -d '{
  "variation" : {
    "_parent" : { "type" : "cloth" },
    "properties" : {
      "size" : { "type" : "string", "index" : "not_analyzed" },
      "color" : { "type" : "string", "index" : "not_analyzed" }
    }
  }
}'

And that's all. You don't need to specify which field will be used to connect the child documents to the parent ones. By default, Elasticsearch will use the documents' unique identifier for that. If you remember from the previous chapters, the information about a unique identifier is present in the index by default.

Parent mappings

The only field we need to have in our parent document is name. We don't need anything more than that. So, in order to create our cloth type in the shop index, we will run the following command:

curl -XPUT 'localhost:9200/shop/cloth/_mapping' -d '{
  "cloth" : {
    "properties" : {
      "name" : { "type" : "string" }
    }
  }
}'

The parent document

Now we are going to index our parent document. As we want to store the information about the size and the color in the child documents, the only thing we need to have in the parent documents is the name. Of course, there is one thing to remember – our parent documents need to be of type cloth, because of the _parent property value in the child mappings. The indexing command for our parent document is very simple and looks as follows:

curl -XPOST 'localhost:9200/shop/cloth/1' -d '{
  "name" : "Test shirt"
}'

If you look at the preceding command, you'll notice that our document will be given the identifier 1.

Child documents

To index the child documents, we need to provide information about the parent document with the use of the parent request parameter. The value of the parent parameter should point to the identifier of the parent document. So, to index two child documents for our parent document, we need to run the following commands:

curl -XPOST 'localhost:9200/shop/variation/1000?parent=1' -d '{
  "color" : "red",
  "size" : "XXL"
}'
curl -XPOST 'localhost:9200/shop/variation/1001?parent=1' -d '{
  "color" : "black",
  "size" : "XL"
}'

And that's all. We've indexed two additional documents, which are of our variation type, but we've specified that our documents have a parent, the document with an identifier of 1.

Querying

We've indexed our data and now we need to use appropriate queries to match the documents with the data stored in their children. This is because, by default, Elasticsearch searches on the documents without looking at the parent-child relations. For example, the following query will match all three documents that we've indexed (two children and one parent):

curl -XGET 'localhost:9200/shop/_search?q=*&pretty'

This is not what we would like to achieve, at least in most cases. Usually, we are interested in parent documents that have children matching the query. Of course, Elasticsearch provides such functionality through specialized query types.

Note

Remember that a query that returns parent documents won't return their child documents, and vice versa.

Querying data in the child documents

Imagine that we want to get clothes that are of the XXL size and are red. As you recall, the size and the color of the cloth are indexed in the child documents, so we need the specialized has_child query to check which parent documents have children with the desired size and color. An example query that matches our requirement looks as follows:

curl -XGET 'localhost:9200/shop/_search?pretty' -d '{
  "query" : {
    "has_child" : {
      "type" : "variation",
      "query" : {
        "bool" : {
          "must" : [
            { "term" : { "size" : "XXL" } },
            { "term" : { "color" : "red" } }
          ]
        }
      }
    }
  }
}'

The query is quite simple; it is of the has_child type, which tells Elasticsearch that we want to search in the child documents. In order to specify which type of children we are interested in, we specify the type property with the name of the child type. The query is provided using the query property. We've used a standard bool query, which we've already discussed. The result of the query will contain only those parent documents that have children matching our bool query. In our case, the single document returned looks as follows:

{
  "took" : 16,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "shop",
      "_type" : "cloth",
      "_id" : "1",
      "_score" : 1.0,
      "_source" : {
        "name" : "Test shirt"
      }
    } ]
  }
}
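To make the matching rule concrete, the following Python sketch models has_child over in-memory data. It is only a toy model under stated assumptions, not Elasticsearch code: the has_child function and the dictionaries are hypothetical and simply mirror the documents we indexed earlier. Note that both terms have to match within the same child document.

```python
# Toy in-memory model of the has_child matching rule (not Elasticsearch code).
parents = {"1": {"name": "Test shirt"}}

# Each child carries the identifier of its parent, as set with ?parent=1.
children = [
    {"_id": "1000", "_parent": "1", "color": "red", "size": "XXL"},
    {"_id": "1001", "_parent": "1", "color": "black", "size": "XL"},
]


def has_child(parents, children, must_terms):
    """Return ids of parents with at least one child matching every term."""
    matched = set()
    for child in children:
        if all(child.get(field) == value for field, value in must_terms.items()):
            matched.add(child["_parent"])
    # Only parent documents are returned; the matching children are not.
    return sorted(pid for pid in matched if pid in parents)


print(has_child(parents, children, {"size": "XXL", "color": "red"}))  # ['1']
print(has_child(parents, children, {"size": "XL", "color": "red"}))   # []
```

The second call returns nothing because no single child is both XL and red, even though each value appears in some child of the same parent.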

The has_child query allows us to provide additional parameters to control its behavior. Every parent document found may be connected with one or more child documents, which means that every child document can influence the resulting score. By default, the query ignores how many children matched and what they contain; all that matters is whether at least one child matches the query. This can be changed by using the score_mode parameter, which controls the score calculation of the has_child query. The values this parameter can take are:

  • none: The default one, the score generated by the relation is 1.0
  • min: The score is taken from the lowest scored child
  • max: The score is taken from the highest scored child
  • sum: The score is calculated as the sum of the child scores
  • avg: The score is taken as the average of the child scores

Let's see an example:

curl -XGET 'localhost:9200/shop/_search?pretty' -d '{
  "query" : {
    "has_child" : {
      "type" : "variation",
      "score_mode" : "sum",
      "query" : {
        "bool" : {
          "must" : [
            { "term" : { "size" : "XXL" } },
            { "term" : { "color" : "red" } }
          ]
        }
      }
    }
  }
}'

We used sum as the score_mode, which results in children contributing to the final score of the parent document: the contribution is the sum of the scores of every child document matching the query.
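The way each score_mode value combines child scores can be sketched in a few lines of Python. This is a simplified model of the descriptions given in this section, not the actual Elasticsearch implementation; the parent_score function and the child scores are hypothetical.

```python
def parent_score(child_scores, score_mode="none"):
    """Combine the scores of the matching children into one parent score."""
    if not child_scores:
        raise ValueError("a parent only matches if at least one child matched")
    if score_mode == "none":
        return 1.0  # the children only decide whether the parent matches
    if score_mode == "min":
        return min(child_scores)
    if score_mode == "max":
        return max(child_scores)
    if score_mode == "sum":
        return sum(child_scores)
    if score_mode == "avg":
        return sum(child_scores) / len(child_scores)
    raise ValueError("unknown score_mode: " + score_mode)


scores = [0.5, 1.5, 2.5]  # hypothetical scores of three matching children
for mode in ("none", "min", "max", "sum", "avg"):
    print(mode, parent_score(scores, mode))
# none 1.0
# min 0.5
# max 2.5
# sum 4.5
# avg 1.5
```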

Finally, we can limit the number of matching child documents: the min_children property specifies the minimum number of children that need to match, and the max_children property specifies the maximum number of children that are allowed to match. The query illustrating the usage of these parameters is as follows:

curl -XGET 'localhost:9200/shop/_search?pretty' -d '{
  "query" : {
    "has_child" : {
      "type" : "variation",
      "min_children" : 1,
      "max_children" : 3,
      "query" : {
        "bool" : {
          "must" : [
            { "term" : { "size" : "XXL" } },
            { "term" : { "color" : "red" } }
          ]
        }
      }
    }
  }
}'
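The effect of these two limits can be pictured with a small Python model. It is a sketch under stated assumptions: the filter_by_child_count function, the sample parent identifiers, and the use of None for "no upper limit" are all hypothetical, and both bounds are treated as inclusive.

```python
from collections import Counter


def filter_by_child_count(matching_child_parents, min_children=1, max_children=None):
    """Keep parents whose number of matching children lies within the bounds.

    matching_child_parents holds one parent id per matching child document.
    """
    counts = Counter(matching_child_parents)
    return sorted(
        pid
        for pid, n in counts.items()
        if n >= min_children and (max_children is None or n <= max_children)
    )


# Hypothetical data: parent "1" has one matching child, "2" has two, "3" has four.
hits = ["1", "2", "2", "3", "3", "3", "3"]
print(filter_by_child_count(hits, min_children=1, max_children=3))  # ['1', '2']
print(filter_by_child_count(hits, min_children=2, max_children=3))  # ['2']
```

With the values from the preceding query (min_children set to 1 and max_children set to 3), a parent with four matching children would be left out of the results.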

Querying data in the parent documents

Sometimes, we are not interested in the parent documents but in the child documents. If you would like to return the child documents that match given data in the parent document, Elasticsearch has a query for us: the has_parent query. It is similar to the has_child query; however, instead of the type property, we specify the parent_type property with the value of the parent document type. For example, the following query will return both the child documents that we've indexed, but not the parent document:

curl -XGET 'localhost:9200/shop/_search?pretty' -d '{
  "query" : {
    "has_parent" : {
      "parent_type" : "cloth",
      "query" : {
        "term" : { "name" : "test" }
      }
    }
  }
}'

The response from Elasticsearch will be similar to the following one:

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "shop",
      "_type" : "variation",
      "_id" : "1000",
      "_score" : 1.0,
      "_routing" : "1",
      "_parent" : "1",
      "_source" : {
        "color" : "red",
        "size" : "XXL"
      }
    }, {
      "_index" : "shop",
      "_type" : "variation",
      "_id" : "1001",
      "_score" : 1.0,
      "_routing" : "1",
      "_parent" : "1",
      "_source" : {
        "color" : "black",
        "size" : "XL"
      }
    } ]
  }
}
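The following Python sketch models why the term test matches a parent whose name is Test shirt, and why both children come back. It is a toy model, not Elasticsearch code: the analyze function is a very rough stand-in for the default analysis of the name field (lowercasing and splitting on whitespace), and has_parent and the sample data are hypothetical.

```python
def analyze(text):
    """Rough stand-in for the default analyzer: lowercase, split on whitespace."""
    return text.lower().split()


parents = {"1": {"name": "Test shirt"}}
children = [
    {"_id": "1000", "_parent": "1", "color": "red", "size": "XXL"},
    {"_id": "1001", "_parent": "1", "color": "black", "size": "XL"},
]


def has_parent(parents, children, field, term):
    """Return ids of children whose parent has the exact term among its tokens."""
    # A term query is not analyzed, so the term is compared as given.
    matching_parents = {
        pid for pid, doc in parents.items() if term in analyze(doc[field])
    }
    return [child["_id"] for child in children if child["_parent"] in matching_parents]


print(has_parent(parents, children, "name", "test"))  # ['1000', '1001']
print(has_parent(parents, children, "name", "Test"))  # []
```

The second call finds nothing because the indexed tokens are lowercase while the term is compared exactly as it was written in the query.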

Similar to the has_child query, the has_parent query also gives us the possibility of tuning the score calculation of the query. In this case, score_mode has only two options: none, the default one where the score calculated by the query is equal to 1.0, and score, which calculates the score of the document on the basis of the parent document contents. An example that uses score_mode in the has_parent query looks as follows:

curl -XGET 'localhost:9200/shop/_search?pretty' -d '{
  "query" : {
    "has_parent" : {
      "parent_type" : "cloth",
      "score_mode" : "score",
      "query" : {
        "term" : { "name" : "test" }
      }
    }
  }
}'

The only difference from the previous example is score_mode. If you check the results of these queries, you'll notice a single difference: the score of all the documents from the first example is 1.0, while the score for the results returned by the preceding query is equal to 0.8784157. In this case, all the documents found have the same score, because they have a common parent document.

Performance considerations

When using the Elasticsearch parent-child functionality, you have to be aware of the performance impact that it has. The first thing you need to remember is that the parent and the child documents need to be stored in the same shard in order for the queries to work. If you happen to have a high number of children for a single parent, you may end up with shards not having a similar number of documents. Because of that, your query performance can be lower on one of the nodes, resulting in the whole query being slower. Also, remember that parent-child queries will be slower than ones that run against documents that don't have a relationship between them. There is a way of speeding up joins for the parent-child queries at the cost of memory by eagerly loading the so-called global ordinals; however, we will discuss that method in the Elasticsearch caches section of Chapter 9, Elasticsearch Cluster in Detail.
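The shard imbalance described above can be illustrated with a short Python sketch. It assumes that a child document is routed by the identifier of its parent (as the _routing value of 1 in the earlier response suggests) and uses zlib.crc32 purely as a stand-in hash; Elasticsearch uses its own hash function, so the actual shard numbers will differ. The shard_for and shard_counts helpers and the sample documents are hypothetical.

```python
import zlib
from collections import Counter

NUM_SHARDS = 5  # the responses shown earlier report five shards


def shard_for(routing, num_shards=NUM_SHARDS):
    """Pick a shard from the routing value (crc32 is only a stand-in hash)."""
    return zlib.crc32(routing.encode("utf-8")) % num_shards


def shard_counts(docs):
    """Count documents per shard; docs is a list of (doc_id, routing) pairs."""
    return Counter(shard_for(routing) for _doc_id, routing in docs)


# Parent 1 is routed by its own id; its 1,000 children are routed by the parent id.
docs = [("1", "1")] + [(str(1000 + i), "1") for i in range(1000)]
# Nine more parents without children, each routed by its own id.
docs += [(str(p), str(p)) for p in range(2, 11)]

counts = shard_counts(docs)
print("shard of parent 1:", shard_for("1"))
print("documents per shard:", dict(counts))
```

Whatever hash is used, the parent and all of its children land in the same shard, so a parent with many children makes that shard much larger than the others.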

Finally, the first query will preload and cache the document identifiers using doc values, which takes time. In order to improve the performance of the initial queries that use the parent-child relationship, the Warmer API can be used. You can find more information about how to add warming queries to Elasticsearch in the Warming up section of Chapter 10, Administrating Your Cluster.
