Chapter 5. Combining Indexing, Analysis, and Search

In the previous chapter, we learned how to improve our users' search experience by influencing document scores, how to use synonyms, and how to handle multilingual data. We also saw what span queries are and how to find out why a given document was returned. In this chapter, we will look at indexing data that is not flat or that is related to other data. We will also use the update API to modify already created indices, and finally we will learn how to index data in the most efficient way. By the end of this chapter, you will have learned:

  • How to index tree-like structures
  • How to modify indices with the update API
  • How to use nested objects
  • How to use the parent-child relationship
  • How to fetch data from external systems
  • How to use batch processing to speed up indexing

Indexing tree-like structures

Trees! Trees are everywhere. If you develop a shop application, you probably have categories. If you look at a filesystem, its files and directories are arranged in a tree-like structure. This book can also be represented as a tree: chapters contain topics and subtopics. ElasticSearch provides functionality that helps us handle tree-like structures. Let's see how we can navigate such data using a custom analyzer, path_analyzer, built on the path_hierarchy tokenizer.

First, we'll define the index settings and a simple mapping:

{
 "settings" : {
  "index" : {
   "analysis" : {
    "analyzer" : {
     "path_analyzer" : {"tokenizer" : "path_hierarchy"}
    }
   }
  }
 },
 "mappings" : {
  "category" : {
   "properties" : {
    "category" : { 
     "type" : "multi_field",
     "fields" : {
      "name" : { "type" : "string", "index" : "not_analyzed" },
      "path" : { "type" : "string", "analyzer" : "path_analyzer", "store" : true }
     }
    }
   }
  }
 }
} 

To apply these settings and mappings when creating the index, use the following command:

curl -XPOST 'localhost:9200/path/' --data-binary '...'

The JSON shown above should be sent as the request body, in place of the '...' placeholder.

As you can see, we have configured only one field, category, which represents where in the tree structure our document is placed. The idea is simple: we can express a position in a tree as a path, just as files and directories are presented on your hard disk drive. For example, in an automotive shop, we can have /cars/passenger/sport, /cars/passenger/camper, or /cars/delivery_truck/. We will index this value in two ways: as a name without additional processing (category.name) and as a path processed by path_analyzer (category.path).

Now let's see what ElasticSearch does with a category path during analysis. To check this, we will use the analyze API described in the "Why this document was found" topic in Chapter 3, Extending Your Structure and Search:

curl -XGET 'localhost:9200/path/_analyze?field=category.path&pretty' -d '/cars/passenger/sport'

ElasticSearch returned the following results:

{
  "tokens" : [ {
    "token" : "/cars",
    "start_offset" : 0,
    "end_offset" : 5,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "/cars/passenger",
    "start_offset" : 0,
    "end_offset" : 15,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "/cars/passenger/sport",
    "start_offset" : 0,
    "end_offset" : 21,
    "type" : "word",
    "position" : 1
  } ]
}

As we can see, ElasticSearch divided our category path, /cars/passenger/sport, into three tokens, one for each level of the hierarchy. Thanks to that, we can find every document that belongs to a given category or to any of its subcategories with a simple term filter, such as:

{
 "filter" : {
  "term" : { "category.path" : "/cars" }
 }
}
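Why does a term filter on /cars also match documents from deeper categories? Because every prefix of the path is indexed as a separate term. To make this concrete, the following Python sketch emulates the prefix splitting visible in the analyze output above. It is not ElasticSearch code: the function name path_hierarchy_tokens is hypothetical, it assumes the tokenizer's default / delimiter, and it does not model tokenizer options such as replacement, skip, or reverse.

```python
def path_hierarchy_tokens(path, delimiter="/"):
    """Return (token, start_offset, end_offset) tuples, one per path prefix.

    Simplified emulation of the path_hierarchy tokenizer's default behaviour
    (hypothetical helper, not an ElasticSearch API).
    """
    tokens = []
    # Every delimiter after the first character closes a prefix token.
    for end, char in enumerate(path):
        if char == delimiter and end > 0:
            tokens.append((path[:end], 0, end))
    # The full path is always emitted as the last (and longest) token.
    if path:
        tokens.append((path, 0, len(path)))
    return tokens


for token, start, end in path_hierarchy_tokens("/cars/passenger/sport"):
    print(token, start, end)
```

Running it prints the same three tokens and offsets that ElasticSearch returned for /cars/passenger/sport: /cars (0 to 5), /cars/passenger (0 to 15), and /cars/passenger/sport (0 to 21).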

Note that the original value is also indexed, unchanged, in the category.name field. This is handy when we want to find documents from one particular path only, ignoring documents that are deeper in the hierarchy.
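The difference between the two sub-fields can also be sketched offline. In the following Python example, the documents and all helper names are hypothetical, and the dictionary-based "index" only stands in for the terms ElasticSearch would store: one exact term for the not_analyzed category.name, and every path prefix for category.path. A term filter on category.path returns the whole subtree, while the same value on category.name returns only documents placed exactly at that path.

```python
def path_hierarchy_terms(path, delimiter="/"):
    # Prefix expansion like path_hierarchy: "/a/b" -> {"/a", "/a/b"}.
    terms = {path[:i] for i, c in enumerate(path) if c == delimiter and i > 0}
    if path:
        terms.add(path)
    return terms


# Hypothetical sample documents: id -> category path.
DOCS = {
    1: "/cars/passenger/sport",
    2: "/cars/passenger/camper",
    3: "/cars/delivery_truck",
    4: "/cars/passenger",
}


def index_documents(docs):
    """Build the indexed terms for both sub-fields of the multi_field."""
    indexed = {}
    for doc_id, category in docs.items():
        indexed[doc_id] = {
            "category.name": {category},  # not_analyzed: one exact term
            "category.path": path_hierarchy_terms(category),  # every prefix
        }
    return indexed


def term_filter(indexed, field, value):
    """Return sorted ids of documents whose field contains the exact term."""
    return sorted(doc_id for doc_id, fields in indexed.items()
                  if value in fields[field])


indexed = index_documents(DOCS)
print(term_filter(indexed, "category.path", "/cars/passenger"))  # [1, 2, 4]
print(term_filter(indexed, "category.name", "/cars/passenger"))  # [4]
```

This mirrors the design choice behind the multi_field mapping: one sub-field answers "this category and everything below it", the other answers "exactly this category".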

Now that we've seen how tree-like structures can be handled in ElasticSearch, let's move on to the next section: modifying indices with the update API.
