Chapter 5. Extending Your Index Structure

We started the previous chapter by learning how to deal with the revised filtering in Elasticsearch 2.x and what to expect from it now. We also explored highlighting and how it can help us improve the users' search experience. We looked at query validation and at the ways of sorting data in Elasticsearch. Finally, we discussed query rewriting and how it affects our queries. By the end of this chapter, you will have learned the following topics:

  • Indexing tree-like structures
  • Indexing data that is not flat
  • Handling document relationships by using the nested object and parent-child features
  • Modifying the index structure by using the Elasticsearch API

Indexing tree-like structures

Trees are everywhere. If you develop an e-commerce shop application, your products will probably be described with the use of categories, and in most cases categories are hierarchical. There are top-level categories, such as electronics, music, books, and so on. Each top-level category can have numerous child categories, such as fiction and science, and those can go even deeper into science fiction, romance, and so on. If you look at a file system, the files and directories are arranged in a tree-like structure as well. This book can also be represented as a tree: chapters contain topics and topics are divided into subtopics. Much of the data around us is arranged this way, and Elasticsearch is capable of indexing tree-like structures so that we can represent and search such data more easily. Let's check how we can navigate through this type of data using a custom analyzer, path_analyzer, built on the path_hierarchy tokenizer.

Data structure

To begin with, let's create a simple index structure by using the following command:

curl -XPUT 'localhost:9200/path?pretty' -d '{
  "settings" : {
    "index" : {
      "analysis" : {
        "analyzer" : {
          "path_analyzer" : { "tokenizer" : "path_hierarchy" }
        }
      }
    }
  },
  "mappings" : {
    "category" : {
      "properties" : {
        "category" : {
          "type" : "string",
          "fields" : {
            "name" : { "type" : "string", "index" : "not_analyzed" },
            "path" : { "type" : "string", "analyzer" : "path_analyzer", "store" : true }
          }
        }
      }
    }
  }
}'

As you can see, we have created a single type, the category type. We will use it to store and index the information about the location of our document in the tree structure. The idea is simple: we can show the location of the document as a path, in exactly the same manner as files and directories are presented on your hard disk drive. For example, in an automotive shop, we can have /cars/passenger/sport, /cars/passenger/camper, or /cars/delivery_truck/. However, to achieve that, we need to index this path in two different ways. First, we will use a not-analyzed field called name (addressed as category.name) to index the path in its original form. We will also use a field called path (addressed as category.path), which uses the path_analyzer analyzer we've defined to process the path so that it is easier to search.

Analysis

Now, let's see what Elasticsearch will do with the category path during the analysis process. To see this, we will use the following command, which uses the analysis API discussed in the Understanding the explain information section of Chapter 6, Make Your Search Better:

curl -XGET 'localhost:9200/path/_analyze?field=category.path&pretty' -d '/cars/passenger/sport'

The following results will be returned by Elasticsearch:

{
  "tokens" : [ {
    "token" : "/cars",
    "start_offset" : 0,
    "end_offset" : 5,
    "type" : "word",
    "position" : 0
  }, {
    "token" : "/cars/passenger",
    "start_offset" : 0,
    "end_offset" : 15,
    "type" : "word",
    "position" : 0
  }, {
    "token" : "/cars/passenger/sport",
    "start_offset" : 0,
    "end_offset" : 21,
    "type" : "word",
    "position" : 0
  } ]
}
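The token list above follows a simple rule: every token is a prefix of the path that ends at a separator boundary. The following Python sketch mimics that rule for simple paths. It is an illustration only, not the tokenizer's actual implementation; the function name path_hierarchy_tokens is hypothetical, and edge cases such as trailing separators or the tokenizer's additional options are not modeled.

```python
def path_hierarchy_tokens(path, delimiter="/"):
    """Return every ancestor prefix of `path`, shortest first.

    Illustrative re-implementation (an assumption, not Elasticsearch code):
    it reproduces the tokens shown above for simple paths only.
    """
    tokens = []
    prefix = ""
    # A leading delimiter produces an empty first part, which we skip.
    for i, part in enumerate(path.split(delimiter)):
        prefix = part if i == 0 else prefix + delimiter + part
        if prefix:
            tokens.append(prefix)
    return tokens


print(path_hierarchy_tokens("/cars/passenger/sport"))
```

Because each ancestor path is indexed as its own term, a document at /cars/passenger/sport can be found through any of its ancestors with an exact term match.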

As we can see, our category path /cars/passenger/sport was processed by Elasticsearch and divided into three tokens, each one a progressively longer prefix of the path. Thanks to this, we can find every document that belongs to a given category or any of its subcategories by using a simple term filter. For the example to be complete, let's index a simple document by using the following command:

curl -XPUT 'localhost:9200/path/category/1' -d '{ "category" : "/cars/passenger/sport" }'

The following query uses a term filter on the category.path field to find every document in the /cars category or any of its subcategories:

curl -XGET 'localhost:9200/path/_search?pretty' -d '{
  "query" : {
    "bool" : {
      "filter" : {
        "term" : {
          "category.path" : "/cars"
        }
      }
    }
  }
}'

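Note that we also have the original value indexed in the category.name field. Because that field is not analyzed, a term filter on it matches only documents whose path is exactly the given value. This is handy when we want to find documents from a particular path, ignoring the documents that are deeper in the hierarchy.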
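To make the difference between the two fields concrete, the following Python sketch builds both query bodies. It assumes the same bool/filter/term structure used in the search request earlier; the helper name term_filter_query is hypothetical and exists only for this illustration.

```python
import json


def term_filter_query(field, value):
    """Build a bool/filter/term query body (hypothetical helper)."""
    return {"query": {"bool": {"filter": {"term": {field: value}}}}}


# Exact match: only documents whose original path is /cars/passenger/sport.
exact = term_filter_query("category.name", "/cars/passenger/sport")

# Subtree match: documents at /cars and anywhere below it.
subtree = term_filter_query("category.path", "/cars")

print(json.dumps(exact, indent=2))
print(json.dumps(subtree, indent=2))
```

Either printed body can be passed to the _search endpoint with curl -d, in the same way as the earlier requests.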
