So far we have learned how to install, configure, and query our ElasticSearch cluster, and we have prepared some more sophisticated mappings. We have also used aliasing to make querying easier and routing to control where data is placed. In this chapter, we will extend our knowledge of ElasticSearch by looking at how to index data that is not flat, how to handle geographical data, and how to deal with files. We will also learn how to distinguish which text fragments were matched and how to implement the commonly used autocomplete feature. By the end of this chapter you will learn:
Not all data is as flat as the data we have been using since Chapter 2, Searching Your Data. Of course, if we are building a system that ElasticSearch will be a part of, we can create a structure that is convenient for ElasticSearch. However, the data doesn't need to be flat; it can be more object-oriented. Let's see how to create mappings that use fully structured JSON objects.
Let's assume we have the following data (we store it in a file called structured_data.json):
{
  "book" : {
    "author" : {
      "name" : {
        "firstName" : "Fyodor",
        "lastName" : "Dostoevsky"
      }
    },
    "isbn" : "123456789",
    "englishTitle" : "Crime and Punishment",
    "originalTitle" : "Преступлéние и наказáние",
    "year" : 1886,
    "characters" : [
      {
        "name" : "Raskolnikov"
      },
      {
        "name" : "Sofia"
      }
    ],
    "copies" : 0
  }
}
As you can see, the data is not flat; it contains arrays and nested objects, so we can't use the mappings we used previously. However, we can create mappings that are able to handle such data.
The previous example data shows a structured JSON file. As you can see, the root object in our file is book. The root object is a special one that allows us to define additional properties. The book root object has some simple properties, such as englishTitle, originalTitle, and so on. Those will be indexed as normal fields in the index. In addition, it has the characters array, which we will discuss in the next paragraph. For now, let's focus on author. As you can see, author is an object that has another object nested within it: the name object, which has two properties, firstName and lastName.
We have already used array type data, but we didn't talk about it. By default, all fields in Lucene, and thus in ElasticSearch, are multivalued, which means that they can store multiple values. In order to send such fields to ElasticSearch for indexing, we use the JSON array type, which is enclosed in square brackets ([]). As you can see in the previous example, we used the array type for our book's characters property.
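To make the notion of a multivalued field concrete, the following Python sketch (illustrative only; this is not ElasticSearch code) shows how the characters array from our example data ends up as two values of a single characters.name field:

```python
import json

# The characters part of our example document.
doc = json.loads("""
{
  "characters" : [
    { "name" : "Raskolnikov" },
    { "name" : "Sofia" }
  ]
}
""")

# Because fields are multivalued, indexing this array produces a single
# field, characters.name, that holds both values.
names = [character["name"] for character in doc["characters"]]
print(names)  # ['Raskolnikov', 'Sofia']
```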
So, what do we have to do to index data such as that shown previously? To index arrays we don't need to do anything special; we just specify the properties for such fields inside the array name. So, in our case, in order to index the characters data present in our example, we need to add mappings such as the following:
"characters" : {
  "properties" : {
    "name" : {"type" : "string", "store" : "yes"}
  }
}
Nothing strange here; we just nest the properties section inside the array's name (which is characters in our case) and define the fields there. As a result of this mapping, we get the multivalued characters.name field in the index.
We perform similar steps for our author object. We call the section by the same name as appears in the data, but in addition to the properties section, we also tell ElasticSearch that it should expect an object by adding the type property with the value object. The author object in turn has the name object nested inside it, so we do the same thing and just nest another object within it. So, our mappings for that part would look like the following code:
"author" : {
  "type" : "object",
  "properties" : {
    "name" : {
      "type" : "object",
      "properties" : {
        "firstName" : {"type" : "string", "store" : "yes"},
        "lastName" : {"type" : "string", "store" : "yes"}
      }
    }
  }
}
The firstName and lastName fields will appear in the index as author.name.firstName and author.name.lastName. We will check whether that is true in just a second.
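To see why those dotted names appear, here is a small Python sketch of the general idea (an illustration of the flattening concept, not ElasticSearch's actual implementation) that turns a nested object into dot-separated field paths:

```python
# Flatten nested dictionaries into dot-separated field paths, roughly the
# way nested JSON objects become field names in the Lucene index.
def flatten(obj, prefix=""):
    fields = {}
    for key, value in obj.items():
        path = prefix + "." + key if prefix else key
        if isinstance(value, dict):
            fields.update(flatten(value, path))
        else:
            fields[path] = value
    return fields

author = {"author": {"name": {"firstName": "Fyodor", "lastName": "Dostoevsky"}}}
print(flatten(author))
# {'author.name.firstName': 'Fyodor', 'author.name.lastName': 'Dostoevsky'}
```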
The rest of the fields are simple core types, so I'll skip them here; they were already covered in the Schema mapping section of Chapter 1, Getting Started with ElasticSearch Cluster.
So, our final mappings file, which we've called structured_mapping.json, looks like the following:
{
  "book" : {
    "properties" : {
      "author" : {
        "type" : "object",
        "properties" : {
          "name" : {
            "type" : "object",
            "properties" : {
              "firstName" : {"type" : "string", "store" : "yes"},
              "lastName" : {"type" : "string", "store" : "yes"}
            }
          }
        }
      },
      "isbn" : {"type" : "string", "store" : "yes"},
      "englishTitle" : {"type" : "string", "store" : "yes"},
      "originalTitle" : {"type" : "string", "store" : "yes"},
      "year" : {"type" : "integer", "store" : "yes"},
      "characters" : {
        "properties" : {
          "name" : {"type" : "string", "store" : "yes"}
        }
      },
      "copies" : {"type" : "integer", "store" : "yes"}
    }
  }
}
As we already know, ElasticSearch is schemaless, which means that it can index data without the mappings being created first (although we should create them if we want to control the index structure). This dynamic behavior is turned on by default, but there may be situations where you want to turn it off for some parts of your index. In order to do that, add the dynamic property, set to false, at the same nesting level as the type property of the object that shouldn't be dynamic. For example, if we would like our author and name objects not to be dynamic, we would modify the relevant parts of the mappings file so that they look like the following code:
"author" : {
  "type" : "object",
  "dynamic" : false,
  "properties" : {
    "name" : {
      "type" : "object",
      "dynamic" : false,
      "properties" : {
        "firstName" : {"type" : "string", "store" : "yes"},
        "lastName" : {"type" : "string", "store" : "yes"}
      }
    }
  }
}
However, please remember that in order to add new fields to such objects, we have to update the mappings first.
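For example, if we later decided to index a hypothetical middleName field (an invented field for illustration, not part of our example data) inside the non-dynamic name object, we would first have to send updated mappings along these lines:

```json
"name" : {
  "type" : "object",
  "dynamic" : false,
  "properties" : {
    "firstName" : {"type" : "string", "store" : "yes"},
    "lastName" : {"type" : "string", "store" : "yes"},
    "middleName" : {"type" : "string", "store" : "yes"}
  }
}
```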
The last thing I would like to do is test whether all the work we've done actually works. This time we will use a slightly different technique for creating an index and adding the mappings. First, let's create the library index with the following command:
curl -XPUT 'localhost:9200/library'
Now, let's send our mappings for the book type:
curl -XPUT 'localhost:9200/library/book/_mapping' -d @structured_mapping.json
Now we can index our example data:
curl -XPOST 'localhost:9200/library/book/1' -d @structured_data.json
If we want to see how our data was indexed, we can run a query such as the following:
curl -XGET 'localhost:9200/library/book/_search?q=*:*&fields=*&pretty=true'
It will return the following data:
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "library",
      "_type" : "book",
      "_id" : "1",
      "_score" : 1.0,
      "fields" : {
        "copies" : 0,
        "characters.name" : [ "Raskolnikov", "Sofia" ],
        "englishTitle" : "Crime and Punishment",
        "author.name.lastName" : "Dostoevsky",
        "isbn" : "123456789",
        "originalTitle" : "Преступлéние и наказáние",
        "year" : 1886,
        "author.name.firstName" : "Fyodor"
      }
    } ]
  }
}
As you can see, all the fields from arrays and object types were indexed properly. Note, for example, that the author.name.firstName field is present, because ElasticSearch flattened the nested data.
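If you want to process such a response programmatically, the flattened field names can be read straight from the fields section of each hit. The following Python sketch parses a trimmed copy of the response shown above:

```python
import json

# A trimmed copy of the search response shown above.
response = json.loads("""
{
  "hits" : {
    "total" : 1,
    "hits" : [ {
      "_id" : "1",
      "fields" : {
        "characters.name" : [ "Raskolnikov", "Sofia" ],
        "author.name.firstName" : "Fyodor",
        "author.name.lastName" : "Dostoevsky"
      }
    } ]
  }
}
""")

# The flattened field names are ordinary keys in the fields object of a hit.
fields = response["hits"]["hits"][0]["fields"]
print(fields["author.name.firstName"])  # Fyodor
print(fields["characters.name"])        # ['Raskolnikov', 'Sofia']
```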