Geo

Search servers such as ElasticSearch are usually viewed from the perspective of full text search, but that is only part of the picture. Sometimes text search is not enough. Imagine searching for local services. For the end user the most important thing is the accuracy of the results, and by accuracy we mean not only good full text matches but also results that are as close as possible in terms of location. In some cases this is the same as a text search on geographical names such as cities or streets, but in other cases it is very useful to search on the basis of the geographical coordinates of our indexed documents. As you can guess, ElasticSearch supports this as well.

Mapping preparation for spatial search

In order to discuss the spatial search functionality, let's prepare an index with a list of cities. This will be a very simple index with one type named poi (which stands for point of interest) that holds the name of the city and its coordinates. The mappings are as follows:

{
 "mappings" : {
  "poi" : {
   "properties" : {
    "name" : { "type" : "string" },
    "location" : { "type" : "geo_point" }
   }
  }
 }
}

Assuming that we put this definition into the mapping.json file, we can create an index by running the following command:

curl -XPUT localhost:9200/map -d @mapping.json

The only new thing is the geo_point type, which is used for the location field. By using it we can store the geographical position of our city.

Example data

Our example file with documents looks like the following:

{ "index" : { "_index" : "map", "_type" : "poi", "_id" : 1 }}
{ "name" : "New York", "location" : "40.664167, -73.938611" }
{ "index" : { "_index" : "map", "_type" : "poi", "_id" : 2 }}
{ "name" : "London", "location" : [-0.1275, 51.507222] }
{ "index" : { "_index" : "map", "_type" : "poi", "_id" : 3 }}
{ "name" : "Moscow", "location" : { "lat" : 55.75, "lon" : 37.616667 }}
{ "index" : { "_index" : "map", "_type" : "poi", "_id" : 4 }}
{ "name" : "Sydney", "location" : "-33.859972, 151.211111" }
{ "index" : { "_index" : "map", "_type" : "poi", "_id" : 5 }}
{ "name" : "Lisbon", "location" : "eycs0p8ukc7v" }

In order to perform a bulk request, we've added information about index name, type, and the unique identifier of our documents. So we can now easily import this data using the following command:

curl -XPOST http://localhost:9200/_bulk --data-binary @documents.json
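
The bulk file alternates an action line with a document line, so it is easy to generate from a list of records. The following is a minimal Python sketch of that idea; the helper name build_bulk_body and the cities list are hypothetical and exist only for illustration.

```python
import json


def build_bulk_body(index, doc_type, documents):
    """Return a bulk body: one action line plus one source line per document."""
    lines = []
    for doc_id, source in documents:
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    # Every line of a bulk body, including the last one, ends with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical sample: the first two cities from the example file.
cities = [
    (1, {"name": "New York", "location": "40.664167, -73.938611"}),
    (2, {"name": "London", "location": [-0.1275, 51.507222]}),
]
body = build_bulk_body("map", "poi", cities)
print(body, end="")
```

Writing the returned string to documents.json gives a file in the same shape as the one shown above.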

Look at the location field in this data. We use various notations for coordinates. We can provide the latitude and longitude as a string, as a pair of numbers, or as an object. Note that the string and array notations use a different order: the string gives the latitude first and then the longitude, while the array gives the longitude first and then the latitude. The last record shows that it is also possible to give coordinates as a geohash value.
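
To make the difference between the notations concrete, here is a rough Python sketch that converts each of them to a (latitude, longitude) pair. It is not how ElasticSearch parses the field internally; the function names to_lat_lon and decode_geohash are assumptions made for this illustration, and the geohash decoder simply returns the centre of the encoded cell.

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet


def decode_geohash(geohash):
    """Return the (lat, lon) centre of the cell named by a geohash string."""
    lat_range, lon_range = [-90.0, 90.0], [-180.0, 180.0]
    is_lon = True  # bits alternate, starting with longitude
    for char in geohash:
        value = _BASE32.index(char)
        for shift in range(4, -1, -1):
            rng = lon_range if is_lon else lat_range
            mid = (rng[0] + rng[1]) / 2
            if (value >> shift) & 1:
                rng[0] = mid  # a 1 bit keeps the upper half
            else:
                rng[1] = mid  # a 0 bit keeps the lower half
            is_lon = not is_lon
    return (sum(lat_range) / 2, sum(lon_range) / 2)


def to_lat_lon(location):
    """Normalize the notations used in the example data to (lat, lon)."""
    if isinstance(location, dict):           # {"lat": ..., "lon": ...}
        return (float(location["lat"]), float(location["lon"]))
    if isinstance(location, (list, tuple)):  # [lon, lat]
        lon, lat = location
        return (float(lat), float(lon))
    if "," in location:                      # "lat, lon"
        lat, lon = (float(part) for part in location.split(","))
        return (lat, lon)
    return decode_geohash(location)          # anything else: treat as geohash


print(to_lat_lon("40.664167, -73.938611"))
print(to_lat_lon([-0.1275, 51.507222]))
print(to_lat_lon({"lat": 55.75, "lon": 37.616667}))
print(to_lat_lon("eycs0p8ukc7v"))
```

Note how the array for London has to be swapped, while the string for New York is already in latitude, longitude order.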

Sample queries

Now let's look at several examples of how to use coordinates to meet common requirements of modern applications that combine searching of geographical data with full text searching.

Let's start with a very common requirement: sorting results by distance from a given point. In our example, we want to get all the cities and sort them by their distance from the capital of France, that is, Paris. In order to do that, we send the following query to ElasticSearch:

{
 "query" : {
  "match_all" : {}
 },
 "sort" : [{
  "_geo_distance" : {
   "location" : "48.8567, 2.3508",
   "unit" : "km"
  }
 }]
}

If you remember the Sorting data section from Chapter 2, Searching Your Data, you'll notice that the format is slightly different. We use the _geo_distance key to indicate sorting by distance. We must give the base location (the location attribute, which in our case holds the coordinates of Paris) and we specify the unit in which distances are reported in the results. The available values are km and mi, which stand for kilometers and miles. The result of such a query will be as follows:

{
  "took" : 102,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 5,
    "max_score" : null,
    "hits" : [ {
      "_index" : "map",
      "_type" : "poi",
      "_id" : "2",
      "_score" : null, "_source" : { "name" : "London", "location" : [-0.1275, 51.507222] },
      "sort" : [ 343.46748684411773 ]
    }, {
      "_index" : "map",
      "_type" : "poi",
      "_id" : "5",
      "_score" : null, "_source" : { "name" : "Lisbon", "location" : "eycs0p8ukc7v" },
      "sort" : [ 1453.6450747751787 ]
    }, {
      "_index" : "map",
      "_type" : "poi",
      "_id" : "3",
      "_score" : null, "_source" : { "name" : "Moscow", "location" : { "lat" : 55.75, "lon" : 37.616667 }},
      "sort" : [ 2486.2560754763977 ]
    }, {
      "_index" : "map",
      "_type" : "poi",
      "_id" : "1",
      "_score" : null, "_source" : { "name" : "New York", "location" : "40.664167, -73.938611" },
      "sort" : [ 5835.763890418129 ]
    }, {
      "_index" : "map",
      "_type" : "poi",
      "_id" : "4",
      "_score" : null, "_source" : { "name" : "Sydney", "location" : "-33.859972, 151.211111" },
      "sort" : [ 16960.04911335322 ]
    } ]
  }
}

As in the other sorting examples, ElasticSearch shows the values used for sorting. Let's look at the first record. As we can see, the distance between Paris and London is about 343 km; you can check that the map agrees with ElasticSearch in this case.
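
If you don't have a map at hand, the sort values can be cross-checked with the haversine formula for great-circle distance. The following Python sketch is only an approximation and not ElasticSearch's own implementation; the Earth radius of 6371 km, the helper name haversine_km, and the rounded Lisbon coordinates (standing in for the geohash) are assumptions made for this illustration.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Approximate great-circle distance in kilometers between two points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


PARIS = (48.8567, 2.3508)
CITIES = {  # (lat, lon) pairs taken from the example data
    "New York": (40.664167, -73.938611),
    "London": (51.507222, -0.1275),
    "Moscow": (55.75, 37.616667),
    "Sydney": (-33.859972, 151.211111),
    "Lisbon": (38.71, -9.14),  # approximate; the data stores a geohash
}

# Sort city names by their distance from Paris, nearest first.
by_distance = sorted(CITIES, key=lambda name: haversine_km(*PARIS, *CITIES[name]))
for name in by_distance:
    print(name, round(haversine_km(*PARIS, *CITIES[name])))
```

The order should match the response above, with London first at roughly 343 km; small differences in the other values are expected because the sketch uses a simplified Earth model.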

Bounding box filtering

The next example that we want to show is narrowing the results to a selected area that is bounded by a given rectangle. This is very handy if we want to show results on the map or when we allow the user to mark a map area for searching. You have already read about filters in the Filtering your results section in Chapter 2, Searching Your Data, so you can probably guess that we need to use this functionality. This is how we can do it:

{
 "filter" : {
  "geo_bounding_box" : {
   "location" : {
    "top_left" : "52.4796, -1.903",
    "bottom_right" : "48.8567, 2.3508"
   }
  }
 }
}

In this example, we selected the map fragment between Birmingham and Paris by providing the top-left and bottom-right corners' coordinates. Those two corners are enough to specify any rectangle we want and ElasticSearch will do the rest of the calculation for us. The following screenshot shows the specified rectangle on the map:

[Screenshot: Bounding box filtering]

As we can see, the only city from our data that meets the criteria is London. So let's check if ElasticSearch knows about that by running the previous query and checking the results:

{
  "took" : 9,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "map",
      "_type" : "poi",
      "_id" : "2",
      "_score" : 1.0, "_source" : { "name" : "London", "location" : [-0.1275, 51.507222] }
    } ]
  }
}

As you can see, once again ElasticSearch agrees with the map.
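
The rectangle test itself is simple enough to sketch in a few lines of Python. This is only an illustration of the idea, under the assumption that the box does not cross the 180° meridian; the function name in_bounding_box is hypothetical and the Lisbon coordinates are a rounded stand-in for the geohash.

```python
def in_bounding_box(point, top_left, bottom_right):
    """Return True if a (lat, lon) point lies inside the given rectangle.

    Assumes the rectangle does not cross the 180-degree meridian.
    """
    lat, lon = point
    top, left = top_left
    bottom, right = bottom_right
    return bottom <= lat <= top and left <= lon <= right


TOP_LEFT = (52.4796, -1.903)      # Birmingham
BOTTOM_RIGHT = (48.8567, 2.3508)  # Paris

CITIES = {  # (lat, lon) pairs taken from the example data
    "New York": (40.664167, -73.938611),
    "London": (51.507222, -0.1275),
    "Moscow": (55.75, 37.616667),
    "Sydney": (-33.859972, 151.211111),
    "Lisbon": (38.71, -9.14),  # approximate; the data stores a geohash
}

inside = [name for name, point in CITIES.items()
          if in_bounding_box(point, TOP_LEFT, BOTTOM_RIGHT)]
print(inside)
```

As with the query, London is the only city whose latitude and longitude both fall between the two corners.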

Limiting the distance

The next example shows another common requirement, that is, how to limit the results to places that are located within a selected distance from the base point. Let's see all the cities closer than 500 km to Paris:

{
  "filter" : {
   "geo_distance" : {
    "location" : "48.8567, 2.3508",
    "distance" : "500km"
   }
  }
}

If everything goes well, ElasticSearch should return only a single record for this query, and that record should be London; however, we will leave it to you as a reader to check that.
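
Before running the query, you can estimate which cities qualify with a short Python sketch. It reuses the haversine approximation with an assumed Earth radius of 6371 km, so it only predicts what the filter should return and is no substitute for running the query; the helper names and the rounded Lisbon coordinates are assumptions made for this illustration.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Approximate great-circle distance in kilometers between two points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def within_distance(cities, center, limit_km):
    """Return the names of cities no farther than limit_km from center."""
    return [name for name, (lat, lon) in cities.items()
            if haversine_km(center[0], center[1], lat, lon) <= limit_km]


PARIS = (48.8567, 2.3508)
CITIES = {  # (lat, lon) pairs taken from the example data
    "New York": (40.664167, -73.938611),
    "London": (51.507222, -0.1275),
    "Moscow": (55.75, 37.616667),
    "Sydney": (-33.859972, 151.211111),
    "Lisbon": (38.71, -9.14),  # approximate; the data stores a geohash
}

nearby = within_distance(CITIES, PARIS, 500)
print(nearby)
```

Raising the limit to 1500 km in this sketch would also let Lisbon through, which is a quick way to see how the distance parameter changes the result set.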
