Working with geo-point data

Geo-points are single location points defined by a latitude-longitude pair on the surface of the earth. Using geo-points you can do the following things:

  • Calculate the distance between two points
  • Find documents that fall within a specified rectangular area
  • Sort documents based on distance and score results based on it
  • Create clusters of geo-points using aggregations

Mapping geo-point fields

Unlike most other data types in Elasticsearch, geo-point fields can't be detected dynamically, so you have to define the mapping before indexing any data. The mapping for a geo-point field can be defined in the following format:

"location": {
    "type": "geo_point"
}

A geo_point mapping indexes a single field (location in our example) in the lat-lon format. You can optionally index the .lat and .lon subfields separately by setting the lat_lon parameter to true.
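As a minimal sketch, the mapping above could be built in Python as a plain dictionary before the index is created. The helper name build_geo_point_mapping and the restaurant type name are hypothetical, and the lat_lon spelling of the optional parameter is an assumption tied to the Elasticsearch version this chapter targets:

```python
def build_geo_point_mapping(doc_type, field="location", lat_lon=False):
    """Return an index-creation body that maps `field` as a geo_point."""
    field_mapping = {"type": "geo_point"}
    if lat_lon:
        # Assumption: the option that also indexes .lat and .lon is `lat_lon`.
        field_mapping["lat_lon"] = True
    return {"mappings": {doc_type: {"properties": {field: field_mapping}}}}


# "restaurant" is a made-up document type used only for illustration.
mapping_body = build_geo_point_mapping("restaurant")
print(mapping_body)
```

The resulting dictionary would typically be passed as the body of an index-creation call, for example es.indices.create(index=index_name, body=mapping_body), before any document is indexed.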

Indexing geo-point data

Elasticsearch supports the following three formats to index geo_point data with the same mapping that we defined in the previous section:

lat-lon as a string :  "location" : "28.61, 77.23"
lat-lon as an object : "location": {
                    "lat": 28.61,
                    "lon": 77.23
                  }
lat-lon as an array : "location" : [77.23, 28.61]

Note that the array format reverses the order: it takes the longitude first and then the latitude.
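Because the ordering is easy to get wrong, it can help to normalize all three formats to one (lat, lon) pair in your own code. The following is only a sketch; the helper name to_lat_lon is hypothetical and is not part of any Elasticsearch client:

```python
def to_lat_lon(value):
    """Normalize a geo-point in string, object, or array form to (lat, lon)."""
    if isinstance(value, str):
        # String form is "lat, lon"
        lat, lon = (float(part) for part in value.split(","))
    elif isinstance(value, dict):
        # Object form is {"lat": ..., "lon": ...}
        lat, lon = float(value["lat"]), float(value["lon"])
    else:
        # Array form is [lon, lat]: longitude comes first
        lon, lat = (float(part) for part in value)
    return lat, lon


print(to_lat_lon("28.61, 77.23"))
print(to_lat_lon({"lat": 28.61, "lon": 77.23}))
print(to_lat_lon([77.23, 28.61]))
```

All three calls describe the same point, so each one yields the same pair.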

Python example

In this section, we will see how to index geo_point data in all three formats using Python:

  • Using string format:
    doc = {"location": "28.61, 77.23"}
    es.index(index=index_name, doc_type=doc_type, body=doc)
  • Using object format:
    location = dict()
    location['lat'] = 28.61
    location['lon'] = 77.23
    doc['location'] = location
    es.index(index=index_name, doc_type=doc_type, body=doc)
  • Using array format:
    location = list()
    location.append(77.23)
    location.append(28.61)
    doc['location'] = location
    es.index(index=index_name, doc_type=doc_type, body=doc)

Java example

  • Using string format:
    Map<String, Object> document1= new HashMap<String, Object>();
        document1.put("location", "29.9560, 78.1700");
        document1.put("name", "delhi");
        document1.put("dish_name", "chinese");
    client.prepareIndex().setIndex(indexName).setType(docType)
            .setSource(document1).execute().actionGet();
  • Using object format:
    Map<String, Object> document3 = new HashMap<String, Object>();
    Map<String, Object> locationMap = new HashMap<String, Object>();
        locationMap.put("lat", 29.9560);
        locationMap.put("lon", 78.1700);
        document3.put("location", locationMap);
        document3.put("name", "delhi");
        document3.put("dish_name", "chinese");
    client.prepareIndex().setIndex(indexName).setType(docType)
            .setSource(document3).execute().actionGet();
  • Using array format:
    Map<String, Object> document2= new HashMap<String, Object>();
    List<Double> geoPoints = new ArrayList<Double>();
        geoPoints.add(77.42);
        geoPoints.add(28.67);
      document2.put("location", geoPoints);
      document2.put("name", "delhi");
      document2.put("dish_name", "chinese");
    client.prepareIndex().setIndex(indexName).setType(docType)
            .setSource(document2).execute().actionGet();

Querying geo-point data

The following are the query types available to query data with the geo_point field type:

  • Geo distance query
  • Geo distance range query
  • Geo bounding box query

Geo distance query

The geo distance query is used to filter documents that lie within a specified distance of a given point. Let's see an example of how we can find the best places to visit within 200 km of Delhi.

Python example

query = {
  "query": {
    "bool": {
      "must": {
        "match_all": {}
      },
      "filter": {
        "geo_distance": {
          "distance": "200km",
          "location": {
            "lat": 28.67,
            "lon": 77.42
          }
        }
      }
    }
  }
}
response = es.search(index=index_name, doc_type=doc_type, body=query)

In the preceding query, we specified the location's lat-lon in the object form; however, you can always use the string or array format in a query, regardless of the format in which your data was indexed.
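To get a feel for what "within 200 km" means, the following sketch computes a great-circle distance with the haversine formula and applies the same kind of cut-off on the client side. It is an illustration only: the function names are hypothetical, the Earth radius is an approximation, and Elasticsearch may use a different distance calculation internally:

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat-lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_distance(point, center, max_km):
    """True if `point` lies within `max_km` of `center`; both are (lat, lon)."""
    return haversine_km(point[0], point[1], center[0], center[1]) <= max_km


# The query's center point and the point indexed earlier in this section
print(within_distance((28.61, 77.23), (28.67, 77.42), 200))
```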

The distance can be specified in various distance units, such as the following:

  • mi or miles for mile
  • yd or yards for yard
  • ft or feet for feet
  • in or inch for inch
  • km or kilometers for kilometer
  • m or meters for meter
  • cm or centimeters for centimeter
  • mm or millimeters for millimeter
  • NM, nmi or nauticalmiles for nautical mile
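If you need to compare such distance strings in your own code, a small conversion table is enough. The following sketch converts a string such as "200km" to meters using the unit names listed above. The helper name distance_to_meters is hypothetical, and treating a bare number as meters is an assumption:

```python
# Conversion factors to meters for the unit names listed above
UNIT_TO_METERS = {
    "mi": 1609.344, "miles": 1609.344,
    "yd": 0.9144, "yards": 0.9144,
    "ft": 0.3048, "feet": 0.3048,
    "in": 0.0254, "inch": 0.0254,
    "km": 1000.0, "kilometers": 1000.0,
    "m": 1.0, "meters": 1.0,
    "cm": 0.01, "centimeters": 0.01,
    "mm": 0.001, "millimeters": 0.001,
    "NM": 1852.0, "nmi": 1852.0, "nauticalmiles": 1852.0,
}


def distance_to_meters(text):
    """Convert a distance string such as '200km' to meters."""
    text = text.strip()
    # Try the longest unit names first so that "nmi" is not read as "mi"
    for unit in sorted(UNIT_TO_METERS, key=len, reverse=True):
        if text.endswith(unit):
            return float(text[:-len(unit)].strip()) * UNIT_TO_METERS[unit]
    # Assumption: a bare number is interpreted as meters
    return float(text)


print(distance_to_meters("200km"))
```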

Java example

Apart from importing QueryBuilders, you need to have the following import in your code:

import org.elasticsearch.common.unit.DistanceUnit;

DistanceUnit is an Enum type that provides all the distance units that can be used.

Build the search query as follows:

QueryBuilder query = QueryBuilders.matchAllQuery();

Now, the geo distance query can be built like this:

QueryBuilder geoDistanceQuery =
        QueryBuilders.geoDistanceQuery("location")
        .lat(28.67).lon(77.42)
        .distance(200, DistanceUnit.KILOMETERS);

Combine both queries to make the final query. Note that our geo distance query goes into the filter block of a boolQuery, while the match-all query goes into the must block:

QueryBuilder finalQuery = QueryBuilders.boolQuery()
        .must(query).filter(geoDistanceQuery);

Here is the final execution:

SearchResponse response =
        client.prepareSearch(indexName).setTypes(docType)
        .setQuery(finalQuery)
        .execute().actionGet();

Geo distance range query

In Chapter 3, Putting Elasticsearch into Action, we saw range and date range queries. The geo distance range query follows the same concept. It is used to select documents that fall within a specified distance range of a given point. For example, with the following query, you can find the documents that lie between 200 and 400 km from Delhi:

{
  "query": {
    "bool": {
      "must": {
        "match_all": {}
      },
      "filter": {
        "geo_distance_range": {
          "from": "200km",
          "to": "400km",
          "location": [77.42,28.67]
        }
      }
    }
  }
}

All the distance units that we have seen for the geo_distance query can be applied to this query too. This query also supports the common parameters for a range (lt, lte, gt, gte, from, to, include_upper, and include_lower).
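The effect of the inclusive and exclusive bounds can be sketched in plain Python. This is only an illustration of the range semantics, not how Elasticsearch evaluates the query; the function name in_distance_range is hypothetical, and treating both bounds as inclusive by default is an assumption:

```python
def in_distance_range(distance_km, lower_km, upper_km,
                      include_lower=True, include_upper=True):
    """Check whether a distance falls inside a range with optional open ends."""
    if include_lower:
        above = distance_km >= lower_km
    else:
        above = distance_km > lower_km
    if include_upper:
        below = distance_km <= upper_km
    else:
        below = distance_km < upper_km
    return above and below


# A point exactly 200 km away matches only when the lower bound is inclusive
print(in_distance_range(200, 200, 400))
print(in_distance_range(200, 200, 400, include_lower=False))
```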

Java example

The following example is an implementation of the same JSON query that we have just seen:

QueryBuilder query = QueryBuilders.matchAllQuery();
QueryBuilder geoDistanceRangeQuery =
        QueryBuilders.geoDistanceRangeQuery("location")
        .lat(28.67).lon(77.42)
        .from("200km").to("400km");
QueryBuilder finalQuery = QueryBuilders.boolQuery()
        .must(query).filter(geoDistanceRangeQuery);

SearchResponse response =
        client.prepareSearch(indexName).setTypes(docType)
        .setQuery(finalQuery).execute().actionGet();

Geo bounding box query

This query works on the corner points of a rectangle, also called a bounding box. You provide the top, bottom, left, and right coordinates of the rectangle, and the query compares the latitude with the top and bottom coordinates and the longitude with the left and right coordinates:

{
  "query": {
    "bool": {
      "must": {
        "match_all": {}
      },
      "filter": {
        "geo_bounding_box": {
          "location": {
            "top_left": {
              "lat": 28.7965,
              "lon": 76.9771
            },
            "bottom_right": {
              "lat": 28.4301,
              "lon": 77.5717
            }
          }
        }
      }
    }
  }
}

Note the special parameters, top_left and bottom_right, which are the corner points of the rectangle.

These keys can also be used in an array format (longitude first, then latitude):

"top_left" : [76.9771, 28.7965],
"bottom_right" : [77.5717, 28.4301]

They can be used in a string format as well (latitude first, then longitude):

"top_left" : "28.7965, 76.9771",
"bottom_right" : "28.4301, 77.5717"

Understanding bounding boxes

Bounding boxes can be a little hard to understand and create at first, so this section walks you through creating one that you can use in queries.

Visit http://www.openstreetmap.org/ and click the Export button in the top-left corner.

Now you can either search for a place or manually select an area (Delhi and related areas in our example) using the corners, as shown in the following image:

[Image: a bounding box drawn around Delhi on the OpenStreetMap Export page]

In the preceding image, you can see four points that depict the corners of the rectangle that we have drawn. The top_left point in the preceding image is at latitude 28.7965 and longitude 76.9771, whereas the bottom_right point is at latitude 28.4301 and longitude 77.5717.
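The comparison that a bounding box implies can be sketched in a few lines of Python: a point is inside when its latitude lies between the bottom and top edges and its longitude lies between the left and right edges. The function name in_bounding_box is hypothetical, and this simplified check does not handle boxes that cross the 180° meridian:

```python
def in_bounding_box(lat, lon, top_left, bottom_right):
    """Check a point against a box given as (lat, lon) corner pairs."""
    top, left = top_left
    bottom, right = bottom_right
    # Latitude is bounded by top/bottom, longitude by left/right
    return bottom <= lat <= top and left <= lon <= right


# Corner points of the Delhi box, written as (lat, lon)
delhi_top_left = (28.7965, 76.9771)
delhi_bottom_right = (28.4301, 77.5717)

print(in_bounding_box(28.61, 77.23, delhi_top_left, delhi_bottom_right))
```

The point 28.61, 77.23 used in the indexing examples falls inside this box, whereas a point such as 29.9560, 78.1700 lies outside it.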

Java example

You need to import the following additional class in your code first:

import org.elasticsearch.common.geo.GeoPoint;

Note that GeoPoint is a class in Elasticsearch that is used to create geo-points. If you choose not to use it, you always have the lat() and lon() methods available to set the latitude and longitude in queries, as we have seen in the previous examples. For illustration, this example uses the GeoPoint class, whose constructor takes the latitude first and then the longitude:

// GeoPoint takes (lat, lon); the top-left corner has the larger latitude
GeoPoint topLeft = new GeoPoint(35.60, 68.91);
GeoPoint bottomRight = new GeoPoint(7.80, 97.29);

QueryBuilder query = QueryBuilders.matchAllQuery();
QueryBuilder geoBoundingBoxQuery =
        QueryBuilders.geoBoundingBoxQuery("location")
        .topLeft(topLeft).bottomRight(bottomRight);
QueryBuilder finalQuery = QueryBuilders.boolQuery()
        .must(query).filter(geoBoundingBoxQuery);
SearchResponse response =
        client.prepareSearch(indexName).setTypes(docType)
        .setQuery(finalQuery)
        .execute().actionGet();

Sorting by distance

In the previous chapters, we saw how default sorting works on _score calculated by Elasticsearch, and we also saw how we can use the values of a field to influence the sorting of documents. Elasticsearch allows the sorting of documents by distance using the _geo_distance parameter.

For example, suppose you want to find all the restaurants in your index that serve Chinese cuisine, sorted by their distance from your current location.

Python example

query = {
    "query": {
        "term": {
            "dish_name": {
                "value": "chinese"
            }
        }
    },
    "sort": [
        {
            "_geo_distance": {
                # Array format: longitude first, then latitude
                "location": [
                    77,
                    28.67
                ],
                "order": "asc",
                "unit": "km"
            }
        }
    ]
}
response = es.search(index=index_name, doc_type=doc_type, body=query)
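When a search is sorted, each hit in the response carries a sort array with the values used for sorting; for a _geo_distance sort, that value is the computed distance in the requested unit. The following sketch reads it back. The helper name hit_distances is hypothetical, and the sample response is hand-written for illustration rather than real output:

```python
def hit_distances(response):
    """Pair each hit's _id with its first sort value (the computed distance)."""
    return [(hit["_id"], hit["sort"][0]) for hit in response["hits"]["hits"]]


# Hand-written response fragment for illustration only, not real output
sample_response = {
    "hits": {
        "hits": [
            {"_id": "1", "_source": {"dish_name": "chinese"}, "sort": [12.5]},
            {"_id": "2", "_source": {"dish_name": "chinese"}, "sort": [48.0]},
        ]
    }
}

for doc_id, distance_km in hit_distances(sample_response):
    print(doc_id, distance_km)
```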

Java example

The same preceding query can be written in Java in the following way; however, first you need to import some extra classes:

import org.elasticsearch.search.sort.SortBuilder;
import org.elasticsearch.search.sort.SortBuilders;
import org.elasticsearch.search.sort.SortOrder;
import org.elasticsearch.common.unit.DistanceUnit;

We have already covered the explanation of DistanceUnit. SortOrder is also an Enum that provides different values such as ASC and DESC that can be used for sorting purposes.

Our other import, SortBuilder, is not only used for geo sorting, but can also be used to sort on other types of fields:

QueryBuilder query = QueryBuilders.termQuery("dish_name", "chinese");
SortBuilder sortingQuery =   SortBuilders.geoDistanceSort("location")
        .point(28.67, 77).unit(DistanceUnit.KILOMETERS)
        .order(SortOrder.ASC);
SearchResponse response =
        client.prepareSearch(indexName).setTypes(docType)
        .setQuery(query)
        .addSort(sortingQuery)
        .execute().actionGet();

Note

Sorting by distance is a memory- and CPU-intensive task, so if you have a lot of documents in your index, it's better to use filters such as a bounding box, or queries, to narrow down the set of documents being sorted.
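One way to follow this advice is to combine a bounding box filter with the distance sort in a single request, so that only documents inside the box are sorted. The following is a sketch under the same field names used in this chapter (location and dish_name); the helper name nearby_sorted_query is hypothetical:

```python
def nearby_sorted_query(dish, center, top_left, bottom_right, unit="km"):
    """Build a search body: term query, bounding box filter, distance sort.

    All points are (lat, lon) pairs and are emitted in the object format.
    """
    def as_object(point):
        return {"lat": point[0], "lon": point[1]}

    return {
        "query": {
            "bool": {
                "must": {"term": {"dish_name": {"value": dish}}},
                # The filter narrows the documents before they are sorted
                "filter": {
                    "geo_bounding_box": {
                        "location": {
                            "top_left": as_object(top_left),
                            "bottom_right": as_object(bottom_right),
                        }
                    }
                },
            }
        },
        "sort": [
            {
                "_geo_distance": {
                    "location": as_object(center),
                    "order": "asc",
                    "unit": unit,
                }
            }
        ],
    }


query = nearby_sorted_query(
    "chinese", (28.67, 77.0), (28.7965, 76.9771), (28.4301, 77.5717)
)
print(query)
```

The resulting dictionary can be passed as the body of es.search in the same way as the earlier Python examples.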
