Until now, you have seen a single get request executed to fetch one document, and a single query run at a time to search for documents. However, life will be easier with the following two APIs offered by Elasticsearch.
Multi get is all about combining more than one get request in a single request. I once had a requirement to check the existence of multiple documents in an index and create a bulk update request for only those IDs that did not already exist. One way to do this was to send a separate HEAD request for each document ID and, based on the response from Elasticsearch, create a bulk update request for the documents that did not exist. However, a multi get request can solve this problem in a single API call instead of multiple HEAD requests.
All you need to do is create an array of document IDs and send it to Elasticsearch using the _mget endpoint. The following is a simple curl request to show how you can do this:
curl 'localhost:9200/index_name/doc_type/_mget' -d '{ "ids" : ["1", "2"] }'
Here, ids holds the _id values of the documents to be fetched.
You have additional options to decide whether you want to return the data of the document or not. If it is not required, just set _source : false while sending the mget request. For example:
curl 'localhost:9200/index_name/doc_type/_mget' -d '{ "ids" : ["1", "2"], "_source" : false }'
If you are interested in only returning a particular field, you can do it like this:
curl 'localhost:9200/index_name/doc_type/_mget' -d '{ "ids" : ["1", "2"], "_source" : ["field1", "field2"] }'
Here, field1 and field2 are the names of the fields to be returned.
Python example:
Declare an array of IDs to be fetched:
document_ids_to_get = ['1','4','12','54','123','543']
Create a query by passing the array of document IDs to the ids parameter:
query = {"ids": document_ids_to_get}
# Execute the query using the mget endpoint:
exists_resp = es.mget(index=index_name,doc_type=doc_type, body=query, _source=False, request_timeout=100)
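The existence check described at the start of this section can be built on top of the mget response. The following is a minimal sketch, assuming the response is a dictionary with a docs array whose entries carry _id and found. The helper name find_missing_ids and the sample response are hypothetical and only serve as illustration:

```python
def find_missing_ids(mget_response):
    """Return the _id of every document the mget response reports as not found."""
    missing = []
    for doc in mget_response.get("docs", []):
        # "found" is False for IDs that do not exist. Entries without a
        # "found" key are also treated as missing here (an assumption).
        if not doc.get("found", False):
            missing.append(doc["_id"])
    return missing


# Hypothetical response, shaped like the dictionary returned by es.mget(...)
sample_resp = {
    "docs": [
        {"_index": "index_name", "_type": "doc_type", "_id": "1", "found": True},
        {"_index": "index_name", "_type": "doc_type", "_id": "4", "found": False},
        {"_index": "index_name", "_type": "doc_type", "_id": "12", "found": False},
    ]
}
missing_ids = find_missing_ids(sample_resp)
print(missing_ids)
```

The resulting list could then be used to build the bulk request for only the documents that do not exist yet.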
Java example:
Import the following packages into your source code:
import org.elasticsearch.action.get.GetResponse;
import org.elasticsearch.action.get.MultiGetItemResponse;
import org.elasticsearch.action.get.MultiGetResponse;
Create a multi get request in the following way:
MultiGetResponse responses = client.prepareMultiGet()
.add(indexName, docType, ids_to_be_fetched)
.execute().actionGet();
The multi get response is parsed in the following way:
for (MultiGetItemResponse itemResponse : responses) {
    GetResponse response = itemResponse.getResponse();
    if (response.isExists()) {
        String json = response.getSourceAsString();
        System.out.println(json);
    }
}
The ids_to_be_fetched variable is a list of document IDs that need to be fetched.
You might have worked with many databases and search engines, but few of them provide the functionality to run more than one query in a single request. Elasticsearch can do this with its _msearch REST API. For this, it follows a specific request format, as shown here:
header
body
………
………
header
body
Understanding the preceding search request structure:
header: This includes the name of the index/indices to be searched and optionally includes the search type, search preference nodes (primary, secondary, and so on), and routing
body: This includes the search request queries
Let's see an example:
Create a file, multi_requests, with the following content. Please note that each line is separated with \n (new line):
{"index" : "users"}
{"query" : {"match_all" : {}}, "from" : 0, "size" : 10}
{"index" : "twitter", "search_type" : "dfs_query_then_fetch"}
{"query" : {"match_all" : {}}}
Now execute the search using the _msearch API:
curl -XGET localhost:9200/_msearch --data-binary "@multi_requests"
In the preceding curl command, we have used the --data-binary flag to load the multiline content from the file because, unlike -d, it keeps the newlines in the file intact. This is required while executing bulk data indexing too.
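If you prefer to generate the request content in code rather than by hand, the header and body lines can be serialized programmatically. The following is a minimal sketch that uses only the Python standard library; the helper name build_msearch_payload is hypothetical, and the sketch assumes that every line, including the last one, must end with a newline character:

```python
import json


def build_msearch_payload(search_requests):
    """Serialize (header, body) pairs into the newline-delimited _msearch format."""
    lines = []
    for header, body in search_requests:
        lines.append(json.dumps(header))  # one header line ...
        lines.append(json.dumps(body))    # ... followed by its body line
    # Join with newlines and terminate the last line with a newline as well.
    return "\n".join(lines) + "\n"


search_requests = [
    ({"index": "users"}, {"query": {"match_all": {}}, "from": 0, "size": 10}),
    ({"index": "twitter", "search_type": "dfs_query_then_fetch"},
     {"query": {"match_all": {}}}),
]
payload = build_msearch_payload(search_requests)
print(payload)
```

The returned string has the same structure as the multi_requests file shown earlier and could be written to that file or sent as the request body.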
Searches executed with the _msearch API return their results in a responses array, which holds the search response for each search request in the same order as in the original multi search request. If a specific search request fails completely, an object with an error message is returned in place of the actual search response.
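Because the responses keep the order of the requests, each response can be matched to its request by position. The following is a minimal sketch, assuming a failed search is represented by an object with an error key and a successful one by an object with hits. The helper name split_msearch_responses and the sample data are hypothetical:

```python
def split_msearch_responses(search_requests, msearch_response):
    """Pair each request with its response by position and separate failures."""
    succeeded, failed = [], []
    for request, resp in zip(search_requests, msearch_response["responses"]):
        if "error" in resp:
            # A failed search carries an error message instead of hits
            failed.append((request, resp["error"]))
        else:
            succeeded.append((request, resp["hits"]["hits"]))
    return succeeded, failed


# Hypothetical requests and response, for illustration only
sample_requests = ["users query", "twitter query"]
sample_response = {
    "responses": [
        {"hits": {"total": 1, "hits": [{"_id": "1", "_source": {"name": "a"}}]}},
        {"error": "IndexMissingException[[twitter] missing]"},
    ]
}
succeeded, failed = split_msearch_responses(sample_requests, sample_response)
print(len(succeeded), len(failed))
```

Keeping the failed requests separate makes it possible to log or retry them without discarding the results of the searches that succeeded.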
Python example:
Create an individual request head with the index name and doc type:
req_head1 = {'index': index_name1, 'type': doc_type1}
Create a query_request_array list, which contains the actual queries and the head part of those queries:
query_request_array = []
query_1 = {"query" : {"match_all" : {}}}
query_request_array.append(req_head1)
query_request_array.append(query_1)
Add another query to this array with a different index name and doc type:
req_head2 = {'index': index_name2, 'type': doc_type2}
query_2 = {"query" : {"match_all" : {}}}
query_request_array.append(req_head2)
query_request_array.append(query_2)
Execute the request using the msearch endpoint by passing query_request_array into the body; you can optionally set request_timeout too:
response = es.msearch(body=query_request_array)
The response can be parsed in the following way:
for resp in response["responses"]:
    if resp.get("hits"):
        for hit in resp.get("hits").get('hits'):
            print(hit["_source"])
Java example:
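import org.elasticsearch.action.search.MultiSearchResponse;
import org.elasticsearch.action.search.SearchRequestBuilder;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.index.query.QueryBuilders;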
Create individual search requests using the SearchRequestBuilder class:
SearchRequestBuilder searchRequest1 = client.prepareSearch().setIndices(indexName).setTypes(docType)
        .setQuery(QueryBuilders.queryStringQuery("elasticsearch").defaultField("text")).setSize(1);
SearchRequestBuilder searchRequest2 = client.prepareSearch().setIndices(indexName).setTypes(docType)
        .setQuery(QueryBuilders.matchQuery("screen_name", "d_bharvi")).setSize(1);
MultiSearchResponse sr = client.prepareMultiSearch()
.add(searchRequest1)
.add(searchRequest2)
.execute().actionGet();
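You get all the individual responses from MultiSearchResponse, as follows:
long nbHits = 0;
for (MultiSearchResponse.Item item : sr.getResponses()) {
    // Skip searches that failed completely; they carry no SearchResponse
    if (item.isFailure()) {
        continue;
    }
    SearchResponse response = item.getResponse();
    nbHits += response.getHits().getTotalHits();
}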