Chapter 7. Different Methods of Search and Bulk Operations

Search requirements vary from one use case to another, and Elasticsearch offers considerable flexibility in how search requests are executed and how results are returned for efficient processing. Equally important is knowing how to execute bulk operations, which let you perform many operations in a single request and so finish large tasks quickly.

In this chapter, we will cover the following topics:

  • Introducing search types in Elasticsearch
  • Cheaper CRUD bulk operations
  • Multi get and multi search APIs
  • Data pagination and re-indexing
  • Practical considerations for bulk processing

Introducing search types in Elasticsearch

Elasticsearch supports the following search types:

  • query_then_fetch: This is the default search type in Elasticsearch. It follows a two-phase search execution. In the first phase (query), the request reaches a coordinating node, which forwards it to all the relevant shards. Each shard searches its documents, sorts them locally, and returns the sorted document IDs and scores to the coordinating node. The coordinating node merges these results and sorts them globally. In the second phase (fetch), it retrieves the actual documents for the top-ranked hits from the shards that hold them and returns the result to the caller. The final result contains at most the number of documents specified by the size parameter of the search request.
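  • dfs_query_then_fetch: This is similar to the query_then_fetch search type, but asks Elasticsearch to do some extra processing for more accurate scoring of documents. In an initial phase that runs before the query phase, the distributed term frequencies are collected from all the shards, so that scores are computed from index-wide statistics rather than from each shard's local statistics.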
  • scan: The scan search type differs from normal search requests because it does not score or sort the documents. Use scan when scoring is not required and you need to iterate over a large number of documents from Elasticsearch.
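The two-phase flow of query_then_fetch can be illustrated with a small simulation. This is only a sketch in plain Python, not Elasticsearch code: the SHARDS data, the scores, and the function names query_phase, fetch_phase, and query_then_fetch are all made up for illustration.

```python
import heapq

# Toy "shards": each maps a document ID to (score, source document).
# The documents and scores are invented purely for illustration.
SHARDS = [
    {"a1": (0.9, {"title": "alpha"}), "a2": (0.2, {"title": "beta"})},
    {"b1": (0.7, {"title": "gamma"}), "b2": (0.5, {"title": "delta"})},
]


def query_phase(shard_id, shard, size):
    """Return the shard's local top `size` hits as (score, shard ID, doc ID)."""
    hits = [(score, shard_id, doc_id) for doc_id, (score, _) in shard.items()]
    return heapq.nlargest(size, hits)


def fetch_phase(shards, winners):
    """Load the stored documents for the globally selected hits only."""
    return [
        {"_id": doc_id, "_score": score, "_source": shards[shard_id][doc_id][1]}
        for score, shard_id, doc_id in winners
    ]


def query_then_fetch(shards, size):
    # Phase 1 (query): gather lightweight candidates from every shard.
    candidates = []
    for shard_id, shard in enumerate(shards):
        candidates.extend(query_phase(shard_id, shard, size))
    # The coordinating node merges the candidates and keeps the global top `size`.
    winners = heapq.nlargest(size, candidates)
    # Phase 2 (fetch): retrieve full documents only for the winners.
    return fetch_phase(shards, winners)


results = query_then_fetch(SHARDS, size=2)
print([hit["_id"] for hit in results])
```

Note that each shard sends only IDs and scores in the first phase; full documents are loaded for the final hits alone, which is what keeps the merge step cheap.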

    Note

    The deprecated search type: count

    There used to be another search type, count, which returned just the number of documents matching a given query. It was also used with aggregations to exclude the matching documents from the response and return only the aggregation results. The count search type has been deprecated since Elasticsearch version 2.0 and will be removed in an upcoming release. Instead of using the count search type, set the size parameter to 0 in your search request.
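As a rough sketch of the size-0 replacement for count, the request body can be built as follows. The helper name count_only_body, the by_category aggregation name, and the category field are hypothetical and exist only for this example.

```python
import json


def count_only_body(query, aggs=None):
    """Build a search body that returns no hits, only totals and aggregations."""
    body = {"size": 0, "query": query}
    if aggs:
        body["aggs"] = aggs
    return body


# "by_category" and the "category" field are placeholder names.
body = count_only_body(
    {"match_all": {}},
    aggs={"by_category": {"terms": {"field": "category"}}},
)
print(json.dumps(body, sort_keys=True))
```

Assuming a connected Python client named es, this body could then be passed as es.search(index=index_name, body=body); the response would carry the total hit count and the aggregation results but no documents.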

Search types can be specified while executing your search with the search_type parameter in the following way:

  • Using REST endpoint:
    GET /_search?search_type=scan
    
  • Using Python client:
    es.search(index=index_name, doc_type=doc_type, body=query, search_type='scan')
    
  • Using Java client, first import SearchType using the following import statement:
    import org.elasticsearch.action.search.SearchType;
    
  • Then, do the following:
    client.prepareSearch("index_name")
    .setTypes("doc_type")
    .setSearchType(SearchType.SCAN)
    .setQuery(QueryBuilders.matchAllQuery())
    .execute().actionGet();
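For the REST form, the search type travels as a query-string parameter on the _search endpoint. The following is a minimal sketch that only builds the URL; the helper build_search_url, the host http://localhost:9200, and the index name my_index are assumptions for illustration, and no request is sent.

```python
from urllib.parse import urlencode

# Search types named in this section (count is deprecated).
KNOWN_SEARCH_TYPES = {"query_then_fetch", "dfs_query_then_fetch", "scan", "count"}


def build_search_url(base_url, index, search_type):
    """Return the _search URL for `index` with search_type as a query parameter."""
    if search_type not in KNOWN_SEARCH_TYPES:
        raise ValueError("unknown search type: {}".format(search_type))
    params = urlencode({"search_type": search_type})
    return "{}/{}/_search?{}".format(base_url.rstrip("/"), index, params)


# Hypothetical host and index name.
url = build_search_url("http://localhost:9200", "my_index", "scan")
print(url)
```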
    