Chapter 2. Searching Your Data

In the previous chapter we installed and configured our cluster. We also prepared our mappings and indexed our data. We can now do the thing you had in mind when you chose ElasticSearch: searching! In this chapter you will learn how to query ElasticSearch. Of course, you could say, "Hey, I can just run curl -XGET 'http://localhost:9200/_search?q=first+query' and get all the data I am interested in", and you would be right. However, ElasticSearch supports a wide variety of queries, both simple and complicated, expressed through its Query DSL (a short example follows the list below). In this chapter we will get acquainted with some of the search capabilities that ElasticSearch exposes. By the end of this chapter, you will have learned:

  • How to query ElasticSearch using its Query DSL
  • How to use basic queries
  • How to use compound queries
  • How to filter your results and why it is important
  • How to change the sorting of your results
  • How to use scripts in ElasticSearch
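Before we dive in, it is worth seeing how the query string search from the opening paragraph looks when expressed with the Query DSL, which uses a JSON request body instead of a URI parameter. This is a minimal sketch: the match query is a standard part of the Query DSL, but the title field is only an assumed example from our mappings:

# the same search expressed as a Query DSL request body
curl -XGET 'http://localhost:9200/_search?pretty' -d '{
  "query": {
    "match": {
      "title": "first query"
    }
  }
}'

The JSON form is more verbose, but it gives us access to the full range of queries, filters, and sorting options discussed in this chapter.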

Understanding the querying and indexing process

Before we see how to search for data, it would be good to understand how the documents and queries sent to ElasticSearch are processed. If you already know that, you can skip this part of the chapter.

In order to understand the querying and indexing process, you should understand the following concepts:

  • Indexing: This is the process of preparing the document sent to ElasticSearch and storing it in the index.
  • Searching: This is the process of matching the documents that satisfy the query requirements.
  • Analysis: This is the process of preparing the content of a field and converting the content to terms that can be written into the Lucene index. During indexing, the data in the fields is divided into a stream of tokens (words) that are written into the index as terms (tokens with additional information such as position in the input text). The analysis process can consist of the following steps:
    • Tokenization: During this stage, the input text is turned into a token stream by the tokenizer.
    • Filtering: During this stage, zero or more filters can process tokens in the token stream. For example, the stopwords filter can remove irrelevant tokens from the stream, the synonyms filter can add new tokens or change existing ones, and the lowercase filter will make all tokens lowercase.
  • Analyzer: This is a single tokenizer with zero or more filters. We can specify analyzers when working with fields, types, and queries. The sketch after this list shows the analysis process in action.
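To see what analysis produces, you can send a piece of text to the _analyze API and inspect the tokens that come back. This is a minimal sketch; the standard analyzer is built in, but note that recent ElasticSearch versions expect a JSON body with analyzer and text fields instead of the query parameter used here:

# ask ElasticSearch to analyze the given text with the standard analyzer
curl -XGET 'http://localhost:9200/_analyze?analyzer=standard&pretty' -d 'ElasticSearch Servers'

The response lists each term together with its position and offsets; the standard analyzer tokenizes the input and lowercases it, so you should see the terms elasticsearch and servers.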

It is worth mentioning that the analysis process we've just discussed is used both during indexing and during searching, and that index-time analysis and query-time analysis can be configured differently. However, it is very important that the terms produced at index time and at query time match; if they don't, ElasticSearch won't return the documents you expect. For example, if you use stemming during indexing but don't use stemming while searching, you'll have to pass the already stemmed forms of the words in order to find your documents. A configuration sketch follows.
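As a sketch of how the two sides are tied together, the following index creation request defines a custom analyzer that lowercases and stems tokens, and assigns it to a field at both index and query time. The index, type, field, and analyzer names are assumed for illustration, and index_analyzer and the string type belong to older ElasticSearch versions (newer releases use analyzer and text instead):

# create an index whose title field is stemmed at both index and query time
curl -XPUT 'http://localhost:9200/library' -d '{
  "settings": {
    "analysis": {
      "analyzer": {
        "stem_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "porter_stem"]
        }
      }
    }
  },
  "mappings": {
    "book": {
      "properties": {
        "title": {
          "type": "string",
          "index_analyzer": "stem_analyzer",
          "search_analyzer": "stem_analyzer"
        }
      }
    }
  }
}'

Because the same analyzer runs on both sides, a query for queries and the indexed word query are both reduced to the term queri, and the document is found. Change search_analyzer to an analyzer without stemming and the terms stop matching.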
