Chapter 7. Administrating Your Cluster

In the previous chapter, we looked mainly at the faceting functionality in ElasticSearch, which lets us get aggregated statistics about our search results. We also learned how to use the "more like this" REST endpoint to find documents similar to the ones we've already found, and we used the prospective search functionality, called the percolator, to store queries and check which of them match a document sent to ElasticSearch. In this chapter, we will take a look at cluster health and cluster state monitoring. We will learn how to use tools to diagnose the state of our cluster. We will also use the shard and replica allocation mechanism to control the nodes on which ElasticSearch places shards and replicas. Finally, we will learn what the gateway and discovery modules are and how to configure them. By the end of this chapter you will have learned:

  • How to monitor your cluster state and health
  • How to use tools for cluster state diagnosis
  • How to control shard and replica allocation
  • How to use the gateway module
  • How to use the discovery module
  • How to install plugins

Monitoring your cluster state and health

Monitoring is a very important concern during the normal life of an application. It allows the administrators of the system to detect possible problems and prevent them before they occur, or at least to know what happened during a failure.

ElasticSearch provides very detailed information that allows us to check and monitor a single node or the cluster as a whole. This includes statistics, information about the server, and node parameters, but above all it includes complete information about the current cluster state. Before we look at this in more detail, one caveat: these APIs are very complex, and the amount of information they expose about cluster state and health is enormous. Because of this, we describe only the basics in this book and keep the details about ElasticSearch internals to the minimum needed to understand each topic.

The cluster health API

ElasticSearch exposes information about the current node or cluster in the cluster health API. Let's see the reply for the following command:

curl 'localhost:9200/_cluster/health?pretty'

On my laptop, the response is as follows:

{
  "cluster_name" : "elasticsearch",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 103,
  "active_shards" : 103,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 101
}

The most important piece of information is the status of the cluster. In our example we see that the cluster is in the yellow state. What does that mean? Let's stop here and talk about when a cluster, as a whole, is fully operational. As you already know, ElasticSearch always assumes that the current node is part of a cluster. This means that an index is divided into separate parts called shards, which can be allocated on several nodes. In addition, ElasticSearch can create copies of these shards (replicas) to handle more requests and to keep the data available when a node fails.

A cluster is fully operational when ElasticSearch is able to allocate all shards and replicas on machines according to its configuration. This is the green state. The yellow state means that the cluster is ready to handle requests because all primary shards are allocated, but some (or all) replicas aren't. The last state, red, means that the cluster is not ready yet because at least one primary shard is not allocated. When we have only one node and our indices have replicas, the yellow state is expected: there are no other nodes to place the replicas on. Let's start another node and check again:

{
  "cluster_name" : "elasticsearch",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 2,
  "number_of_data_nodes" : 2,
  "active_primary_shards" : 103,
  "active_shards" : 205,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0
}

Now the cluster's state is green and our cluster is fully operational. This query can also be executed on a specific index or indices, for example:

curl 'localhost:9200/_cluster/health/library,map?pretty'

The state can be determined on several levels: shard, index (determined by the worst shard status), and cluster (determined by the worst index status). The level parameter of this API selects one of these levels and so controls how detailed the returned information is. Compare the results of the following commands:

curl 'localhost:9200/_cluster/health?pretty'
curl 'localhost:9200/_cluster/health?pretty&level=indices'
curl 'localhost:9200/_cluster/health?pretty&level=shards'
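The "worst status wins" rule described above can be sketched in a few lines of Python. This is only an illustration, not the way ElasticSearch implements it: the function names and the sample shard statuses are made up, and the input is a simplified mapping rather than the actual response format.

```python
# Severity order: a higher number means a worse state.
SEVERITY = {"green": 0, "yellow": 1, "red": 2}


def worst_status(statuses):
    """Return the worst color among the given status strings.

    An empty input is treated as green (nothing is unhealthy).
    """
    return max(statuses, key=SEVERITY.__getitem__, default="green")


def cluster_status(indices):
    """indices maps an index name to the list of its shard statuses.

    Returns the status of every index and the status of the whole cluster.
    """
    per_index = {name: worst_status(shards) for name, shards in indices.items()}
    return per_index, worst_status(per_index.values())


# Hypothetical shard statuses, invented for this example.
example = {
    "library": ["green", "green"],
    "map": ["green", "yellow"],
}
per_index, overall = cluster_status(example)
print(per_index)  # {'library': 'green', 'map': 'yellow'}
print(overall)    # yellow
```

A single yellow shard is enough to turn its index, and therefore the whole cluster, yellow.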

As we said, the "color of the cluster" is vital for an application because it is directly connected with that application's availability. For this reason, it is convenient to use this API in startup scripts, during the bootstrap of the system, to check whether the cluster is ready. ElasticSearch provides additional parameters for that. One of them is wait_for_status, whose value is the desired color. The other interesting parameter is wait_for_nodes, whose value is the required number of available nodes. With either parameter, the request does not return until the cluster attains the desired state or number of nodes, or until the timeout expires. The timeout can be changed using the timeout parameter (the default value is 30 seconds). For example:

curl 'localhost:9200/_cluster/health?wait_for_status=green&wait_for_nodes=>=3&timeout=100s'

The result of the previous command will be returned only when the cluster has a green status and at least three nodes are available. If these conditions are not met within 100 seconds, the request returns anyway, and the returned JSON response contains information about the timeout.
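A startup script could wrap this check roughly as follows. This is a minimal sketch using only the Python standard library. The function names, the base URL, and the client_timeout value are assumptions made for illustration, and treating an unreachable node as "not ready" is a design choice of this sketch rather than something ElasticSearch prescribes.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def health_url(base, status="green", nodes=None, timeout="30s"):
    """Build a cluster health URL that waits for the given conditions."""
    params = {"wait_for_status": status, "timeout": timeout}
    if nodes is not None:
        params["wait_for_nodes"] = nodes  # for example ">=3"
    # urlencode percent-encodes characters such as ">" and "=" in values.
    return base + "/_cluster/health?" + urlencode(params)


def is_ready(health):
    """True only if the reply says the wait did not time out."""
    return health.get("timed_out") is False


def wait_until_ready(base="http://localhost:9200", client_timeout=120, **conditions):
    """Ask ElasticSearch to wait, then report whether the cluster is ready.

    client_timeout (in seconds) is the HTTP client's own limit. It should be
    longer than the server-side timeout passed in conditions.
    """
    try:
        with urlopen(health_url(base, **conditions), timeout=client_timeout) as reply:
            return is_ready(json.load(reply))
    except OSError:
        # Assumption: a refused connection, a client-side timeout, or an
        # HTTP error status all mean "not ready" for a startup script.
        return False


# Building the URL needs no running node:
print(health_url("http://localhost:9200", nodes=">=3", timeout="100s"))
# With a node running, the check itself would be:
#   wait_until_ready(nodes=">=3", timeout="100s")
```

The script relies on the timed_out field of the reply rather than on the status color alone, because a timed-out request still returns a valid JSON document.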

The indices stats API

ElasticSearch can show various statistics concerning indices. All this information is available using the /_stats API endpoint. Queries sent to this endpoint can get information about all the indices (/_stats), one particular index (for example, /library/_stats), or several indices (for example, /library,map/_stats). If you've tried the examples shown previously in this book, you can check the statistics by using the following command:

curl 'localhost:9200/library,map/_stats?pretty'

The response probably has almost 300 lines, so we describe only its structure. In addition to information about status and response time, we can see three objects named primaries, total, and indices. The indices object contains information about the library and map indices. The primaries object contains information about all primary shards allocated on the current node, and the total object contains information about all the shards including replicas. All these objects have the same structure and contain objects such as docs, store, indexing, get, and search. Let's discuss the information stored in those objects.

Docs

This statistic shows information about indexed documents. For example:

"docs" : {
 "count" : 4,
 "deleted" : 0
}

The main information is the count value, which indicates the number of documents in the described index. When we delete documents from the index, ElasticSearch doesn't remove them immediately; it only marks them as deleted. Documents are physically removed during the segment merge process. The number of documents marked as deleted is presented as the deleted attribute and should be 0 right after a merge.

Note

If you are not familiar with the Apache Lucene library, then you may not know what a segment merge is. Lucene divides the index into parts called segments, which can't be changed once written. Over time the number of segments grows, and when Lucene decides that the index is built of too many segments, it starts the process of segment merging. Lucene creates a new, larger segment with the information from the smaller ones and then deletes the smaller segments.

Store

The next statistics, as you can guess, are connected with storage. The following is an example:

"store" : {
 "size" : "7.6kb",
 "size_in_bytes" : 7867,
 "throttle_time" : "0s",
 "throttle_time_in_millis" : 0
}

The main information is the size of the index (or indices). We can also look at throttling statistics. This information is useful when the system has I/O performance problems and limits have been configured for internal operations during segment merges.

Indexing, get, and search

The next three statistics describe data manipulation: indexing (together with delete operations), real-time get, and searching. Let's look at the following example:

"indexing" : {
 "index_total" : 11501,
 "index_time" : "4.5s",
 "index_time_in_millis" : 4574,
 "index_current" : 0,
 "delete_total" : 0,
 "delete_time" : "0s",
 "delete_time_in_millis" : 0,
 "delete_current" : 0
},
"get" : {
 "total" : 3,
 "time" : "0s",
 "time_in_millis" : 0,
 "exists_total" : 2,
 "exists_time" : "0s",
 "exists_time_in_millis" : 0,
 "missing_total" : 1,
 "missing_time" : "0s",
 "missing_time_in_millis" : 0,
 "current" : 0
},
"search" : {
 "query_total" : 0,
 "query_time" : "0s",
 "query_time_in_millis" : 0,
 "query_current" : 0,
 "fetch_total" : 0,
 "fetch_time" : "0s",
 "fetch_time_in_millis" : 0,
 "fetch_current" : 0
}

As we can see, all of these statistics have a similar structure. We can read the total time spent on each request type (in human-readable form and in milliseconds) and the number of requests (which, together with the total time, allows us to calculate the average time of a single request). In the case of get requests, a valuable piece of information is the number of fetches that were unsuccessful (missing documents).
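The average time per request mentioned above is simply the total time divided by the number of requests. The following sketch applies that to the figures from the example response. The helper name average_millis is made up for this illustration, and returning None for an operation that has never run is a choice made here to avoid dividing by zero.

```python
def average_millis(stats, total_key, time_key):
    """Average time of one request in milliseconds, or None if there were none."""
    total = stats[total_key]
    if total == 0:
        return None  # no requests yet, so there is no meaningful average
    return stats[time_key] / total


# Values copied from the example response shown earlier.
indexing = {"index_total": 11501, "index_time_in_millis": 4574}
search = {"query_total": 0, "query_time_in_millis": 0}

print(round(average_millis(indexing, "index_total", "index_time_in_millis"), 3))  # 0.398
print(average_millis(search, "query_total", "query_time_in_millis"))  # None
```

Because these counters keep accumulating while the node is running, the result is an average over that whole period, not a measure of current performance.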

The mentioned docs, store, indexing, get, and search are returned by default. The indices stats API can also provide additional information about the merge process, flush, and refresh. You can add this information to the reply using appropriate parameters. For example:

curl 'localhost:9200/_stats?merge&flush&refresh&pretty'

The status API

There is a second way of obtaining information about indices: the /_status endpoint. It describes the available shards (including which of them is currently considered primary), the transaction log, and the merge process. The recovery and snapshot parameters add information about the shard recovery status and the snapshot status. You can review this information, but most of it is connected with the usage of the Lucene library or is very low-level, and it is beyond the scope of this book.

The nodes info API

The next source of information about the cluster is the nodes info API, available at the /_cluster/nodes or the /_nodes REST endpoints. This API can be used to fetch information about a particular node or nodes using the following:

  • The node name (for example, /_nodes/Pulse)
  • The identifier (for example, /_nodes/ny4hftjNQtuKMyEvpUdQWg)
  • The address (for example, /_nodes/192.168.1.103)
  • The parameters from the ElasticSearch configuration (for example, /_nodes/rack:2)

This API also allows us to get information about several nodes at once by:

  • Using patterns (for example, /_nodes/192.168.1.* or /_nodes/P*)
  • Using enumerations (for example, /_nodes/Pulse,Slab)
  • Using both patterns and enumerations (for example, /_nodes/P*,S*)

By default, this query returns the basic information about a node, such as the name, identifier, and address. By adding Boolean parameters, we can obtain many other items of information. The available parameters are as follows:

  • settings: To get the ElasticSearch configuration
  • os: To get information about the server, such as processor, RAM, and swap space
  • process: To get the process identifier and the available file descriptors
  • jvm: To get information about the Java virtual machine, such as the memory limits
  • thread_pool: To get the configuration of thread pools for various operations
  • network: To get the network interface name and addresses
  • transport: To get listen addresses for transport
  • http: To get listen addresses for HTTP
  • all: To get all the previously mentioned information

A sample usage of the previously mentioned API can be seen by using the following command:

curl 'localhost:9200/_nodes/Pulse?os&jvm&pretty'

This curl invocation returns information about the machine and the Java virtual machine on a node named Pulse.
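The addressing rules and flags listed above combine into a URL in a regular way, which the following sketch makes explicit. The function nodes_info_url is a hypothetical helper written for this illustration. It only assembles the string and does not check that the node names or flags are valid.

```python
def nodes_info_url(base, nodes=(), flags=()):
    """Build a nodes info URL.

    nodes: names, identifiers, addresses, or patterns, joined with commas.
    flags: Boolean parameters such as "os" or "jvm", joined with "&".
    """
    path = "/_nodes"
    if nodes:
        path += "/" + ",".join(nodes)
    query = "&".join(list(flags) + ["pretty"])
    return base + path + "?" + query


print(nodes_info_url("localhost:9200", ["Pulse"], ["os", "jvm"]))
# localhost:9200/_nodes/Pulse?os&jvm&pretty
print(nodes_info_url("localhost:9200", ["P*", "S*"]))
# localhost:9200/_nodes/P*,S*?pretty
```

When such a URL is passed to curl on the command line, it should be quoted so that the shell does not interpret the &, *, and ? characters.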

The nodes stats API

This API is similar to the nodes info API described previously. The main difference is that the nodes info API provides information about the environment, whereas the nodes stats API tells us what happens in the cluster while it is working. The nodes stats API is available under /_cluster/nodes/stats and /_nodes/stats. As with the nodes info API, we can obtain information from selected nodes (for example, /_nodes/Pulse/stats). The available flags for returned statistics are as follows:

  • indices: To get information similar to the information from the indices stats API and information about cache usage
  • os: To get information about server uptime, load, memory, and swap usage
  • process: To get information about the memory and CPU used by the process
  • jvm: To get information about the memory and garbage collector statistics for a Java virtual machine
  • network: To get TCP-level information
  • transport: To get information about the data sent and received by the transport module
  • http: To get information about the HTTP connections
  • fs: To get information about the available disk space and I/O operations statistics
  • thread_pool: To get information about the state of the threads assigned to various operations
  • all: To get all the above information

An example usage can look like the following command:

curl 'localhost:9200/_nodes/Pulse/stats?os&jvm&pretty'

The cluster state API

The /_cluster/state endpoint provides basic information about nodes, state, settings, aliases, and the mappings of the indices. In addition, you can find information about shard assignment. You can filter out unnecessary information using the following parameters: filter_nodes, filter_routing_table, filter_metadata, filter_blocks, and filter_indices. For the last filter, you can set a comma-separated list of indices that should be included in the response. Example usage is as follows:

curl 'localhost:9200/_cluster/state?filter_indices=library&pretty'
curl 'localhost:9200/_cluster/state?filter_nodes&pretty'

The indices segments API

The last API is available through the /_segments endpoint. You can also address only one or several indices (for example, by using the following REST endpoint: /library,map/_segments). This API provides information not only about shards and their placement but also about the segments of the physical index managed by the Lucene library.
