The Elasticsearch Admin API is quite extensive and covers almost every part of its architecture, from low-level information about Lucene to high-level information about the cluster nodes and their health. All this information is available both through the Elasticsearch Java API and through the REST API; however, the REST API returns it in the JSON format. What's more, the returned data can sometimes be hard to analyze without further parsing. For example, try to run the following request on your Elasticsearch cluster:
curl -XGET 'localhost:9200/_stats?pretty'
On our local, single-node cluster, Elasticsearch returns the following information (we cut it down drastically; the full response can be found in the stats.json file provided with the book):
{
  "_shards" : {
    "total" : 60,
    "successful" : 30,
    "failed" : 0
  },
  "_all" : {
    "primaries" : { . . . },
    "total" : { . . . }
  },
  "indices" : { . . . }
}
If you look at the provided stats.json file, you will see that the response is about 1,350 lines long. Such output is not convenient for a human to analyze without additional parsing. Because of this, Elasticsearch provides us with a more human-friendly API: the Cat API. The Cat API returns data in a simple, tabular text format and, what's more, it provides aggregated data that is usually usable without any further processing.
The base endpoint for the Cat API is quite obvious: it is /_cat. Called without any parameters, it shows all the available endpoints of that API. We can check this by running the following command:
curl -XGET 'localhost:9200/_cat'
The response returned by Elasticsearch should be similar or identical (depending on your Elasticsearch version) to the following one:
=^.^=
/_cat/allocation
/_cat/shards
/_cat/shards/{index}
/_cat/master
/_cat/nodes
/_cat/indices
/_cat/indices/{index}
/_cat/segments
/_cat/segments/{index}
/_cat/count
/_cat/count/{index}
/_cat/recovery
/_cat/recovery/{index}
/_cat/health
/_cat/pending_tasks
/_cat/aliases
/_cat/aliases/{alias}
/_cat/thread_pool
/_cat/plugins
/_cat/fielddata
/_cat/fielddata/{fields}
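So, looking from the top, the Cat API allows us to get information about shard allocation, shards, the master node, nodes, indices, segments, document counts, recoveries, cluster health, pending tasks, aliases, thread pools, installed plugins, and field data usage. Several of these endpoints can also be limited to a single index (or, for aliases and field data, to a given alias or set of fields).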
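Since the endpoint listing is plain text, it can be filtered with standard command-line tools. The following is a minimal sketch that keeps only the endpoints accepting an {index} argument; the function name cat_index_scoped is hypothetical, and the commented usage assumes a node listening on localhost:9200, as in the commands above.

```shell
#!/bin/sh
# Hypothetical helper: reads the plain-text /_cat listing on stdin and keeps
# only the endpoints that can be limited to a single index ({index} variants).
cat_index_scoped() {
  # -F treats the pattern as a fixed string, so the braces need no escaping.
  grep -F '/{index}'
}

# Usage (assumes Elasticsearch listens on localhost:9200):
#   curl -s 'localhost:9200/_cat' | cat_index_scoped
```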
Let's start using the Cat API through an example. We can start with checking the cluster health of our Elasticsearch cluster. To do this, we just run the following command:
curl -XGET 'localhost:9200/_cat/health'
The response returned by Elasticsearch to the preceding command should be similar to the following one:
1414347090 19:11:30 elasticsearch yellow 1 1 47 47 0 0 47
It is clean and nice. Because it is in a tabular format, the response is also easy to process with tools such as grep, awk, or sed, a standard set of tools for every administrator. It also becomes more readable once you know what each column means. To add a header describing the purpose of each column, we just need to add the v parameter, like this:
curl -XGET 'localhost:9200/_cat/health?v'
The response is very similar to what we've seen previously, but it now contains a header describing each column:
epoch      timestamp cluster       status node.total node.data shards pri relo init unassign
1414347107 19:11:47  elasticsearch yellow          1         1     47  47    0    0       47
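Because the v header names every column, a script can look up a column by its name instead of by its position. Below is a minimal sketch; cat_column is a hypothetical helper, and it assumes that no value in the response contains spaces, which holds for the health output above but not, for example, for node names such as Alicia Masters.

```shell
#!/bin/sh
# Hypothetical helper: prints the values of one named column from any Cat API
# response requested with the v parameter. The header line is used to find
# the column position. Assumes no value in the response contains spaces.
cat_column() {
  awk -v want="$1" '
    # First line: remember the position of the requested column name.
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == want) col = i; next }
    # Remaining lines: print that column (nothing if the name was not found).
    col     { print $col }
  '
}

# Usage (assumes Elasticsearch listens on localhost:9200):
#   curl -s 'localhost:9200/_cat/health?v' | cat_column status
```

Looking columns up by name keeps such a script working even if the column order differs between Elasticsearch versions.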
Every Cat API endpoint has its own arguments, but there are a few common options that are shared among all of them:
v: This adds a header line with the names of the presented columns to the response.
h: This allows us to show only the chosen columns (refer to the next section).
help: This lists all the columns that a particular endpoint is able to show. For each column, the command shows its name, its abbreviation, and its description.
bytes: This sets the unit used for values that represent bytes. As we said, the Cat API is designed to be used by humans and, because of that, these values are presented in a human-readable form by default, for example, 3.5kB or 40GB. The bytes option allows us to use the same unit for all such values, so sorting or numerical comparison becomes easier. For example, bytes=b presents all values in bytes, bytes=k in kilobytes, and so on.
For the full list of arguments for each Cat API endpoint, refer to the official Elasticsearch documentation available at http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/cat.html.
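The h and bytes options combine well with standard tools. As a minimal sketch, the hypothetical largest_indices helper below sorts /_cat/indices output by size. It assumes the endpoint exposes the index and store.size columns (the help option shows the exact names for your version) and that the response was requested with bytes=b, so every size is a plain integer that sort can compare numerically.

```shell
#!/bin/sh
# Hypothetical helper: orders "index size" lines from largest to smallest.
# Expects /_cat/indices output requested with h=index,store.size and bytes=b,
# so the second column is always a plain number of bytes.
largest_indices() {
  # -k2,2nr: sort on the second field only, numerically, in reverse order.
  sort -k2,2nr
}

# Usage (assumes Elasticsearch listens on localhost:9200):
#   curl -s 'localhost:9200/_cat/indices?h=index,store.size&bytes=b' | largest_indices | head -n 5
```

Without bytes=b, values such as 3.5kB and 40GB would not sort correctly as numbers, which is exactly the problem the bytes option addresses.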
When we wrote this book, the Cat API had 21 endpoints. We don't want to describe them all—it would be a repetition of information contained in the documentation or chapters about the administration API. However, we didn't want to leave this section without any example regarding the usage of the Cat API. Because of this, we decided to show you how easily you can get information using the Cat API compared to the standard JSON API exposed by Elasticsearch.
The first example shows how easy it is to find out which node in our cluster is the master node. By calling the /_cat/master REST endpoint, we get information about the node that is currently elected as the master. For example, let's run the following command:
curl -XGET 'localhost:9200/_cat/master?v'
The response returned by Elasticsearch for our local two-node cluster looks as follows:
id                     host          ip       node
8gfdQlV-SxKB0uUxkjbxSg Banshee.local 10.0.1.3 Siege
As you can see in the response, we've got information about the node that is currently elected as the master: its identifier, host name, IP address, and name.
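In scripts, it is often enough to know just the master's name. The following minimal sketch reads /_cat/master output without the v header, where the columns are id, host, ip, and node; master_name is a hypothetical helper. Because node names may contain spaces, it keeps everything from the fourth field onwards rather than only the fourth field.

```shell
#!/bin/sh
# Hypothetical helper: prints the elected master's name from /_cat/master
# output (no v header). Columns are: id host ip node. The node name may
# contain spaces, so all fields from the fourth onwards are joined back.
master_name() {
  awk 'NR == 1 {
    name = $4
    for (i = 5; i <= NF; i++) name = name " " $i
    print name
  }'
}

# Usage (assumes Elasticsearch listens on localhost:9200):
#   curl -s 'localhost:9200/_cat/master' | master_name
```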
The /_cat/nodes REST endpoint provides information about all the nodes in the cluster. Let's see what Elasticsearch returns after running the following command:
curl -XGET 'localhost:9200/_cat/nodes?v&h=name,node.role,load,uptime'
In the preceding example, we used the h parameter to choose which of the approximately 70 columns available for this endpoint we want to see. We chose only the node name, its role (whether it is a data or a client node), the node load, and its uptime.
The response returned by Elasticsearch looks as follows:
name           node.role load uptime
Alicia Masters d         6.09   6.7m
Siege          d         6.09     1h
As you can see, the /_cat/nodes REST endpoint provides all the requested information about the nodes in the cluster.
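The node name Alicia Masters in the preceding output contains a space, so splitting such lines on whitespace is unreliable. One way around this is to request the name as the last column. The following minimal sketch assumes that columns are returned in the order listed in h and that, as in the output above, a node.role value of d marks a data node; data_nodes is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: prints the names of data nodes. Expects /_cat/nodes
# output requested with h=node.role,name (no v header), so the name, which
# may contain spaces, is the last column. Assumes "d" marks a data node.
data_nodes() {
  awk '$1 == "d" {
    sub(/^[ \t]*d[ \t]+/, "")   # drop the role column and its separator
    sub(/[ \t]+$/, "")          # drop any trailing padding
    print
  }'
}

# Usage (assumes Elasticsearch listens on localhost:9200):
#   curl -s 'localhost:9200/_cat/nodes?h=node.role,name' | data_nodes
```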