The Elasticsearch Admin API is quite extensive and covers almost every part of Elasticsearch architecture: from low-level information about Lucene to high-level ones about the cluster nodes and their health. All this information is available using the Elasticsearch Java API as well as the REST API. However, the returned data, even though it is a JSON document, is not very readable by a user, at least when it comes to the amount of information given.
Because of this, Elasticsearch provides us with a more human-friendly API: the Cat API. It returns data in a simple, tabular text format and, what's more, it provides aggregated data that is usually usable without any further processing.
The base endpoint for the Cat API is /_cat. Without any parameters, it shows all the available endpoints for this API. We can check this by running the following command:
curl -XGET 'localhost:9200/_cat'
The response returned by Elasticsearch should be similar or identical (depending on your Elasticsearch version) to the following one:
=^.^=
/_cat/allocation
/_cat/shards
/_cat/shards/{index}
/_cat/master
/_cat/nodes
/_cat/indices
/_cat/indices/{index}
/_cat/segments
/_cat/segments/{index}
/_cat/count
/_cat/count/{index}
/_cat/recovery
/_cat/recovery/{index}
/_cat/health
/_cat/pending_tasks
/_cat/aliases
/_cat/aliases/{alias}
/_cat/thread_pool
/_cat/plugins
/_cat/fielddata
/_cat/fielddata/{fields}
/_cat/nodeattrs
/_cat/repositories
/_cat/snapshots/{repository}
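So, looking from the top, Elasticsearch allows us to get the following information using the Cat API: shard allocation, shards, the master node, nodes, indices, segments, document counts, recovery, cluster health, pending tasks, aliases, thread pools, plugins, field data, node attributes, repositories, and snapshots.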
Using the Cat API is as simple as sending a GET request to one of the previously mentioned REST endpoints. For example, to get information about the cluster health, we could run the following command:
curl -XGET 'localhost:9200/_cat/health'
The response returned by Elasticsearch for the preceding command should be similar to the following one, but, of course, will be dependent on your cluster:
1446292041 12:47:21 elasticsearch yellow 1 1 21 21 0 0 21 0 - 50.0%
This is clean and nice. Because it is in tabular format, it is also easy to use the response in tools such as grep, awk, or sed, a standard set of tools for every administrator. It is also more readable once you know what it is all about.
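Because the columns are separated by whitespace, picking one of them out takes a single split. The following Python sketch does roughly what awk '{print $4}' would do with the health line shown earlier. The function name health_status is our own, and the assumption that the status sits in the fourth column is read off that sample output, so check it against your own Elasticsearch version.

```python
def health_status(line: str) -> str:
    """Return the cluster status from one headerless /_cat/health line.

    Assumption: the status is the fourth whitespace-separated column,
    as in the sample output above.
    """
    fields = line.split()
    if len(fields) < 4:
        raise ValueError("unexpected /_cat/health line: %r" % line)
    return fields[3]


sample = "1446292041 12:47:21 elasticsearch yellow 1 1 21 21 0 0 21 0 - 50.0%"
print(health_status(sample))  # yellow
```

A positional lookup like this is fragile if the column order differs between versions, which is one reason to request only the needed columns explicitly.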
To add a header describing each column's purpose, we just need to add the v parameter, like this:
curl -XGET 'localhost:9200/_cat/health?v'
Every Cat API endpoint has its own arguments, but there are a few common options that are shared among all of them:
v: This adds a header line to the response with the names of the presented items.
h: This allows us to show only the chosen columns, for example h=status,node.total,shards,pri.
help: This lists all the possible columns that this particular endpoint is able to show. The command shows the name of the parameter, its abbreviation, and description.
bytes: This is the format for the information representing the values in bytes. As we said earlier, the Cat API is designed to be used by humans and because of this, by default, these values are represented in human-readable form, for example 3.5kB or 40GB. The bytes option allows the setting of the same base for all the numbers, so sorting or numerical comparison will be easier. For example, bytes=b presents all values in bytes, bytes=k in kilobytes, and so on.
For the full list of arguments for each Cat API endpoint, please refer to the official Elasticsearch documentation available at: https://www.elastic.co/guide/en/elasticsearch/reference/2.2/cat.html.
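The common options combine freely in the query string. As a minimal sketch, the following Python helper assembles a Cat API URL from them. The name cat_url, its parameters, and the default base address are our own choices for illustration and not part of Elasticsearch. We also send v=true rather than a bare v, on the assumption that Elasticsearch reads v as a boolean flag.

```python
from urllib.parse import urlencode


def cat_url(endpoint, columns=None, header=False, bytes_unit=None,
            base="http://localhost:9200"):
    """Build a Cat API URL from the common options (v, h, bytes)."""
    params = {}
    if header:
        params["v"] = "true"             # assumed equivalent to a bare ?v
    if columns:
        params["h"] = ",".join(columns)  # show only the chosen columns
    if bytes_unit:
        params["bytes"] = bytes_unit     # for example "b" or "k"
    url = "%s/_cat/%s" % (base, endpoint.strip("/"))
    if params:
        # Keep commas literal so the h list stays readable.
        url += "?" + urlencode(params, safe=",")
    return url


print(cat_url("health", columns=["status", "node.total", "shards", "pri"],
              header=True))
# http://localhost:9200/_cat/health?v=true&h=status,node.total,shards,pri
```

The resulting string can be passed to curl or to any HTTP client.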
When we wrote this book, the Cat API had twenty-two endpoints. We don't want to describe them all; it would only repeat the information contained in the documentation. However, we didn't want to leave this section without an example of using the Cat API. Because of this, we decided to show how easily you can get information using the Cat API compared to the standard JSON API exposed by Elasticsearch.
The first example shows how easy it is to find out which node in our cluster is the master node. By calling the /_cat/master REST endpoint, we can see which node is currently elected as the master. For example, let's run the following command:
curl -XGET 'localhost:9200/_cat/master?v'
The response returned by Elasticsearch for my local two-node cluster looks as follows:
id                     host      ip        node
Cfj3tzqpSNi5SZx4g8osAg 127.0.0.1 127.0.0.1 Skin
As you can see in the response, we've got the information about which node is currently elected as the master: we can see its identifier, IP address, and name.
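When the v parameter is used, the header line gives us column names to work with, so the output can be turned into records without hard-coded positions. The following Python sketch shows one way to do it. The function parse_cat_table is a hypothetical helper of ours. It assumes that values contain no spaces, except possibly in the last column, and that no cell is empty.

```python
def parse_cat_table(text):
    """Turn Cat API output requested with ?v into a list of dicts."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        # Limit the split so a last column containing spaces stays whole.
        values = line.split(None, len(header) - 1)
        rows.append(dict(zip(header, values)))
    return rows


sample = """\
id                     host      ip        node
Cfj3tzqpSNi5SZx4g8osAg 127.0.0.1 127.0.0.1 Skin
"""
master = parse_cat_table(sample)[0]
print(master["node"], master["ip"])  # Skin 127.0.0.1
```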
The /_cat/nodes REST endpoint provides information about all the nodes in the cluster. Let's see what Elasticsearch will return after running the following command:
curl -XGET 'localhost:9200/_cat/nodes?v&h=name,node.role,load,uptime'
In the preceding example, we used the h parameter to choose what information we want from the approximately seventy options of this endpoint. We have chosen to get only the node name, its role (whether the node is a data or client node), the node load, and its uptime.
And the response returned by Elasticsearch looks as follows:
name node.role load uptime
Skin d         2.00 1.3h
As you can see, the /_cat/nodes REST endpoint provides all the requested information about the nodes in the cluster.
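Output narrowed down with h is easy to filter further. As a small sketch, the following Python function lists the data nodes from such a response by looking up the name and node.role columns in the header. The function name data_nodes is ours. It assumes the response was requested with v, that node names contain no spaces, and that a data node is marked with d, as in the sample response.

```python
def data_nodes(cat_nodes_output):
    """Return the names of nodes whose node.role column is "d"."""
    lines = [line.split() for line in cat_nodes_output.splitlines()
             if line.strip()]
    if not lines:
        return []
    header, rows = lines[0], lines[1:]
    # Locate the columns by name so their order in h does not matter.
    name_col = header.index("name")
    role_col = header.index("node.role")
    return [row[name_col] for row in rows if row[role_col] == "d"]


sample = """\
name node.role load uptime
Skin d         2.00 1.3h
"""
print(data_nodes(sample))  # ['Skin']
```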
Another nice example of using the Cat API is getting information about the recovery of a single index or all the indices. In our case, we will retrieve recovery information for a single library index by running the following command:
curl -XGET 'localhost:9200/_cat/recovery/library?v&h=index,shard,time,type,stage,files_percent'
The response for the preceding command looks as follows:
index   shard time type  stage files_percent
library 0     75   store done  100.0%
library 1     83   store done  100.0%
library 2     88   store done  100.0%
library 3     79   store done  100.0%
library 4     5    store done  100.0%
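A response like this lends itself to simple automated checks, for example confirming that every shard has finished recovering. The following Python sketch returns the shards whose stage is not yet done. The function name unfinished_shards is our own, and it assumes the response was requested with v and includes at least the index, shard, and stage columns.

```python
def unfinished_shards(cat_recovery_output):
    """Return (index, shard) pairs whose recovery stage is not "done"."""
    lines = [line.split() for line in cat_recovery_output.splitlines()
             if line.strip()]
    if not lines:
        return []
    header, rows = lines[0], lines[1:]
    records = [dict(zip(header, row)) for row in rows]
    return [(r["index"], int(r["shard"])) for r in records
            if r["stage"] != "done"]


sample = """\
index   shard time type  stage files_percent
library 0     75   store done  100.0%
library 1     83   store done  100.0%
library 2     88   store done  100.0%
library 3     79   store done  100.0%
library 4     5    store done  100.0%
"""
print(unfinished_shards(sample))  # []
```

An empty list means that all the shards of the library index have completed recovery.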