The Cat API

The Elasticsearch Admin API is extensive and covers almost every part of the Elasticsearch architecture, from low-level information about Lucene to high-level information about the cluster nodes and their health. All of it is available through both the Elasticsearch Java API and the REST API. However, the returned data, even though it is a JSON document, is not very readable for a person, mainly because of the sheer amount of information it contains.

Because of this, Elasticsearch provides us with a more human-friendly API: the Cat API. It returns data in a simple, tabular text format and, what's more, it provides aggregated data that is usually usable without any further processing.

The basics

The base endpoint for the Cat API is quite obvious: it is /_cat. Without any parameters, it shows all the available endpoints for this API. We can check this by running the following command:

curl -XGET 'localhost:9200/_cat'

The response returned by Elasticsearch should be similar or identical (depending on your Elasticsearch version) to the following one:

=^.^=
/_cat/allocation
/_cat/shards
/_cat/shards/{index}
/_cat/master
/_cat/nodes
/_cat/indices
/_cat/indices/{index}
/_cat/segments
/_cat/segments/{index}
/_cat/count
/_cat/count/{index}
/_cat/recovery
/_cat/recovery/{index}
/_cat/health
/_cat/pending_tasks
/_cat/aliases
/_cat/aliases/{alias}
/_cat/thread_pool
/_cat/plugins
/_cat/fielddata
/_cat/fielddata/{fields}
/_cat/nodeattrs
/_cat/repositories
/_cat/snapshots/{repository}

Going through the list from the top, the Cat API allows us to get the following information:

  • Shard allocation-related information
  • All shards-related information (also one limited to a given index)
  • Information about the master node
  • Nodes information
  • Indices statistics (also one limited to a given index)
  • Segments statistics (also one limited to a given index)
  • Documents count (also one limited to a given index)
  • Recovery information (also one limited to a given index)
  • Cluster health
  • Tasks pending for execution
  • Index aliases and indices for a given alias
  • Thread pool configuration
  • Plugins installed on each node
  • Field data cache size and field data cache sizes for individual fields
  • Node attributes information
  • Defined backup repositories
  • Snapshots created in the backup repository
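Because the listing is plain text, the distinct endpoint names can be pulled out of it with standard tools. The following is a minimal sketch, assuming the response has the shape shown earlier; the cat_endpoints function name is ours and is not part of Elasticsearch:

```shell
# cat_endpoints: read a /_cat listing on stdin and print each endpoint name once.
# Hypothetical helper for illustration; it is not part of Elasticsearch.
cat_endpoints() {
  # Keep only the path lines (this drops the =^.^= banner), take the third
  # slash-separated field (the endpoint name), and remove duplicates such as
  # /_cat/shards and /_cat/shards/{index}.
  grep '^/_cat/' | cut -d/ -f3 | sort -u
}

# Against a live cluster (assumes a node listening on localhost:9200):
#   curl -s -XGET 'localhost:9200/_cat' | cat_endpoints
```

The variants limited to an index, alias, field, or repository collapse into the name of their base endpoint, so the output is one line per kind of information.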

Using the Cat API

Using the Cat API is as simple as sending a GET request to one of the previously mentioned REST endpoints. For example, to get information about the cluster health, we could run the following command:

curl -XGET 'localhost:9200/_cat/health'

The response returned by Elasticsearch for the preceding command should be similar to the following one, but, of course, will be dependent on your cluster:

1446292041 12:47:21 elasticsearch yellow 1 1 21 21 0 0 21 0 - 50.0%

This is clean and compact. Because the response is in a tabular format, it is also easy to process with tools such as grep, awk, or sed, the standard toolset of every administrator. It also becomes more readable once you know what each column means.
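As an illustration of that kind of processing, here is a minimal sketch that extracts the cluster status with awk. It assumes the status is the fourth whitespace-separated field, as in the sample response above; the health_status and is_green names are our own:

```shell
# health_status: read one line of /_cat/health output (without a header) on
# stdin and print the cluster status, assumed to be the fourth field.
health_status() {
  awk '{ print $4 }'
}

# is_green: succeed only when the status read from stdin is green.
# Hypothetical helper, handy in monitoring scripts.
is_green() {
  [ "$(health_status)" = "green" ]
}

# Sample line copied from the response above.
sample='1446292041 12:47:21 elasticsearch yellow 1 1 21 21 0 0 21 0 - 50.0%'
printf '%s\n' "$sample" | health_status   # prints: yellow

# Against a live cluster (assumes a node listening on localhost:9200):
#   curl -s -XGET 'localhost:9200/_cat/health' | is_green || echo 'cluster is not green'
```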

To add a header describing each column's purpose, we just need to add the v parameter, like this:

curl -XGET 'localhost:9200/_cat/health?v'

Common arguments

Every Cat API endpoint has its own arguments, but there are a few common options that are shared among all of them:

  • v: This adds a header line to the response with the names of presented items.
  • h: This allows us to show only the chosen columns, for example h=status,node.total,shards,pri.
  • help: This lists all the possible columns that this particular endpoint is able to show. The command shows the name of the parameter, its abbreviation, and description.
  • bytes: This sets the unit used for values that represent sizes in bytes. As we said earlier, the Cat API is designed to be used by humans, so by default these values are shown in a human-readable form, for example 3.5kB or 40GB. The bytes option sets the same unit for all the numbers, which makes sorting and numerical comparison easier. For example, bytes=b presents all values in bytes, bytes=k in kilobytes, and so on.

    Note

    For the full list of arguments for each Cat API endpoint, please refer to the official Elasticsearch documentation available at: https://www.elastic.co/guide/en/elasticsearch/reference/2.2/cat.html.
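The common arguments are ordinary query-string parameters, so they can be combined freely. The following is a minimal sketch of a helper that assembles such a URL. The cat_url name is hypothetical, the host and port are assumed to be localhost:9200 as elsewhere in this section, and the index and store.size column names for /_cat/indices are used only as an example:

```shell
# cat_url: build a Cat API URL from an endpoint name, a comma-separated list
# of columns for the h parameter, and an optional unit for the bytes parameter.
# Hypothetical helper; assumes a node listening on localhost:9200.
cat_url() {
  endpoint=$1
  columns=$2
  unit=${3:-}
  # v adds the header line, h limits the output to the chosen columns.
  url="localhost:9200/_cat/${endpoint}?v&h=${columns}"
  # Append bytes= only when a unit such as b or k was given.
  if [ -n "$unit" ]; then
    url="${url}&bytes=${unit}"
  fi
  printf '%s\n' "$url"
}

cat_url health status,node.total,shards,pri
# prints: localhost:9200/_cat/health?v&h=status,node.total,shards,pri

cat_url indices index,store.size b
# prints: localhost:9200/_cat/indices?v&h=index,store.size&bytes=b
```

The resulting string can be passed to curl in single or double quotes, for example curl -XGET "$(cat_url health status,shards)", so that the shell does not interpret the & characters.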

The examples

When we wrote this book, the Cat API had twenty-two endpoints. We don't want to describe them all, as that would only repeat the information contained in the documentation. However, we didn't want to leave this section without examples of using the Cat API. Because of this, we decided to show how easily you can get information using the Cat API compared to the standard JSON API exposed by Elasticsearch.

Getting information about the master node

The first example shows how easy it is to find out which node in our cluster is the master node. By calling the /_cat/master REST endpoint, we can see which of the nodes is currently elected as the master. For example, let's run the following command:

curl -XGET 'localhost:9200/_cat/master?v'

The response returned by Elasticsearch for my local two-node cluster looks as follows:

id                     host      ip        node
Cfj3tzqpSNi5SZx4g8osAg 127.0.0.1 127.0.0.1 Skin

As you can see in the response, we've got the information about which node is currently elected as the master: we can see its identifier, host, IP address, and name.

Getting information about the nodes

The /_cat/nodes REST endpoint provides information about all the nodes in the cluster. Let's see what Elasticsearch will return after running the following command:

curl -XGET 'localhost:9200/_cat/nodes?v&h=name,node.role,load,uptime'

In the preceding example, we chose which information to return from the approximately seventy options that this endpoint offers. We asked only for the node name, its role (whether the node is a data or client node), the node load, and its uptime.

And the response returned by Elasticsearch looks as follows:

name node.role load uptime
Skin d         2.00   1.3h

As you can see, the /_cat/nodes REST endpoint provides all the requested information about the nodes in the cluster.
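When the header line is present, a column can also be selected by name on the client side. The following is a minimal sketch, assuming that no value contains whitespace (a node name made of several words would shift the columns); the cat_column function name is ours:

```shell
# cat_column: read Cat API output that includes the header line (v parameter)
# on stdin and print the values of the column whose name is given as $1.
# Hypothetical helper; assumes values never contain whitespace.
cat_column() {
  awk -v col="$1" '
    NR == 1 {
      # Find the position of the requested column in the header line.
      for (i = 1; i <= NF; i++) if ($i == col) idx = i
      next
    }
    # Print the matching field of every data line; print nothing if the
    # column name was not found in the header.
    idx { print $idx }
  '
}

# Sample copied from the response above.
printf '%s\n' 'name node.role load uptime' 'Skin d         2.00   1.3h' | cat_column uptime
# prints: 1.3h
```

Asking Elasticsearch for a single column with the h parameter is usually the simpler choice; a client-side helper like this one is mainly useful when the same response is reused for several lookups.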

Retrieving recovery information for an index

Another nice example of using the Cat API is getting information about the recovery of a single index or all the indices. In our case, we will retrieve recovery information for a single library index by running the following command:

curl -XGET 'localhost:9200/_cat/recovery/library?v&h=index,shard,time,type,stage,files_percent' 

The response for the preceding command looks as follows:

index   shard time type  stage files_percent
library 0     75   store done  100.0%
library 1     83   store done  100.0%
library 2     88   store done  100.0%
library 3     79   store done  100.0%
library 4     5    store done  100.0%
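A response like this one is easy to check from a script. The following is a minimal sketch that reports shards whose recovery has not finished, assuming the column order requested with the h parameter in the preceding command (index, shard, time, type, stage, files_percent); the unfinished_shards name is our own:

```shell
# unfinished_shards: read /_cat/recovery output with a header line on stdin
# and print "index shard stage" for every shard whose stage is not done.
# Hypothetical helper; assumes the stage is the fifth column, as requested
# with h=index,shard,time,type,stage,files_percent.
unfinished_shards() {
  awk 'NR > 1 && $5 != "done" { print $1, $2, $5 }'
}

# Against a live cluster (assumes a node listening on localhost:9200):
#   curl -s -XGET 'localhost:9200/_cat/recovery/library?v&h=index,shard,time,type,stage,files_percent' | unfinished_shards
```

For the response shown above, the function prints nothing, because every shard of the library index is in the done stage.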