The human-friendly status API – using the Cat API

The Elasticsearch Admin API is quite extensive and covers almost every part of its architecture, from low-level information about Lucene to high-level information about the cluster nodes and their health. All this information is available both through the Elasticsearch Java API and through the REST API. However, the REST API returns the data in the JSON format, and that data can be hard to analyze without further parsing. For example, try running the following request on your Elasticsearch cluster:

curl -XGET 'localhost:9200/_stats?pretty'

On our local single-node cluster, Elasticsearch returns the following information (we cut it down drastically; the full response can be found in the stats.json file provided with the book):

{
  "_shards" : {
    "total" : 60,
    "successful" : 30,
    "failed" : 0
  },
  "_all" : {
    "primaries" : {
      .
      .
      .
    },
    "total" : {
      .
      .
      .
    }
  },
  "indices" : {
  .
  .
  .
  }
}

If you look at the provided stats.json file, you will see that the response is about 1,350 lines long. That is not convenient for a human to analyze without additional parsing. Because of this, Elasticsearch provides a more human-friendly API: the Cat API. The Cat API returns data in a simple, tabular text format and, what's more, it provides aggregated data that is usually usable without any further processing.

Note

Remember that we told you Elasticsearch can return information in formats other than JSON? If you don't remember this, try adding the format=yaml request parameter to your request.

The basics

The base endpoint for the Cat API is quite obvious: it is /_cat. Without any parameters, it shows all the available endpoints of the API. We can check this by running the following command:

curl -XGET 'localhost:9200/_cat'

The response returned by Elasticsearch should be similar or identical (depending on your Elasticsearch version) to the following one:

=^.^=
/_cat/allocation
/_cat/shards
/_cat/shards/{index}
/_cat/master
/_cat/nodes
/_cat/indices
/_cat/indices/{index}
/_cat/segments
/_cat/segments/{index}
/_cat/count
/_cat/count/{index}
/_cat/recovery
/_cat/recovery/{index}
/_cat/health
/_cat/pending_tasks
/_cat/aliases
/_cat/aliases/{alias}
/_cat/thread_pool
/_cat/plugins
/_cat/fielddata
/_cat/fielddata/{fields}

Looking at the list from the top, the Cat API allows us to get the following information:

  • Shard allocation-related information
  • All shard-related information (optionally limited to a given index)
  • Nodes information, including elected master indication
  • Indices' statistics (optionally limited to a given index)
  • Segments' statistics (optionally limited to a given index)
  • Documents' count (optionally limited to a given index)
  • Recovery information (optionally limited to a given index)
  • Cluster health
  • Tasks pending execution
  • Index aliases and indices for a given alias
  • The thread pool configuration
  • Plugins installed on each node
  • The field data cache size and field data cache sizes for individual fields
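The placeholders in the endpoint listing ({index}, {alias}, {fields}) mark the endpoints whose output can be narrowed down. As a minimal sketch, assuming you have saved the text returned by /_cat (abridged here to a few of its lines), the following Python snippet groups the endpoints by name and collects the placeholders each one accepts. The function name endpoint_filters is hypothetical.

```python
import re
from typing import Dict, List

# A few lines of the /_cat listing shown earlier (abridged for this sketch).
CAT_LISTING = """=^.^=
/_cat/allocation
/_cat/shards
/_cat/shards/{index}
/_cat/master
/_cat/count
/_cat/count/{index}
/_cat/aliases
/_cat/aliases/{alias}
/_cat/fielddata
/_cat/fielddata/{fields}
"""

# "/_cat/<name>" optionally followed by "/{<placeholder>}".
ENDPOINT = re.compile(r"^/_cat/(\w+)(?:/\{(\w+)\})?$")


def endpoint_filters(listing: str) -> Dict[str, List[str]]:
    """Map each Cat endpoint name to the path placeholders it accepts."""
    filters: Dict[str, List[str]] = {}
    for line in listing.splitlines():
        match = ENDPOINT.match(line.strip())
        if match is None:
            continue  # skips the "=^.^=" banner and empty lines
        name, placeholder = match.groups()
        params = filters.setdefault(name, [])
        if placeholder is not None:
            params.append(placeholder)
    return filters


# Print each endpoint with its placeholders ("-" when it takes none).
for name, params in endpoint_filters(CAT_LISTING).items():
    print(f"{name:12} {', '.join(params) or '-'}")
```

Endpoints such as master end up with an empty list, which matches the listing: they have no {…} variant.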

Using the Cat API

Let's start using the Cat API with an example: checking the health of our Elasticsearch cluster. To do this, we just run the following command:

curl -XGET 'localhost:9200/_cat/health'

The response returned by Elasticsearch to the preceding command should be similar to the following one:

1414347090 19:11:30 elasticsearch yellow 1 1 47 47 0 0 47

The response is clean and compact. Because it is in a tabular format, it is also easy to process with tools such as grep, awk, or sed, the standard set of tools of every administrator. It is also readable once you know what each column means. To add a header describing the purpose of each column, we just need to add the v parameter, like this:

curl -XGET 'localhost:9200/_cat/health?v'

The response is very similar to what we've seen previously, but it now contains a header describing each column:

epoch      timestamp cluster       status node.total node.data shards pri relo init unassign
1414347107 19:11:47  elasticsearch yellow          1         1     47  47    0    0       47
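Because the output is plain columns, turning it into structured data takes only a few lines. The following Python sketch is a hypothetical helper, parse_cat_v, that uses the header line added by the v parameter as dictionary keys. It assumes that no cell contains whitespace, which holds for the health output above but not for every endpoint (node names, for example, may contain spaces).

```python
from typing import Dict, List

# Output of `curl -XGET 'localhost:9200/_cat/health?v'`, copied from above.
HEALTH_V = """\
epoch      timestamp cluster       status node.total node.data shards pri relo init unassign
1414347107 19:11:47  elasticsearch yellow          1         1     47  47    0    0       47
"""


def parse_cat_v(text: str) -> List[Dict[str, str]]:
    """Turn Cat API output requested with ?v into one dict per row.

    Assumes no cell contains whitespace, so a plain split() is enough.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        cells = line.split()
        if len(cells) != len(header):
            # A cell with spaces in it would shift every following column.
            raise ValueError(f"cannot split row safely: {line!r}")
        rows.append(dict(zip(header, cells)))
    return rows


health = parse_cat_v(HEALTH_V)[0]
print(health["status"], health["unassign"])  # yellow 47
```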

Common arguments

Every Cat API endpoint has its own arguments, but there are a few common options that are shared among all of them:

  • v: This adds a header line to the response with the names of the presented columns.
  • h: This allows us to show only the chosen columns (refer to the next section).
  • help: This lists all the columns that a particular endpoint is able to show. For each column, the command shows its name, its abbreviation, and a description.
  • bytes: This sets the unit used for values that represent sizes in bytes. As we said, the Cat API is designed to be used by humans and, because of that, these values are presented in a human-readable form by default, for example, 3.5kB or 40GB. The bytes option allows us to use the same unit for all values, which makes sorting and numerical comparison easier. For example, bytes=b presents all values in bytes, bytes=k in kilobytes, and so on.
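The common arguments are plain query-string parameters, so they are easy to combine in a script. The following Python sketch builds a Cat API URL from them and shows why a single unit helps: with bytes=b every size is a plain integer, so it sorts numerically. The helpers cat_url and sort_by_size are hypothetical, and the default host localhost:9200 is simply the one used throughout this chapter.

```python
from typing import List, Optional, Sequence
from urllib.parse import quote


def cat_url(endpoint: str, host: str = "localhost:9200", v: bool = False,
            columns: Optional[Sequence[str]] = None, show_help: bool = False,
            bytes_unit: Optional[str] = None) -> str:
    """Build a Cat API URL from the common arguments v, help, h, and bytes."""
    params = []
    if v:
        params.append("v")  # header line with column names
    if show_help:
        params.append("help")  # list the columns this endpoint supports
    if columns:
        # h takes a comma-separated list of column names.
        params.append("h=" + quote(",".join(columns), safe=","))
    if bytes_unit:
        params.append("bytes=" + quote(bytes_unit))  # for example "b" or "k"
    url = f"http://{host}/_cat/{endpoint}"
    return url + ("?" + "&".join(params) if params else "")


def sort_by_size(values: Sequence[str]) -> List[str]:
    """With bytes=b every size is a plain integer string, so int() is a safe key."""
    return sorted(values, key=int)


print(cat_url("nodes", v=True, columns=["name", "node.role", "load", "uptime"]))
# http://localhost:9200/_cat/nodes?v&h=name,node.role,load,uptime
```

Note that sorting the same strings without key=int would compare them as text, placing "512" after "42949672960"; a shared unit is what makes the numeric key possible.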

Note

For the full list of arguments for each Cat API endpoint, refer to the official Elasticsearch documentation available at http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/cat.html.

The examples

When we wrote this book, the Cat API had 21 endpoints. We don't want to describe them all, because that would repeat the information contained in the documentation and in the chapters about the administration API. However, we didn't want to leave this section without any examples of using the Cat API. Because of this, we decided to show you how easily you can get information using the Cat API compared to the standard JSON API exposed by Elasticsearch.

Getting information about the master node

The first example shows how easy it is to find out which node in our cluster is the master node. By calling the /_cat/master REST endpoint, we get information about the node that is currently elected as the master. For example, let's run the following command:

curl -XGET 'localhost:9200/_cat/master?v'

The response returned by Elasticsearch for our local two-node cluster looks as follows:

id                     host          ip       node
8gfdQlV-SxKB0uUxkjbxSg Banshee.local 10.0.1.3 Siege

As you can see in the response, we've got information about the node that is currently elected as the master: its identifier, host name, IP address, and node name.
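If you want to use this information in a script, keep in mind that node names can contain spaces (the nodes example later in this section shows a node called Alicia Masters). The following Python sketch, built around the hypothetical helper parse_master, therefore splits a row of the /_cat/master output (requested without v) on the first three runs of whitespace only and treats the rest of the line as the node name. It assumes the column order id, host, ip, node shown above.

```python
from typing import Dict

# Output of `curl -XGET 'localhost:9200/_cat/master'` (no ?v), from the sample above.
MASTER = "8gfdQlV-SxKB0uUxkjbxSg Banshee.local 10.0.1.3 Siege\n"


def parse_master(line: str) -> Dict[str, str]:
    """Split a /_cat/master row into its four columns.

    Only the first three separators are split on, so a node name that
    contains spaces stays in one piece.
    """
    node_id, host, ip, name = line.strip().split(None, 3)
    return {"id": node_id, "host": host, "ip": ip, "node": name}


master = parse_master(MASTER)
print(master["node"], master["ip"])  # Siege 10.0.1.3
```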

Getting information about the nodes

The /_cat/nodes REST endpoint provides information about all the nodes in the cluster. Let's see what Elasticsearch will return after running the following command:

curl -XGET 'localhost:9200/_cat/nodes?v&h=name,node.role,load,uptime'

In the preceding example, we used the h parameter to choose which of the approximately 70 columns available for this endpoint we want to get. We chose only the node name, its role (whether the node is a data or a client node), the node load, and its uptime.

The response returned by Elasticsearch looks as follows:

name           node.role load uptime
Alicia Masters d         6.09   6.7m
Siege          d         6.09     1h

As you can see, the /_cat/nodes REST endpoint returned exactly the columns we requested for each node in the cluster.
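This response also shows the limits of splitting on whitespace: the row for Alicia Masters splits into five cells, while the header has only four columns. A more robust approach, sketched below in Python, slices each row at the positions where the header labels start. It assumes that every column starts at the same position as its header label, as it does in the sample output above. The helper name parse_by_header_offsets is hypothetical.

```python
import re
from typing import Dict, List

# Output of `curl -XGET 'localhost:9200/_cat/nodes?v&h=name,node.role,load,uptime'`.
NODES_V = """\
name           node.role load uptime
Alicia Masters d         6.09   6.7m
Siege          d         6.09     1h
"""


def parse_by_header_offsets(text: str) -> List[Dict[str, str]]:
    """Cut every row at the start offsets of the header labels.

    Assumes each column begins where its header label begins, so cells
    containing spaces (such as node names) stay intact.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0]
    starts = [m.start() for m in re.finditer(r"\S+", header)]
    names = header.split()
    # Each column runs from its start to the next column's start (or line end).
    bounds = list(zip(starts, starts[1:] + [None]))
    return [
        {name: line[a:b].strip() for name, (a, b) in zip(names, bounds)}
        for line in lines[1:]
    ]


for node in parse_by_header_offsets(NODES_V):
    print(f"{node['name']}: load {node['load']}, up {node['uptime']}")
```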
