Chapter 10. Administrating Your Cluster

In the previous chapter, we focused on Elasticsearch node and cluster configuration. We started by discussing the node discovery process, what it is, and how to configure it. We discussed the gateway and recovery modules and tuned them to match our needs. We used templates and dynamic templates to easily manage our data structures and learned how to install plugins to extend the functionality of Elasticsearch. Finally, we learned about the caches of Elasticsearch and how to update indices and cluster settings using a dedicated API. By the end of this chapter, you will have learned about the following topics:

  • Backing up your indices in Elasticsearch
  • Monitoring your clusters
  • Controlling cluster rebalancing
  • Controlling the shard and replica allocation
  • Using the CAT API to learn about the cluster state
  • Warming up
  • Aliasing

Elasticsearch time machine

A good piece of software is one that can handle exceptional situations such as hardware failure or human error. Even though a cluster of a few servers is less vulnerable to hardware problems, bad things can still happen. For example, let's imagine that you need to restore your indices. One possible solution is to reindex all your data from a primary data store such as a SQL database. But what will you do if it takes too long or, even worse, the only data store is Elasticsearch? Before Elasticsearch 1.0, creating backups of indices was not easy. The procedure included stopping indexing, flushing the data to disk, shutting down the cluster, and, finally, copying the data to a backup device.

Fortunately, now we can take snapshots, and this section will show you how this functionality works.

Creating a snapshot repository

A snapshot keeps all the data related to the cluster from the time the snapshot creation starts and it includes information about the cluster state and indices. Before we create snapshots, at least the first one, a snapshot repository must be created. Each repository is recognized by its name and should define the following aspects:

  • name: A unique name of the repository; we will need it later
  • type: The type of the repository; the possible values are fs (a repository on a shared file system) and url (a read-only repository available via URL)
  • settings: Additional information needed depending on the repository type

Now, let's create a file system repository. Before this, we have to make sure that the directory for our backups fulfils two requirements. The first is related to security. Every repository has to be placed in the path defined in the Elasticsearch configuration file as path.repo. For example, our elasticsearch.yml includes a line similar to the following one:

path.repo: ["/tmp/es_backup_folder", "/tmp/backup/es"]

The second requirement says that every node in the cluster should be able to access the directory we set for the repository.

So now, let's create a new file system repository by running the following command:

curl -XPUT localhost:9200/_snapshot/backup -d '{
  "type": "fs",
  "settings": {
    "location": "/tmp/es_backup_folder/cluster1"
  }
}'

The preceding command creates a repository named backup, which stores the backup files in the directory given by the location attribute. Elasticsearch responds with the following information:

{"acknowledged":true}

At the same time, the directory specified by the location attribute is created on the local file system, still without any content.

Note

You can also set a relative path with the location parameter. In this case, Elasticsearch determines the absolute path by first getting the directory defined in path.repo.

As we said, the second repository type is url. It requires a url parameter instead of location, which points to the address where the repository resides, for example, an HTTP address. As in the previous case, the address has to be allowed by the repositories.url.allowed_urls parameter in the Elasticsearch configuration. This parameter allows the use of wildcards in the addresses.
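To illustrate, a url repository could be registered as follows. This is only a sketch: the repository name backup_ro and the address http://backup-server/es_backups/ are hypothetical, and the command assumes a cluster running on localhost:9200 whose configuration already lists that address in repositories.url.allowed_urls:

```shell
# Hypothetical example: register a read-only repository served over HTTP.
# The URL below must match an entry in repositories.url.allowed_urls.
curl -XPUT localhost:9200/_snapshot/backup_ro -d '{
  "type": "url",
  "settings": {
    "url": "http://backup-server/es_backups/"
  }
}'
```

Because a url repository is read-only, it can be used to restore snapshots, but not to create them.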

Note

Note that file:// addresses are checked against the paths defined in the path.repo parameter.

You can also store snapshots in Amazon S3, HDFS, or Azure using additional plugins. To learn about these, please refer to the documentation of the respective repository plugins.

Now that we have our first repository, we can see its definition using the following command:

curl -XGET localhost:9200/_snapshot/backup?pretty

We can also check all the repositories by running a command like the following:

curl -XGET localhost:9200/_snapshot/_all?pretty

Or simply, we can use this:

curl -XGET localhost:9200/_snapshot/?pretty

If you want to delete a snapshot repository, the standard DELETE command helps:

curl -XDELETE localhost:9200/_snapshot/backup?pretty

Creating snapshots

By default, Elasticsearch includes all the indices and cluster settings (except the transient ones) when creating snapshots. You can create any number of snapshots and each will hold the information available at the time of its creation. The snapshots are created in a smart, incremental way; only new information is copied. This means that Elasticsearch knows which segments are already stored in the repository and doesn't have to save them again.

To create a new snapshot, we need to choose a unique name and use the following command:

curl -XPUT 'localhost:9200/_snapshot/backup/bckp1'

The preceding command defines a new snapshot named bckp1 (you can only have one snapshot with a given name; Elasticsearch will check its uniqueness) and data is stored in the previously defined backup repository. The command returns an immediate response, which looks as follows:

{"accepted":true}

The preceding response means that the snapshot process has been started and continues in the background. If you would like the response to be returned only when the actual snapshot has been created, you can add the wait_for_completion=true parameter, as shown in the following example:

curl -XPUT 'localhost:9200/_snapshot/backup/bckp2?wait_for_completion=true&pretty'

The response to the preceding command shows the status of a created snapshot:

{
  "snapshot" : {
    "snapshot" : "bckp2",
    "version_id" : 2000099,
    "version" : "2.2.0",
    "indices" : [ "news" ],
    "state" : "SUCCESS",
    "start_time" : "2016-01-07T21:21:43.740Z",
    "start_time_in_millis" : 1446931303740,
    "end_time" : "2016-01-07T21:21:44.750Z",
    "end_time_in_millis" : 1446931304750,
    "duration_in_millis" : 1010,
    "failures" : [ ],
    "shards" : {
      "total" : 5,
      "failed" : 0,
      "successful" : 5
    }
  }
}

As you can see, Elasticsearch presents information about the time taken by the snapshot process, its status, and the indices affected.

Additional parameters

The snapshot command also accepts the following additional parameters:

  • indices: The names of the indices of which we want to take snapshots.
  • ignore_unavailable: When this is set to false (the default), Elasticsearch will return an error if any index listed using the indices parameter is missing. When set to true, Elasticsearch will just ignore the missing indices during backup.
  • include_global_state: When this is set to true (the default), the cluster state is also written to the snapshot (except for the transient settings).
  • partial: The snapshot operation success depends on the availability of all the shards. If any of the shards is not available, the snapshot operation will fail. Setting partial to true causes Elasticsearch to save only the available shards and omit the lost ones.

An example of using additional parameters can look as follows:

curl -XPUT 'localhost:9200/_snapshot/backup/bckp3?wait_for_completion=true&pretty' -d '{
  "indices": "b*",
  "include_global_state": "false"
}'

Restoring a snapshot

Now that we have created our snapshots, we will also learn how to restore data from a given snapshot. As we said earlier, a snapshot can be addressed by its name. We can list all the snapshots using the following command:

curl -XGET 'localhost:9200/_snapshot/backup/_all?pretty'

The response returned by Elasticsearch to the preceding command shows the list of all available backups. Every list item is similar to the following:

{
  "snapshot" : {
    "snapshot" : "bckp2",
    "version_id" : 2000099,
    "version" : "2.2.0",
    "indices" : [ "news" ],
    "state" : "SUCCESS",
    "start_time" : "2016-01-07T21:21:43.740Z",
    "start_time_in_millis" : 1446931303740,
    "end_time" : "2016-01-07T21:21:44.750Z",
    "end_time_in_millis" : 1446931304750,
    "duration_in_millis" : 1010,
    "failures" : [ ],
    "shards" : {
      "total" : 5,
      "failed" : 0,
      "successful" : 5
    }
  }
}

The repository we created earlier is called backup. To restore a snapshot named bckp1 from our snapshot repository, run the following command:

curl -XPOST 'localhost:9200/_snapshot/backup/bckp1/_restore'

During the execution of this command, Elasticsearch takes the indices defined in the snapshot and creates them with the data from the snapshot. However, if the index already exists and is not closed, the command will fail. In this case, you may find it convenient to only restore certain indices, for example:

curl -XPOST 'localhost:9200/_snapshot/backup/bckp1/_restore?pretty' -d '{
  "indices": "c*"
}'

The preceding command restores only the indices that begin with the letter c. The other available parameters are as follows:

  • ignore_unavailable: This parameter, when set to false (the default behavior), will cause Elasticsearch to fail the restore process if any of the expected indices is not available.
  • include_global_state: This parameter, when set to true, will cause Elasticsearch to restore the global state included in the snapshot, which is also the default behavior.
  • rename_pattern: This parameter allows the renaming of indices during a restore operation. Thanks to this, the restored index will have a different name. The value of this parameter is a regular expression that defines the source index name. If the pattern matches the name of an index, name substitution will occur. In the pattern, you should use groups delimited by parentheses; these groups can then be referenced in the rename_replacement parameter.
  • rename_replacement: This parameter, along with rename_pattern, defines the target index name. Using the dollar sign and a number, you can recall the appropriate group from rename_pattern.

For example, with rename_pattern=products_(.*), the indices with names that begin with products_ will have their names rewritten during the restore; the part of the name captured by the group is then used in the replacement. rename_pattern=products_(.*) together with rename_replacement=items_$1 causes the products_cars index to be restored to an index called items_cars.
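The substitution works just like an ordinary regular-expression replacement. As a rough local illustration (using sed here instead of Elasticsearch itself, so this is only an analogy; in sed's syntax the group is recalled as \1 rather than $1):

```shell
# Mimic rename_pattern=products_(.*) with rename_replacement=items_$1
# using a plain sed substitution on the index name.
echo "products_cars" | sed -E 's/^products_(.*)$/items_\1/'
# prints "items_cars"
```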

Cleaning up – deleting old snapshots

Elasticsearch leaves snapshot repository management up to you. Currently, there is no automatic clean-up process. But don't worry; this is simple. For example, let's remove our previously taken snapshot:

curl -XDELETE 'localhost:9200/_snapshot/backup/bckp1?pretty'

And that's all. The command causes the snapshot named bckp1 from the backup repository to be deleted.
