Index aliasing and using it to simplify your everyday work

When working with multiple indices in Elasticsearch, it is easy to lose track of them. Imagine that you store logs, or time-based data in general, in your indices. The amount of data in such cases is usually quite large, so it makes sense to divide it somehow. A logical division is to create a single index for each day of logs (if you are interested in an open source solution for managing logs, look at Logstash from the Elasticsearch suite at https://www.elastic.co/products/logstash).

However, if we keep all these indices, after some time they become hard to manage. An application needs to keep track of details such as which index to send data to and which indices to query. With aliases, we can work with a single name, just as we would with a single index, while actually working with multiple indices.

An alias

What is an index alias? It's an additional name for one or more indices that allows us to use these indices by referring to them with those additional names. A single alias can have multiple indices as well as the other way round; a single index can be a part of multiple aliases.

However, please remember that you can't use an alias that points to multiple indices for indexing or for real-time GET operations. Elasticsearch will throw an exception if you do this, because it doesn't know in which index the data should be indexed or from which index the document should be fetched. We can still use an alias that links to only a single index for indexing, though.

Creating an alias

To create an index alias, we send an HTTP POST request to the _aliases REST endpoint with a defined action. For example, the following request will create a new alias called week12 that includes the indices named day10, day11, and day12 (we need to create those indices first):

curl -XPOST 'localhost:9200/_aliases' -d '{
  "actions" : [
    { "add" : { "index" : "day10", "alias" : "week12" } },
    { "add" : { "index" : "day11", "alias" : "week12" } },
    { "add" : { "index" : "day12", "alias" : "week12" } }
  ]
}'

If the week12 alias isn't present in our Elasticsearch cluster, the preceding command will create it. If it is present, the command will just add the specified indices to it.
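
When an alias groups many indices, writing each add action by hand becomes tedious. The following is a minimal sketch of a helper that builds the request body shown above from a list of index names. The function name add_actions_body is hypothetical (it is not part of Elasticsearch or curl), and the helper only prints JSON; it sends nothing to the cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints an _aliases request body that adds every
# listed index to a single alias. It only builds JSON; nothing is sent.
add_actions_body() {
  local alias_name="$1"; shift
  local sep="" index
  printf '{ "actions" : ['
  for index in "$@"; do
    # One "add" action per index, separated by commas.
    printf '%s { "add" : { "index" : "%s", "alias" : "%s" } }' \
      "$sep" "$index" "$alias_name"
    sep=","
  done
  printf ' ] }\n'
}

# Print the body for the week12 example.
add_actions_body week12 day10 day11 day12
```

The output could then be piped to the endpoint, for example with add_actions_body week12 day10 day11 day12 | curl -XPOST 'localhost:9200/_aliases' -d @-. Depending on your Elasticsearch version, you may also need to pass -H 'Content-Type: application/json'.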

We would run a search across the three indices as follows:

curl -XGET 'localhost:9200/day10,day11,day12/_search?q=test'

With the alias in place, we can instead run it as follows:

curl -XGET 'localhost:9200/week12/_search?q=test'

Isn't this better?

Sometimes we have a set of indices where every index holds independent data, but some queries should run across all of them; for example, we have dedicated indices for countries (country_en, country_us, country_de, and so on). In this case, we would create an alias that groups them all:

curl -XPOST 'localhost:9200/_aliases' -d '{
  "actions" : [
    { "add" : { "index" : "country_*", "alias" : "countries" } }
  ]
}'

The last command defines only a single alias. In such cases, Elasticsearch allows you to rewrite the request as something less verbose:

curl -XPUT 'localhost:9200/country_*/_alias/countries'

Modifying aliases

Of course, you can also remove indices from an alias. We do this in the same way as we add indices to an alias, but instead of the add command, we use the remove one. For example, to remove the index named day9 from the week12 alias, we will run the following command:

curl -XPOST 'localhost:9200/_aliases' -d '{
  "actions" : [
    { "remove" : { "index" : "day9", "alias" : "week12" } }
  ]
}'

Combining commands

The add and remove commands can be sent in a single request. For example, to combine all the previously sent commands into a single request, you would send the following command:

curl -XPOST 'localhost:9200/_aliases' -d '{
  "actions" : [
    { "add" : { "index" : "day10", "alias" : "week12" } },
    { "add" : { "index" : "day11", "alias" : "week12" } },
    { "add" : { "index" : "day12", "alias" : "week12" } },
    { "remove" : { "index" : "day9", "alias" : "week12" } }
  ]
}'

Retrieving aliases

In addition to adding indices to aliases and removing them, we and the applications that use Elasticsearch may need to retrieve all the aliases available in the cluster or all the aliases that an index is connected to. To retrieve these aliases, we send a request using the HTTP GET method. For example, the first of the following commands gets all the aliases for the day10 index, and the second one gets all the available aliases:

curl -XGET 'localhost:9200/day10/_aliases'
curl -XGET 'localhost:9200/_aliases'

The response from the second command is as follows:

{
  "day12" : {
    "aliases" : {
      "week12" : { }
    }
  },
  "library" : {
    "aliases" : { }
  },
  "day11" : {
    "aliases" : {
      "week12" : { }
    }
  },
  "day9" : {
    "aliases" : { }
  },
  "day10" : {
    "aliases" : {
      "week12" : { }
    }
  }
}

You can also use the _alias endpoint to get all the aliases of a given index:

curl -XGET 'localhost:9200/day10/_alias/*'

To get a particular alias definition, you can use the following:

curl -XGET 'localhost:9200/day10/_alias/week12'

Removing aliases

You can also remove an alias using the _alias endpoint. For example, sending the following command will remove the client alias from the data index:

curl -XDELETE 'localhost:9200/data/_alias/client'

Filtering aliases

Aliases can be used in a way similar to how views are used in SQL databases. You can use the full Query DSL (discussed in detail in Chapter 3, Searching Your Data) to define a filter, and that filter will be applied to all count, search, delete by query, and similar operations that go through the alias.

Let's look at an example. Imagine that we want an alias that returns data for a certain client so we can use it in our application. Let's say that the client identifier we are interested in is stored in the clientId field and we are interested in the 12345 client. So, let's create an alias named client for our data index, which will apply a query on clientId automatically:

curl -XPOST 'localhost:9200/_aliases' -d '{
  "actions" : [
    {
      "add" : {
        "index" : "data",
        "alias" : "client",
        "filter" : { "term" : { "clientId" : 12345 } }
      }
    }
  ]
}'

So, when using the defined alias, your requests will always be filtered by a term query that ensures that all the returned documents have the 12345 value in the clientId field.
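
To make the view analogy concrete, a search sent to the client alias behaves roughly like a search sent directly to the data index with the alias filter added to the query. The sketch below builds such an explicit request body. It assumes Elasticsearch 2.0 or later, where the bool query accepts a filter clause. The function name filtered_search_body is hypothetical, and the helper only prints JSON.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wraps a user query and an alias filter in a bool
# query, which is roughly what a filtered alias does behind the scenes.
# Assumes Elasticsearch 2.0+ (bool query with a "filter" clause).
filtered_search_body() {
  local user_query="$1" alias_filter="$2"
  printf '{ "query" : { "bool" : { "must" : %s, "filter" : %s } } }\n' \
    "$user_query" "$alias_filter"
}

# Approximate body for: curl -XGET 'localhost:9200/client/_search?q=test'
filtered_search_body \
  '{ "query_string" : { "query" : "test" } }' \
  '{ "term" : { "clientId" : 12345 } }'
```

Sending the printed body to localhost:9200/data/_search should return approximately the same documents as running q=test against the client alias, without the application having to remember the filter.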

Aliases and routing

In the Introduction to routing section of Chapter 2, Indexing Your Data, we talked about routing. Just as with filtering, we can add routing values to aliases. Imagine that we are using routing on the basis of a user identifier and we want to use the same routing values with our aliases. So, for the alias named client, we will use the routing values of 12345, 12346, and 12347 for querying, and only 12345 for indexing. To do this, we will create an alias using the following command:

curl -XPOST 'localhost:9200/_aliases' -d '{
  "actions" : [
    {
      "add" : {
        "index" : "data",
        "alias" : "client",
        "search_routing" : "12345,12346,12347",
        "index_routing" : "12345"
      }
    }
  ]
}'

This way, when we index our data using the client alias, the value specified by the index_routing property will be used. At the time of querying, the values specified by the search_routing property will be used.

There is one more thing. Please look at the following query sent to the previously defined alias:

curl -XGET 'localhost:9200/client/_search?q=test&routing=99999,12345'

The value used as a routing value will be 12345. This is because Elasticsearch takes the values common to the search_routing attribute of the alias and the routing parameter of the query (their intersection), which in our case is only 12345.
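
The intersection rule can be illustrated with a small shell function. This is only a sketch of the rule as described above, not Elasticsearch's actual implementation, and the function name effective_routing is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical illustration: prints the routing values that appear both
# in the alias search_routing and in the request routing parameter.
effective_routing() {
  local alias_routing="$1" request_routing="$2"
  local result="" value
  local IFS=','            # split the request routing on commas
  for value in $request_routing; do
    # Keep the value only if the alias routing list also contains it.
    case ",$alias_routing," in
      *",$value,"*) result="${result:+$result,}$value" ;;
    esac
  done
  printf '%s\n' "$result"
}

effective_routing "12345,12346,12347" "99999,12345"   # prints 12345
```

The 99999 value is dropped because the alias does not list it, so only 12345 remains.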

Zero downtime reindexing and aliases

One of the greatest advantages of using aliases is the ability to reindex the data without any downtime for the system that uses Elasticsearch. To achieve this, you need to interact with your indices only through aliases, both for indexing and querying. In such a case, you can just create a new index, index the data there, and switch the aliases when needed. During indexing, the aliases still point to the old index, so the application can work as usual.
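
The switch itself can be expressed as one _aliases request that removes the alias from the old index and adds it to the new one. The following is a minimal sketch of a helper that builds such a body. The function name swap_alias_body, the books alias, and the books_v1 and books_v2 index names are all hypothetical and serve only as an illustration; the helper prints JSON and sends nothing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints an _aliases body that moves an alias from
# the old index to the new one in a single request.
swap_alias_body() {
  local alias_name="$1" old_index="$2" new_index="$3"
  printf '{ "actions" : [ { "remove" : { "index" : "%s", "alias" : "%s" } }, { "add" : { "index" : "%s", "alias" : "%s" } } ] }\n' \
    "$old_index" "$alias_name" "$new_index" "$alias_name"
}

# books, books_v1, and books_v2 are illustrative names only.
swap_alias_body books books_v1 books_v2
```

Because all the actions in a single _aliases request are applied atomically, the alias never points to neither index or to both of them during the switch. The body could be sent with swap_alias_body books books_v1 books_v2 | curl -XPOST 'localhost:9200/_aliases' -d @-.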
