When working with multiple indices in Elasticsearch, you can sometimes lose track of them. Imagine that you store logs, or time-based data in general, in your indices. The amount of data in such cases is usually quite large, so it makes sense to divide it somehow. A logical division is to create a single index for each day of logs (if you are interested in an open source solution used to manage logs, look at Logstash from the Elasticsearch suite at https://www.elastic.co/products/logstash).
However, if we keep all the indices, managing them becomes a problem after some time. The application has to know which index to send data to, which indices to query, and so on. With the help of aliases, we can work with a single name, just as we would with a single index, while actually working with multiple indices.
What is an index alias? It's an additional name for one or more indices that allows us to use these indices by referring to them with those additional names. A single alias can have multiple indices as well as the other way round; a single index can be a part of multiple aliases.
However, please remember that you can't use an alias that points to multiple indices for indexing or for real-time GET operations. Elasticsearch will throw an exception if you do this, because it doesn't know in which index the data should be indexed or from which index the document should be fetched. We can still use an alias that links to only a single index for indexing, though.
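The rule above can be illustrated with a small in-memory model. This is only a sketch of the behavior described in the text, not Elasticsearch code: the AliasRegistry class and the index and alias names (logs_mon, logs_tue, logs_wed, week, today) are hypothetical and exist only for this example.

```python
class AliasRegistry:
    """Toy model of alias -> indices resolution (not Elasticsearch code)."""

    def __init__(self):
        self._aliases = {}  # alias name -> set of index names

    def add(self, index, alias):
        # One alias may cover many indices, and one index may sit in many aliases.
        self._aliases.setdefault(alias, set()).add(index)

    def resolve_for_search(self, name):
        # A search may fan out to every index behind the alias.
        # A name that is not a known alias is treated as a plain index name.
        return sorted(self._aliases.get(name, {name}))

    def resolve_for_write(self, name):
        # Indexing and real-time GET need exactly one target index.
        targets = self.resolve_for_search(name)
        if len(targets) != 1:
            raise ValueError(
                f"alias {name!r} points to {len(targets)} indices; "
                "a write needs exactly one"
            )
        return targets[0]


registry = AliasRegistry()
for index in ("logs_mon", "logs_tue", "logs_wed"):
    registry.add(index, "week")
registry.add("logs_wed", "today")

print(registry.resolve_for_search("week"))
print(registry.resolve_for_write("today"))
```

Calling registry.resolve_for_write("week") raises ValueError in this model, which mirrors the exception mentioned above for aliases that cover more than one index.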
To create an index alias, we need to send an HTTP POST request to the _aliases REST endpoint with a defined action. For example, the following request will create a new alias called week12 that will include the indices named day10, day11, and day12 (we need to create those indices first):
curl -XPOST 'localhost:9200/_aliases' -d '{ "actions" : [ { "add" : { "index" : "day10", "alias" : "week12" } }, { "add" : { "index" : "day11", "alias" : "week12" } }, { "add" : { "index" : "day12", "alias" : "week12" } } ] }'
If the week12 alias isn't present in our Elasticsearch cluster, the preceding command will create it. If it is present, the command will just add the specified indices to it.
We would run a search across the three indices as follows:
curl -XGET 'localhost:9200/day10,day11,day12/_search?q=test'
If everything goes well, we can instead run it as follows:
curl -XGET 'localhost:9200/week12/_search?q=test'
Isn't this better?
Sometimes we have a set of indices where every index holds independent data, but some queries should run across all of them; for example, we have dedicated indices for countries (country_en, country_us, country_de, and so on). In this case, we would create an alias that groups them all:
curl -XPOST 'localhost:9200/_aliases' -d '{ "actions" : [ { "add" : { "index" : "country_*", "alias" : "countries" } } ] }'
The last command created only one alias. For such a case, Elasticsearch offers a less verbose form:
curl -XPUT 'localhost:9200/country_*/_alias/countries'
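To see which indices a pattern such as country_* would select, the matching can be approximated offline. The following is a rough sketch that uses Python's fnmatch module as a stand-in for Elasticsearch's own wildcard expansion, which may differ in details; the expand_pattern helper and the cluster_indices list are assumptions made for illustration.

```python
from fnmatch import fnmatchcase


def expand_pattern(pattern, index_names):
    """Return the index names matching a simple * wildcard pattern."""
    return sorted(name for name in index_names if fnmatchcase(name, pattern))


# Hypothetical list of indices present in the cluster.
cluster_indices = ["country_en", "country_us", "country_de", "library", "day10"]

matched = expand_pattern("country_*", cluster_indices)
print(matched)  # ['country_de', 'country_en', 'country_us']
```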
Of course, you can also remove indices from an alias. We do this in the same way as we add indices to an alias, but instead of the add command, we use the remove one. For example, to remove the index named day9 from the week12 alias, we will run the following command:
curl -XPOST 'localhost:9200/_aliases' -d '{ "actions" : [ { "remove" : { "index" : "day9", "alias" : "week12" } } ] }'
The add and remove commands can be sent as a single request. For example, if you would like to combine all the previously sent commands into a single request, you will have to send the following command:
curl -XPOST 'localhost:9200/_aliases' -d '{ "actions" : [ { "add" : { "index" : "day10", "alias" : "week12" } }, { "add" : { "index" : "day11", "alias" : "week12" } }, { "add" : { "index" : "day12", "alias" : "week12" } }, { "remove" : { "index" : "day9", "alias" : "week12" } } ] }'
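When an application maintains such an alias over time, the combined request body can be derived from the indices the alias covers now and the indices it should cover. The following is a minimal sketch of that idea; the alias_update_actions helper is hypothetical, and it only builds the JSON body, leaving the POST to the _aliases endpoint to whatever HTTP client you use.

```python
import json


def alias_update_actions(alias, current, desired):
    """Build an _aliases request body that moves an alias from the
    'current' set of indices to the 'desired' set."""
    current, desired = set(current), set(desired)
    actions = [
        {"add": {"index": index, "alias": alias}}
        for index in sorted(desired - current)
    ]
    actions += [
        {"remove": {"index": index, "alias": alias}}
        for index in sorted(current - desired)
    ]
    return {"actions": actions}


body = alias_update_actions(
    "week12",
    current=["day9"],
    desired=["day10", "day11", "day12"],
)
print(json.dumps(body, indent=2))
```

With these inputs, the generated body contains the same three add actions and one remove action as the preceding curl command.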
In addition to adding indices to or removing them from aliases, we and our applications that use Elasticsearch may need to retrieve all the aliases available in the cluster or all the aliases that an index is connected to. To retrieve these aliases, we send an HTTP GET request. For example, the first of the following commands gets all the aliases for the day10 index and the second one gets all the available aliases:
curl -XGET 'localhost:9200/day10/_aliases'
curl -XGET 'localhost:9200/_aliases'
The response from the second command is as follows:
{ "day12" : { "aliases" : { "week12" : { } } }, "library" : { "aliases" : { } }, "day11" : { "aliases" : { "week12" : { } } }, "day9" : { "aliases" : { } }, "day10" : { "aliases" : { "week12" : { } } } }
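The response is keyed by index name, while an application often wants the opposite view: which indices sit behind each alias. The following sketch inverts the response shown above; the indices_by_alias helper is a hypothetical name, and the input is the response text copied from this section.

```python
import json
from collections import defaultdict

response_text = """
{ "day12" : { "aliases" : { "week12" : { } } },
  "library" : { "aliases" : { } },
  "day11" : { "aliases" : { "week12" : { } } },
  "day9" : { "aliases" : { } },
  "day10" : { "aliases" : { "week12" : { } } } }
"""


def indices_by_alias(response):
    """Turn {index: {"aliases": {alias: ...}}} into {alias: [indices]}."""
    grouped = defaultdict(list)
    for index, info in response.items():
        for alias in info.get("aliases", {}):
            grouped[alias].append(index)
    return {alias: sorted(indices) for alias, indices in grouped.items()}


grouped = indices_by_alias(json.loads(response_text))
print(grouped)  # {'week12': ['day10', 'day11', 'day12']}
```

Indices without aliases, such as library and day9, do not appear in the inverted mapping.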
You can also use the _alias endpoint to get all aliases from the given index:
curl -XGET 'localhost:9200/day10/_alias/*'
To get a particular alias definition, such as week12, you can use the following:
curl -XGET 'localhost:9200/day10/_alias/week12'
You can also remove an alias using the _alias endpoint. For example, sending the following command will remove the client alias from the data index:
curl -XDELETE localhost:9200/data/_alias/client
Aliases can be used in a way similar to how views are used in SQL databases. You can define a filter using the full Query DSL (discussed in detail in Chapter 3, Searching Your Data) and have it applied automatically to count, search, delete by query, and similar operations that go through the alias.
Let's look at an example. Imagine that we want an alias that returns data for a certain client so that we can use it in our application. Let's say that the client identifier is stored in the clientId field and we are interested in the 12345 client. So, let's create an alias named client for our data index, which will apply a filter on clientId automatically:
curl -XPOST 'localhost:9200/_aliases' -d '{ "actions" : [ { "add" : { "index" : "data", "alias" : "client", "filter" : { "term" : { "clientId" : 12345 } } } } ] }'
So, when using the defined alias, your requests will always be filtered by a term query, which ensures that all the returned documents have the 12345 value in the clientId field.
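The view-like behavior can be pictured with a few in-memory documents. This is a toy sketch only: the term_matches helper and the sample documents are made up for illustration, and the helper checks plain equality, which is a simplification of how a real term query works against indexed terms.

```python
def term_matches(document, term_filter):
    """Check a document against a simple {"term": {field: value}} filter."""
    field, value = next(iter(term_filter["term"].items()))
    return document.get(field) == value


# The same filter as the one attached to the client alias.
alias_filter = {"term": {"clientId": 12345}}

# Hypothetical documents stored in the data index.
documents = [
    {"clientId": 12345, "title": "first"},
    {"clientId": 99999, "title": "second"},
    {"clientId": 12345, "title": "third"},
]

# What a request through the alias would be allowed to see.
visible = [doc for doc in documents if term_matches(doc, alias_filter)]
print([doc["title"] for doc in visible])  # ['first', 'third']
```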
In the Introduction to routing section of Chapter 2, Indexing Your Data, we talked about routing. Similar to aliases that use filtering, we can add routing values to the aliases. Imagine that we are using routing on the basis of user identifier and we want to use the same routing values with our aliases. So, for the alias named client, we will use the routing values of 12345, 12346, and 12347 for querying, and only 12345 for indexing. To do this, we will create an alias using the following command:
curl -XPOST 'localhost:9200/_aliases' -d '{ "actions" : [ { "add" : { "index" : "data", "alias" : "client", "search_routing" : "12345,12346,12347", "index_routing" : "12345" } } ] }'
This way, when we index our data using the client alias, the values specified by the index_routing property will be used. At the time of querying, the values specified by the search_routing property will be used.
There is one more thing. Please look at the following query sent to the previously defined alias:
curl -XGET 'localhost:9200/client/_search?q=test&routing=99999,12345'
The value used as a routing value will be 12345. This is because Elasticsearch takes the intersection of the values in the search_routing attribute and the routing parameter of the query, which in our case is 12345.
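The rule for combining the two sets of routing values is a plain set intersection, which can be sketched as follows. The effective_routing function is a hypothetical helper written for this example, not part of any Elasticsearch client.

```python
def effective_routing(search_routing, request_routing=None):
    """Combine an alias's search_routing with a request's routing parameter.

    Both arguments are comma-separated strings, as in the alias definition
    and the query string. Without a request routing, the alias values apply.
    """
    alias_values = set(search_routing.split(","))
    if request_routing is None:
        return sorted(alias_values)
    request_values = set(request_routing.split(","))
    # Only the values common to both are used.
    return sorted(alias_values & request_values)


print(effective_routing("12345,12346,12347", "99999,12345"))  # ['12345']
```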
One of the greatest advantages of using aliases is the ability to re-index the data without any downtime for the system that uses Elasticsearch. To achieve this, you need to interact with your indices only through aliases, both for indexing and for querying. In such a case, you can just create a new index, index the data there, and switch the aliases when needed. During indexing, the aliases still point to the old index, so the application can work as usual.
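The switch itself can be expressed as a single _aliases request that removes the alias from the old index and adds it to the new one; actions sent together in one such request are applied as one operation, so the alias should never be left pointing at nothing. The following sketch only builds the request body; the swap_alias_actions helper and the data_v1 and data_v2 index names are assumptions made for illustration.

```python
import json


def swap_alias_actions(alias, old_index, new_index):
    """Build an _aliases request body that repoints an alias in one step."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


# data_v1 is the index currently in use; data_v2 holds the re-indexed data.
swap_body = swap_alias_actions("data", "data_v1", "data_v2")
print(json.dumps(swap_body, indent=2))
```

The printed JSON can be sent with the same curl -XPOST 'localhost:9200/_aliases' -d '...' form used throughout this section.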