Node discovery

When you start your ElasticSearch nodes, one of the first things each node does is look for a master node that has the same cluster name and is visible to it. If a master is found, the node joins the already formed cluster. If no master is found, the node elects itself as the master. The process of forming a cluster and finding nodes is called discovery. The module responsible for discovery has two main purposes: electing a master and discovering new nodes within a cluster. In this section we will discuss how we can configure and tune the discovery module.

Discovery types

By default, without installing additional plugins, ElasticSearch allows us to use zen discovery, which provides us with multicast and unicast discovery. In computer networking terminology, multicast is the delivery of a message to a group of computers in a single transmission. Unicast, on the other hand, is the transmission of a message over the network to a single, explicitly addressed host. With unicast discovery, a node therefore contacts each host from a configured list individually.

When choosing between multicast and unicast, you should know whether your network can handle multicast messages. If it can, use multicast. If not, use unicast discovery.

Note

If you are using the Linux operating system and want to check whether your network supports multicast, run the ifconfig command for your network interface (usually eth0). If the interface supports multicast, you'll see the MULTICAST property in the command output.

Master node

As we already mentioned, one of the main purposes of discovery is to choose a master node that will oversee the cluster. The master node checks all the other nodes to see if they are responsive (the other nodes ping the master too). The master node also accepts new nodes that want to join the cluster. If the master is somehow disconnected from the cluster, the remaining nodes will elect a new master from among themselves. All these processes are done automatically on the basis of the configuration values we provide.

Configuring master and data nodes

By default, ElasticSearch allows every node to be both a master node and a data node. However, in certain situations you may want to have worker nodes, which only hold the data, and dedicated master nodes, which are only used to process requests and manage the cluster.

To set a node to only hold data, we need to tell ElasticSearch that we don't want it to be a master node. To do that, we add the following properties to the elasticsearch.yml configuration file:

node.master: false
node.data: true

To set a node to be only a master node, we need to tell ElasticSearch that we don't want it to hold data. To do that, we add the following properties to the elasticsearch.yml configuration file:

node.master: true
node.data: false

Please note that the node.master and node.data properties are set to true by default, but we tend to include them for the sake of configuration clarity.

Master election configuration

Imagine that you have a cluster that is built of 10 nodes. Everything is working fine until one day when your network fails and three of your nodes are disconnected from the cluster, but they still see each other. Because of the zen discovery and master election process, the nodes that got disconnected elect a new master and you end up with two clusters with the same name, with two master nodes. Such a situation is called a split-brain and you want to avoid it as much as possible, because you may end up with two clusters that won't join each other after the network (or any other) problems are fixed.

In order to prevent split-brain situations, ElasticSearch provides the discovery.zen.minimum_master_nodes property. This property defines the minimum number of master-eligible nodes that must be connected to each other in order to form a cluster. So now let's get back to our cluster, in which all 10 nodes are master-eligible. If we were to set the discovery.zen.minimum_master_nodes property to 50 percent of the total number of nodes plus 1 (which is 6 in our case), we would end up with a single cluster. Why is that? Before the network failure we would have 10 nodes, which is more than six, and those nodes would form a cluster. After the disconnection of the three nodes, the remaining seven nodes still meet the minimum of six, so the first cluster would stay up and running. However, as three is less than six, the three disconnected nodes wouldn't be allowed to elect a new master and they would wait for reconnection with the original cluster.
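
As a minimal sketch of the preceding example, and assuming all 10 nodes are master-eligible, the property could be set in the elasticsearch.yml file of every node as follows:

```yaml
# Assumes a cluster of 10 master-eligible nodes: (10 / 2) + 1 = 6
discovery.zen.minimum_master_nodes: 6
```

If the number of master-eligible nodes in the cluster changes, the value should be recalculated in the same way.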

Setting the cluster name

If we don't set the cluster.name property in our elasticsearch.yml file, ElasticSearch will, by default, use elasticsearch as its value. This is not always a good thing, because any node started with the default configuration in the same network may join your cluster. Because of that, we suggest setting the cluster.name property to some other value of your choice. Setting a different cluster.name property is also needed if you want to run multiple clusters inside a single network. Otherwise you would end up with nodes that were meant for different clusters joining together.
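
For illustration, the property could be set as follows. The name used here is hypothetical; any value works as long as all nodes of one cluster share it and no other cluster in the network uses it:

```yaml
# "books_production" is a made-up name used only for this example
cluster.name: books_production
```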

Configuring multicast

Multicast is the default zen discovery method. Apart from the common settings, which we will discuss in a second, there are four properties we can control. They are as follows:

  • discovery.zen.ping.multicast.group: This is the group address to use for the multicast requests. It defaults to 224.2.2.4.
  • discovery.zen.ping.multicast.port: This is the port used for multicast communication. It defaults to 54328.
  • discovery.zen.ping.multicast.ttl: This specifies the number of hops for which the multicast request will be considered valid. It defaults to 3 hops.
  • discovery.zen.ping.multicast.address: This is the address ElasticSearch should bind to. It defaults to the null value, which means that ElasticSearch will try to bind to all network interfaces available.

In order to disable multicast, add the discovery.zen.ping.multicast.enabled property to the elasticsearch.yml file and set its value to false.
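
The multicast properties listed above could appear in the elasticsearch.yml file as in the following sketch. The values shown are simply the defaults given in the list, so in practice you would only include the lines you want to change:

```yaml
# Values below repeat the defaults listed above
discovery.zen.ping.multicast.group: 224.2.2.4
discovery.zen.ping.multicast.port: 54328
discovery.zen.ping.multicast.ttl: 3
# discovery.zen.ping.multicast.address is left unset here,
# so ElasticSearch tries to bind to all available interfaces
```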

Configuring unicast

Because of how unicast works, we need to specify at least a single host that the unicast message should be sent to. To do that, we add the discovery.zen.ping.unicast.hosts property to our elasticsearch.yml configuration file. Basically, we should list all the hosts that form the cluster in this property. For example, if the cluster consists of the hosts 192.168.2.1, 192.168.2.2, and 192.168.2.3, we should specify the property in the following way:

discovery.zen.ping.unicast.hosts: 192.168.2.1:9300, 192.168.2.2:9300, 192.168.2.3:9300

Please note that the hosts are separated with the comma character, and we've specified the port on which we expect unicast messages.
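
When a cluster relies on unicast only, multicast is usually switched off at the same time. The following is a sketch of such a configuration, reusing the example hosts from above and assuming the multicast switch is named discovery.zen.ping.multicast.enabled:

```yaml
# Turn off multicast so that only the listed hosts are contacted
# (property name is an assumption, see the lead-in)
discovery.zen.ping.multicast.enabled: false
# Example hosts from the text, each with the port used for node communication
discovery.zen.ping.unicast.hosts: 192.168.2.1:9300, 192.168.2.2:9300, 192.168.2.3:9300
```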

Nodes ping settings

In addition to the settings discussed previously, we can control or alter the default ping configuration. Ping is a signal sent between nodes to check if they are running and responsive. The master node pings all the other nodes in the cluster and each one of the other nodes in the cluster pings the master node. The following properties can be set:

  • discovery.zen.fd.ping_interval: This defaults to 1s (one second) and specifies how often nodes ping each other
  • discovery.zen.fd.ping_timeout: This defaults to 30s (30 seconds) and defines how long a node will wait for the response to its ping message before considering a node as unresponsive
  • discovery.zen.fd.ping_retries: This defaults to 3 and specifies how many retries should be taken before considering a node as not working

If you experience some problems with your network or you know that your nodes need more time to see the ping response, you can adjust the previously mentioned values to the ones that are good for your deployment.
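
For example, on a slow or unreliable network the ping settings might be relaxed as in the following sketch. The values are illustrative assumptions, not recommendations, and should be chosen on the basis of what you observe in your own deployment:

```yaml
# Illustrative values only; defaults are 1s, 30s, and 3 respectively
discovery.zen.fd.ping_interval: 2s
discovery.zen.fd.ping_timeout: 60s
discovery.zen.fd.ping_retries: 5
```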
