High availability

Neo4j has adapted in recent years to handling larger data more reliably with the introduction of the high availability (HA) mode. Its architecture and operations revolve around a core concept: the master election algorithm. In this section, we will look at why a master is needed and the algorithm by which it is elected.

HA and the need for a master

High availability in Neo4j replicates the graph across all the machines in the cluster and manages write operations between them. High availability does not partition the stored graph across machines (a technique called sharding); it replicates the complete graph. One machine in the replica set has the authority to receive and propagate updates as well as to keep track of locks and transaction context. This machine is referred to as the master in the HA cluster; it handles, monitors, and takes care of the resources of the replica machines.

When you set up a Neo4j cluster, you do not need to designate a master or allocate specialized resources to particular machines. Doing so would create a single point of failure and defeat the purpose of HA when the master fails. Rather, the cluster dynamically elects its own master node when needed.
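As a rough illustration, a per-instance configuration for a ZooKeeper-backed HA setup of that era might look like the following fragment of conf/neo4j.properties. The ha.server_id parameter is the one discussed in this section; the host list and the ha.coordinators parameter name are assumptions based on the ZooKeeper-era configuration and should be checked against the documentation for your version.

```properties
# Unique per instance; a lower value means higher priority in elections
ha.server_id = 1

# Addresses of the neo4j-coordinator (ZooKeeper) instances
# (illustrative hosts and ports)
ha.coordinators = host1:2181,host2:2181,host3:2181
```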

The nodes in the cluster need to communicate, and they make use of a method called atomic broadcast, which sends messages to all instances in a reliable manner. It is reliable because no messages are lost, the ordering of messages is preserved, and no messages are corrupted or duplicated. At a lower level, these operations are handled by a service called neo4j-coordinator, which has the following objectives:

  • A method for each machine in the cluster to indicate that it participates in HA, something like a heartbeat. Failure to send this heartbeat means that the instance is unable to handle operations for the cluster.
  • In Neo4j, the preceding method also helps to identify how many and which machines are currently in the cluster.
  • A notification system for broadcasting alerts to the rest of the cluster.
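The heartbeat and membership objectives above can be sketched with a simple timeout-based failure detector. This is an in-memory illustration only; the class and method names are invented for this sketch and are not Neo4j's actual coordinator API.

```python
import time

# Seconds without a heartbeat before an instance is considered failed
# (an assumed value for illustration).
HEARTBEAT_TIMEOUT = 5.0

class ClusterMembership:
    def __init__(self):
        self.last_seen = {}  # server_id -> timestamp of last heartbeat

    def heartbeat(self, server_id, now=None):
        # Each instance periodically signals that it is alive.
        self.last_seen[server_id] = now if now is not None else time.time()

    def alive(self, now=None):
        # Answers "how many and which machines currently exist in the cluster".
        now = now if now is not None else time.time()
        return sorted(sid for sid, ts in self.last_seen.items()
                      if now - ts <= HEARTBEAT_TIMEOUT)

m = ClusterMembership()
m.heartbeat(1, now=100.0)
m.heartbeat(2, now=100.0)
m.heartbeat(3, now=103.0)
print(m.alive(now=106.0))  # instances 1 and 2 missed the window -> [3]
```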

The master election

The master holds the authoritative version of the graph, that is, the most up-to-date version of the database. The latest version is determined by the ID of the latest transaction that was executed. Since transactions are serialized, transaction IDs increase monotonically, so the last committed transaction ID reflects the database version. This information, however, is generated internally by the database; we also require some external priority-enforcing information for elections in cases where two machines report the same latest transaction ID. This external information can vary from machine to machine, ranging from the machine's IP address to its CPU ID, and it is presented in the form of a configurable Neo4j parameter called ha.server_id. The lower the value of an instance's server ID, the higher its priority for being elected as master.

So, when the "heartbeat" from the current master becomes irregular, an instance can initiate an election and collect the last transaction ID from each cluster machine. The machine with the largest value is elected master, with the server ID acting as the tiebreaker. On election, the result is announced to the cluster, and all machines, along with the new master, execute the same algorithm. When the conclusions from all machines coincide, the notification is stopped.
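The election rule described above boils down to a simple comparison: the largest last-committed transaction ID wins, and the lowest ha.server_id breaks ties. A minimal sketch (the function name is illustrative, not Neo4j's internal code):

```python
def elect_master(instances):
    """instances: dict mapping server_id -> last committed transaction ID.

    The highest transaction ID wins; on a tie, the lowest server ID wins.
    """
    return max(instances, key=lambda sid: (instances[sid], -sid))

# Servers 2 and 3 share the newest transaction; 2 has the lower server ID.
print(elect_master({1: 98, 2: 100, 3: 100}))  # -> 2
```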

Finally, let's see how atomic broadcast is implemented. Around Version 1.8, Apache ZooKeeper is used to implement the atomic broadcast protocol, which guarantees in-order delivery of messages. ZooKeeper can set a watch on a file in its distributed hierarchical filesystem, with provisions for notifications in the event of the addition or deletion of nodes. Later versions, however, might ship with Neo4j's own atomic broadcast implementation. In Neo4j, the first machine creates a node defined as the root of the cluster. An ephemeral node (a node valid for the lifetime of its client) is then added as a child of the root, with the server ID as its name and the latest transaction ID of its database as its content. For administrative purposes, a node called master-notify is created along with a watch on it. When more machines are added, they find the root and perform the same basic operations (apart from the administrative ones). Any node can read and write the content of any other node (the node in the cluster, not the one in the graph!).

So, the ephemeral node exists for the lifetime of an instance; when the instance fails, the node is removed from the children of the root and a deletion event is sent to all other instances. If the failed instance was the master, this triggers an election, and the result is broadcast by a write to the master-notify node. A master election can also be triggered when a new instance is added to the cluster, so the transaction ID must be kept up to date to prevent the current master from losing the election. This is why the coordinator service needs to be configured and maintained. With this head start, you can now explore more of the HA source.
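The coordination scheme above can be simulated in memory: ephemeral children under a cluster root, a deletion event on instance failure, and the election result broadcast via master-notify. Everything here is a simplification for illustration; it is not the ZooKeeper API, and all names are invented for this sketch.

```python
class ClusterRoot:
    def __init__(self):
        self.children = {}         # server_id -> latest transaction ID (ephemeral nodes)
        self.master_notify = None  # content of the administrative master-notify node

    def join(self, server_id, last_tx_id):
        # Each instance registers an ephemeral child named after its server ID,
        # storing its latest transaction ID as the node's content.
        self.children[server_id] = last_tx_id

    def fail(self, server_id):
        # The ephemeral node vanishes with its instance; if the master failed,
        # the resulting deletion event triggers a new election.
        was_master = (self.master_notify == server_id)
        del self.children[server_id]
        if was_master:
            self.hold_election()

    def hold_election(self):
        # Largest transaction ID wins; lowest server ID breaks ties.
        winner = max(self.children, key=lambda sid: (self.children[sid], -sid))
        self.master_notify = winner  # broadcast the result via master-notify

root = ClusterRoot()
root.join(1, 120); root.join(2, 120); root.join(3, 118)
root.hold_election()
print(root.master_notify)  # server 1 wins the tie with the lower ID -> 1
root.fail(1)               # master fails: deletion event forces a re-election
print(root.master_notify)  # -> 2
```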
