Configuring a replica set

We have had a good discussion on what a replica set is and how to start a simple one in the Starting multiple instances as part of a replica set recipe in Chapter 1, Installing and Starting the MongoDB Server. In the Understanding interprocess security in MongoDB recipe, we saw how to start a replica set with interprocess authentication. To be honest, that is pretty much what we do while setting up a standard replica set. However, there are a few configuration options that one must know about, and one must also be aware of how they affect the replica set's behavior. Note that we are still not discussing tag-aware replication in this recipe; it will be taken up later in this chapter as a separate recipe, Building tagged replica sets.

Getting ready

Refer to the recipe Starting multiple instances as part of a replica set in Chapter 1, Installing and Starting the MongoDB Server, for the prerequisites and to know about the replica set basics. Go ahead and set up a simple three-node replica set on your computer, as mentioned in the recipe.

Before we go ahead with the configurations, we will see what elections are in a replica set and how they work from a high level. It is good to know about elections because some of the configuration options affect the voting process in the elections.

Elections in a replica set

A Mongo replica set has one primary instance and multiple secondary instances. All writes happen only through the primary instance and are replicated to the secondary instances. Read operations can happen from secondary instances, depending on the read preference. Refer to the Read preference for querying section in Appendix, Concepts for Reference, to know what read preference is. However, if the primary goes down or is not reachable for some reason, the replica set becomes unavailable for writes. A Mongo replica set has a feature to automatically fail over to a secondary, by promoting it to a primary and making the set available to clients for both read and write operations. The replica set remains unavailable only for the brief moment till a new primary comes up.

All this sounds good, but the question is, who decides what the new primary instance will be? The new primary is chosen through an election. Whenever any secondary detects that it cannot reach the primary, it asks the other nodes in the replica set to elect it as the new primary.

All other nodes in the replica set that receive this request for the election of the primary will perform certain checks before they vote a yes to the secondary requesting an election. Let's take a look at the steps:

  1. They will first check whether the existing primary is reachable. This is necessary because the secondary requesting the re-election is not able to reach the primary, possibly because of a network partition, in which case it should not be allowed to become a primary. In such a case, the instance receiving the request will vote a no.
  2. Secondly, the instance will compare its own replication state with that of the secondary requesting the election. If it finds that the requesting secondary is behind it in the replicated data, it will vote a no.
  3. Finally, even though the primary is not reachable from the secondary requesting the re-election, an instance with a higher priority than that secondary may still be reachable from the voting instance. This again is possible if the secondary requesting the re-election can't reach the higher-priority secondary, possibly due to a network partition. In this scenario, the instance receiving the request for election will vote a no.

The preceding checks are pretty much what will be happening (not necessarily in the order mentioned here) during the re-election. If these checks pass, the instance votes a yes.

The election is void if even a single instance votes a no. However, if no instance votes a no, the secondary that requested the election becomes the new primary if it receives a yes from the majority of instances. If the election becomes void, there will be a re-election, with the same secondary or any other instance requesting an election using the previously mentioned process, till a new primary is elected.
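The voting checks described above can be sketched as a small function. This is an illustrative simulation, not MongoDB's actual election code; the field names used here (canReachPrimary, opTime, reachableMembers) are hypothetical names for the state each node tracks.

```javascript
// Sketch of the three checks a member applies before voting "yes" for a
// candidate secondary. `self` is the member receiving the vote request;
// `candidate` is the secondary asking for the election.
function voteForCandidate(self, candidate) {
  // Check 1: if this member can still reach the current primary, vote no.
  if (self.canReachPrimary) return false;
  // Check 2: if the candidate is behind this member in replicated data, vote no.
  if (candidate.opTime < self.opTime) return false;
  // Check 3: if a reachable member has a higher priority than the candidate, vote no.
  if (self.reachableMembers.some(m => m.priority > candidate.priority)) return false;
  return true;
}

const voter = { canReachPrimary: false, opTime: 100, reachableMembers: [{ priority: 1 }] };
console.log(voteForCandidate(voter, { opTime: 100, priority: 1 })); // true: up to date
console.log(voteForCandidate(voter, { opTime: 90,  priority: 1 })); // false: candidate lags
```

A candidate becomes the new primary only if no voter returns false and a majority return true.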

Now that we have an idea about elections in a replica set and the terminologies, let us look at some replica set configurations. A few of these options are related to votes, and we start by looking at these options first.

Basic configuration for a replica set

When we set up a replica set in Chapter 1, Installing and Starting the MongoDB Server, we used a configuration similar to the following one. The basic replica set configuration for a three-member set is as follows:

{
  "_id" : "replSet",
  "members" : [
    {
      "_id" : 0,
      "host" : "Amol-PC:27000"
    },
    {
      "_id" : 1,
      "host" : "Amol-PC:27001"
    },
    {
      "_id" : 2,
      "host" : "Amol-PC:27002"
    }
  ]
}

We will not be repeating the entire configuration in the steps in the following sections. All the flags we mention will be added to the document of a particular member in the members array. For example, in the preceding example, if a node with _id as 2 is to be made an arbiter, we will have the following configuration for it in the configuration document shown earlier:

{
  "_id" : 2,
  "host" : "Amol-PC:27002",
  "arbiterOnly" : true
}

Generally, the steps to reconfigure a replica set that has already been set up are as follows:

  1. Assign the configuration document to a variable. If the replica set is already configured, it can be obtained using the rs.conf() call from the shell as follows:
    > var conf = rs.conf()
    
  2. The members field in the document is an array of documents for each individual member of a replica set. To add a new property to a particular member, we need to execute the following command. For instance, if we want to add the votes key and set its value to 2 for the third member of the replica set (index 2 in the array), we execute the following command:
    > conf.members[2].votes = 2
    
  3. Just changing the JSON document won't change the replica set. We need to reconfigure it as follows if the replica set is already in place:
    > rs.reconfig(conf)
    
  4. If the configuration is done for the first time, we will call the following command:
    > rs.initiate(conf)
    

For all the steps given in the next section, you need to follow the preceding steps to reconfigure or initiate the replica set, unless some other steps are mentioned explicitly.
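Putting these steps together, a typical reconfiguration session in the Mongo shell looks like the following. The member index and the property being set are just examples; run this against the primary of an already-running replica set:

```javascript
> var conf = rs.conf()          // fetch the current configuration
> conf.members[2].priority = 0  // e.g. prevent the third member from ever becoming primary
> rs.reconfig(conf)             // push the new configuration to the set
```

If the set has not been initiated yet, build the configuration document by hand and pass it to rs.initiate() instead of rs.reconfig().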

How to do it…

In this recipe, we will look at some of the possible configurations that can be used in a replica set. The explanation here will be minimal with all the explanations done as usual in the next section.

  1. The first configuration is an arbiter option that is used to configure a replica set member as a member that holds no data but only has rights to vote. The following key needs to be added to the configuration of the member who will be made an arbiter:
    {_id: ... , 'arbiterOnly': true }
    
  2. One thing to remember regarding this configuration is that once a replica set is initiated, no existing member can be changed to an arbiter from a nonarbiter node and vice versa. However, we can add an arbiter to an existing replica set using the helper function rs.addArb(<hostname>:<port>). For example, to add an arbiter listening to port 27004 to an existing replica set, the following command was executed on my machine:
    > rs.addArb('Amol-PC:27004')
    

    When the server starts to listen on port 27004 and rs.status() is executed from the Mongo shell, we see that state and stateStr for this member are 7 and ARBITER, respectively.

  3. The next option, votes, affects the number of votes a member gets in the election. By default, all members get one vote each. This option can be used to change the number of votes a particular member gets. It can be set as follows:
    {_id: ... , 'votes': <number of votes>}
    

    The votes of existing members of a replica set can be changed and the replica set can be reconfigured using rs.reconfig().

    Though the option votes is available, which can potentially change the number of votes to form a majority, it usually doesn't add much value and is not a recommended option to use in production.

  4. The next replica set configuration option is called priority. It determines the eligibility of a replica set member to become a primary (or not to become a primary). The option is set as follows:
    {_id: ... , 'priority': <priority number>}
    
  5. A higher number indicates a greater likelihood of becoming the primary. The primary will always be the member with the highest priority among those alive in the replica set. Setting this option in an already configured replica set will trigger an election.
  6. Setting the priority option to 0 will ensure that a member will never become a primary.
  7. The next option we look at is hidden. Setting the value of this option to true ensures that the replica set member is hidden. The option is set as follows:
    {_id: ... , 'hidden': <true/false>}
    

    One thing to keep in mind is that when a replica set member is hidden, its priority too should be made 0 to ensure that it doesn't become primary. Though this seems redundant, as of the current version, the value of priority needs to be set explicitly.

  8. When a programming language client connects to a replica set, it will not be able to discover hidden members. However, after executing rs.status() from the shell, the member's status would be visible.
  9. The next option we will look at is the slaveDelay option. This option is used to set the lag in time for the slave from the primary of the replica set. The option is set as follows:
    {_id: ... , 'slaveDelay': <number of seconds to lag>}
    
  10. Like the hidden member, slave delayed members too should have the priority option set to 0 to ensure they don't ever become primary. This needs to be set explicitly.
  11. The final configuration option we will be looking at is buildIndexes. By default, its value is true, which indicates that if an index is created on the primary, it needs to be replicated on the secondary too. The option is set as follows:
    {_id: ... , 'buildIndexes': <true/false>}
    
  12. If the value of buildIndexes is set to false, the priority must be set to 0 to ensure such members don't ever become primary. This needs to be set explicitly. Also, this option cannot be changed after the replica set is initiated. Just like the arbiter option, it needs to be set when the replica set is being created or when a new member node is being added to the replica set.
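Since buildIndexes, like arbiterOnly, has to be in place when the member first joins, adding such a member to a running set is done through rs.add() with a full member document. The host name and port below are just an example following the earlier ones:

```javascript
// Add a priority-0 member that will not build indexes; buildIndexes
// cannot be changed later via rs.reconfig() for an existing member.
> rs.add({ _id: 3, host: 'Amol-PC:27003', priority: 0, buildIndexes: false })
```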

How it works…

In this section, we will explain and understand the significance of different types of members and the configuration options we saw in the previous section.

A replica set member as an arbiter

The English meaning of the word 'arbiter' is a judge who resolves a dispute. In the case of replica sets, the arbiter node is present just to vote in elections and not to replicate any data. This is, in fact, a pretty common scenario due to the fact that a Mongo replica set needs to have at least three instances (and preferably an odd number of instances, three or more). A lot of applications do not need to maintain three copies of data and are happy with just two instances: a primary and a secondary with the data.

Consider the scenario where only two instances are present in the replica set. When the primary goes down, the secondary instance cannot form a proper majority because it only has 50 percent of the votes (its own vote), and thus it cannot become the primary. Similarly, if the secondary goes down, the primary cannot see a majority, so it steps down and becomes a secondary, making the replica set unavailable for writes. Thus, a two-node replica set is useless, as it doesn't stay available even when a single instance goes down. It defeats the purpose of setting up a replica set, and thus a minimum of three instances are needed in a replica set.

Arbiters come in handy in such scenarios. We set up a replica set with three instances, with only two holding data and one acting as an arbiter. We need not maintain three copies of the data, and at the same time we eliminate the problem we just saw with a two-instance replica set.
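With the host names used earlier in this recipe, such a set looks as follows: two data-bearing members and one arbiter.

```javascript
{
  "_id" : "replSet",
  "members" : [
    { "_id" : 0, "host" : "Amol-PC:27000" },
    { "_id" : 1, "host" : "Amol-PC:27001" },
    { "_id" : 2, "host" : "Amol-PC:27002", "arbiterOnly" : true }
  ]
}
```

Members 0 and 1 hold the data; member 2 only votes, so if either data-bearing member goes down, the other can still form a two-out-of-three majority and become primary.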

Priority of replica set members

This is an option whose use is enforced by other options as well, though it can be used on its own in some cases. The options that enforce its usage are hidden, slaveDelay, and buildIndexes, where we don't want the member with one of these three options to ever be made primary. We will look at these options soon.

Some more possible use cases, where we never want a replica set to become a primary, are as follows:

  • When the hardware configuration of a member would not be able to deal with the write and read requests should it become a primary, and the only reason it is in the set is to replicate the data.
  • We have a multi-data-center setup, where one replica set instance is present in another data center for the sake of geographically distributing the data for disaster recovery purposes. Ideally, the network latency between the application server hosting the application and the database should be minimal for optimum performance, which can be achieved if both servers are in the same data center. If we don't change the priority of the replica set instance in the other data center, it remains equally eligible to be chosen as the primary, compromising the application's performance should it get chosen. In such scenarios, we can set the priority of the server in the second data center to 0; a manual cutover by the administrator will then be needed to fail over to the other data center, should an emergency arise.

In both these scenarios, we can also have the respective members hidden so that the application client doesn't have a view of these members in the first place.

Just as we set the priority to 0 to prevent a member from becoming the primary, we can also bias the election towards one member, whenever it is available, by setting its priority to a value greater than 1, the default value of the priority field.

Suppose we have a scenario where, for budget reasons, one of the members stores data on SSDs and the remaining members use spinning disks. We will ideally want the member with SSDs to be the primary whenever it is up and running; only when it is not available should another member become the primary. In such scenarios, we can set the priority of the member running on SSDs to a value greater than 1. The value doesn't really matter as long as it is greater than the rest; that is, setting it to 1.5 or 2 makes no difference, as long as the priority of the other members is less.
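Assuming, for illustration, that the member with _id 0 is the one running on SSDs, biasing the election towards it is a one-line change:

```javascript
> var conf = rs.conf()
> conf.members[0].priority = 2  // any value above the others' default of 1 works
> rs.reconfig(conf)
```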

Hidden, votes, slave delayed, and build index configurations

The term hidden for a replica set node is meaningful to an application client connected to the replica set, not to an administrator. For an administrator, it is equally important for the hidden members to be monitored, and thus their state is seen in the rs.status() response. Hidden members participate in elections too, just like all other members.

Though votes is an option that is not a recommended solution to a problem, there is an interesting behavior that needs to be mentioned. Suppose you have a three-member replica set. With each instance of the replica set having one vote by default, we have a total of three votes in the replica set. For a replica set to allow writes, a majority of voting members should be up. However, the calculation of a majority doesn't happen using the number of members up but by the total number of votes. Let us see how.

By default, with one vote each, if one of the members is down, we have two out of a total of three votes available, and thus, the replica set continues to operate. However, if we have one member with the number of votes set to 2, we now have a total of four votes (1 + 1 + 2) in the replica set. If this member goes down, even though it is secondary, the primary will automatically step down, and the replica set will be left with no primary, thus not allowing writes. This happens because two out of four possible votes are now gone and we no longer have a majority of the votes available. If this member with two votes is a primary, then again no majority can be formed as there are just a maximum of two votes out of four available, and a primary won't be elected. Thus in general, as a rule of thumb, if you are tempted to use this votes configuration option for your use case, think again, as you may very well use other options such as priority and arbiterOnly to address these use cases.

From Version 2.6 of MongoDB, the votes option is deprecated, and the following message gets printed in the logs:

[rsMgr]   WARNING: Having more than 1 vote on a single replicaset member is
[rsMgr]   deprecated, as it causes issues with majority write concern. For
[rsMgr]   more information, see http://dochub.mongodb.org/core/replica-set-votes-deprecated

Thus, it is recommended not to use this option and prefer an alternative configuration option; in some future version of MongoDB, it might not even be supported.

For the slaveDelay option, the most common use case is to ensure that the data in the delayed member lags behind the primary by the given number of seconds. The data can then be restored from this member if some unforeseen error happens, say, a human erroneously updating some data. Remember, the longer the delay, the longer the time we get to recover, but at the cost of possibly staler data.
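For instance, a delayed member that lags the primary by one hour, hidden from clients and barred from becoming primary, could be configured with a member document like the following (the one-hour delay is just an example value):

```javascript
{
  "_id" : 2,
  "host" : "Amol-PC:27002",
  "priority" : 0,
  "hidden" : true,
  "slaveDelay" : 3600
}
```

This gives roughly a one-hour window in which an erroneous update on the primary can still be recovered from the delayed member.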

Finally, we'll see the buildIndexes option. It is useful in cases where we have a replica set member on nonproduction-standard hardware and the cost of maintaining the indexes is not worth it. You may choose to set this option for members on which no queries are executed. Obviously, if you set this option, the member can never become primary, and thus the priority option is required to be set to 0.

There's more…

You can achieve some interesting things using tags in replica sets. This will be discussed in a later recipe after we learn about tags in the Building tagged replica sets recipe.
