We have had a good discussion on what a replica set is and how to start a simple replica set in the Starting multiple instances as part of a replica set recipe in Chapter 1, Installing and Starting the MongoDB Server. In the Understanding interprocess security in MongoDB recipe, we saw how to start a replica set with interprocess authentication. To be honest, that is pretty much what we do while setting up a standard replica set. However, there are a few configurations that one must know about, and one must also be aware of how they affect the replica set's behavior. Note that we are still not discussing tag-aware replication in this recipe; it will be taken up later in this chapter in a separate recipe, Building tagged replica sets.
Refer to the recipe Starting multiple instances as part of a replica set in Chapter 1, Installing and Starting the MongoDB Server, for the prerequisites and to know about the replica set basics. Go ahead and set up a simple three-node replica set on your computer, as mentioned in the recipe.
Before we go ahead with the configurations, we will see what elections are in a replica set and how they work from a high level. It is good to know about elections because some of the configuration options affect the voting process in the elections.
A Mongo replica set has one primary instance and multiple secondary instances. All writes happen only through the primary instance and are replicated to the secondary instances. Read operations can happen from secondary instances, depending on the read preference. Refer to the Read preference for querying section in Appendix, Concepts for Reference, to know what read preference is. If the primary goes down or is not reachable for some reason, the replica set becomes unavailable for writes. However, a Mongo replica set has a feature to automatically fail over to a secondary, by promoting it to primary and making the set available to clients for both read and write operations. The replica set remains unavailable only for the brief moment till the new primary comes up.
All this sounds good, but the question is, who decides what the new primary instance will be? The new primary is chosen through an election. Whenever a secondary detects that it cannot reach the primary, it asks all the other nodes in the replica set to elect it as the new primary.
All other nodes in the replica set that receive this request for the election of a primary will perform certain checks before they vote yes to the secondary requesting the election. Broadly, the checks are as follows:

1. Can the voting node itself still see a primary? If it can, it votes no.
2. Is the data of the secondary requesting the election up to date? If the voting node (or any member it can see) has more recent data than the candidate, it votes no.
3. Is there another eligible member with a higher priority available for the election? If so, it votes no.
The preceding checks are pretty much what happens (not necessarily in the order mentioned here) during the election. If these checks pass, the instance votes yes.
The election is void if even a single instance votes no. However, if no instance votes no, the secondary that requested the election becomes the new primary if it receives a yes from the majority of the instances. If the election becomes void, there will be a re-election, with the same secondary or another instance requesting an election using the process mentioned earlier, till a new primary is elected.
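To make the majority rule concrete, here is a small plain-JavaScript sketch (not MongoDB code; the function name and values are hypothetical) that decides whether a candidate wins an election, given the yes votes it received and the total votes configured in the set:

```javascript
// Hypothetical illustration of the majority rule used in elections:
// a candidate becomes primary only if the yes votes it receives are a
// strict majority of the total votes in the replica set.
function winsElection(yesVotes, totalVotes) {
  return yesVotes > totalVotes / 2;
}

// Three members, one vote each: two yes votes out of three are a majority.
console.log(winsElection(2, 3)); // true

// Only its own vote: 1 of 3 is not a majority, so the election fails.
console.log(winsElection(1, 3)); // false
```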
Now that we have an idea about elections in a replica set and the terminology involved, let us look at some replica set configurations. A few of these options are related to votes, and we start by looking at those options first.
From Chapter 1, Installing and Starting the MongoDB Server, when we set up a replica set, we have a configuration similar to the following one. The basic replica set configuration for a three-member set is as follows:
{
  "_id" : "replSet",
  "members" : [
    { "_id" : 0, "host" : "Amol-PC:27000" },
    { "_id" : 1, "host" : "Amol-PC:27001" },
    { "_id" : 2, "host" : "Amol-PC:27002" }
  ]
}
We will not be repeating the entire configuration in the steps in the following sections. All the flags we mention will be added to the document of a particular member in the members array. For example, in the preceding example, if the node with _id as 2 is to be made an arbiter, we will have the following configuration for it in the configuration document shown earlier:

{ "_id" : 2, "host" : "Amol-PC:27002", "arbiterOnly" : true }
Generally, the steps to reconfigure a replica set that has already been set up are as follows:

1. Get the current configuration document by executing the rs.conf() call from the shell, as follows:

   > var conf = rs.conf()

2. The members field in the document is an array of documents, one for each individual member of the replica set. To add a new property to a particular member, we need to execute a command of the following form. For instance, if we want to add the votes key and set its value to 2 for the third member of the replica set (index 2 in the array), we execute the following command:

   > conf.members[2].votes = 2

3. Apply the changed configuration to the replica set, as follows:

   > rs.reconfig(conf)

   Alternatively, if a new replica set is being initiated with this configuration, execute the following instead:

   > rs.initiate(conf)

For all the steps given in the next section, you need to follow the preceding steps to reconfigure or initiate the replica set, unless some other steps are mentioned explicitly.
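The reconfiguration steps above can be sketched with a plain JavaScript object standing in for the document that rs.conf() would return in the shell (the host names are the sample ones used earlier; in a real shell session, you would finish by passing the modified document to rs.reconfig()):

```javascript
// Stand-in for the document returned by rs.conf() in the mongo shell.
var conf = {
  _id: "replSet",
  members: [
    { _id: 0, host: "Amol-PC:27000" },
    { _id: 1, host: "Amol-PC:27001" },
    { _id: 2, host: "Amol-PC:27002" }
  ]
};

// Add the votes key to the third member (index 2 in the array).
conf.members[2].votes = 2;

// In the shell, the modified document would now be applied with:
//   rs.reconfig(conf)
console.log(conf.members[2]);
```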
In this recipe, we will look at some of the possible configurations that can be used in a replica set. The explanations here will be minimal, with the details covered, as usual, in the next section.
The first option is arbiterOnly, which marks a member as an arbiter. It is set as follows:

{_id: ... , 'arbiterOnly': true }

An arbiter can also be added to an existing replica set using the helper rs.addArb(<hostname>:<port>). For example, to add an arbiter listening on port 27004 to an existing replica set, the following command was executed on my machine:

> rs.addArb('Amol-PC:27004')
When the server starts listening on port 27004 and rs.status() is executed from the Mongo shell, we see that the state and stateStr fields for this member are 7 and ARBITER, respectively.
The next option, votes, affects the number of votes a member gets in an election. By default, all members get one vote each. This option can be used to change the number of votes a particular member gets. It is set as follows:

{_id: ... , 'votes': <number of votes>}

The votes of existing members of a replica set can be changed, and the replica set can then be reconfigured using rs.reconfig(). Though the votes option is available and can potentially change the number of votes needed to form a majority, it usually doesn't add much value, and it is not recommended for use in production.
The next option is priority. It determines the eligibility of a replica set member to become a primary (or not become one). The option is set as follows:

{_id: ... , 'priority': <priority number>}

Setting the priority option to 0 will ensure that a member never becomes a primary.

The next option is hidden. Setting its value to true ensures that the replica set member is hidden from application clients. The option is set as follows:

{_id: ... , 'hidden': <true/false>}
One thing to keep in mind is that, when a replica set member is hidden, its priority too should be set to 0 to ensure it doesn't become primary. Though this seems redundant, as of the current version, the value of priority needs to be set explicitly. A hidden member is still visible to the administrator: on executing rs.status() from the shell, the member's status is shown.

The next option is slaveDelay. It is used to set the time by which the slave lags behind the primary of the replica set. The option is set as follows:

{_id: ... , 'slaveDelay': <number of seconds to lag>}
Slave-delayed members should also have the priority option set to 0 to ensure they don't ever become primary. This needs to be set explicitly.

The final option is buildIndexes. If not specified, the value defaults to true, which indicates that if an index is created on the primary, it needs to be replicated on the secondary too. The option is set as follows:

{_id: ... , 'buildIndexes': <true/false>}
If buildIndexes is set to false, the priority must be set to 0 to ensure such members don't ever become primary. This needs to be set explicitly. Also, this option cannot be set after the replica set is initiated. Just like an arbiter node, this needs to be set when the replica set is being created or when a new member node is being added to the replica set.

In this section, we will explain and understand the significance of the different types of members and the configuration options we saw in the previous section.
The English meaning of the word ''arbiter'' is a judge who resolves a dispute. In the case of replica sets, the arbiter node is present just to vote in elections and not to replicate any data. This is, in fact, a pretty common scenario, due to the fact that a Mongo replica set needs to have at least three instances (and preferably an odd number of instances, three or more). A lot of applications do not need to maintain three copies of data and are happy with just two instances with the data, one primary and one secondary.
Consider the scenario where only two instances are present in the replica set. When the primary goes down, the secondary instance cannot form a proper majority, because it only has 50 percent of the votes (its own vote), and thus it cannot become the primary. Similarly, if the secondary goes down, the primary can see only 50 percent of the votes, so it steps down and becomes a secondary, making the replica set unavailable for writes. Thus, a two-node replica set is useless, as it doesn't stay available even when just one of the instances goes down. It defeats the purpose of setting up a replica set, and thus a minimum of three instances is needed in a replica set.
Arbiters come in handy in such scenarios. We set up a replica set with three instances, only two of which carry data, with the third acting as an arbiter. We need not maintain three copies of the data, and at the same time we eliminate the problem we faced with the two-instance replica set.
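A two-data-node set with an arbiter could be described by a configuration document like the following plain-JavaScript sketch (host names are the illustrative ones used earlier; the arbiter holds no data and only votes):

```javascript
// Illustrative configuration: two data-bearing members plus one arbiter.
var conf = {
  _id: "replSet",
  members: [
    { _id: 0, host: "Amol-PC:27000" },
    { _id: 1, host: "Amol-PC:27001" },
    { _id: 2, host: "Amol-PC:27002", arbiterOnly: true }
  ]
};

// Even with only two copies of the data, there are three voters, so the
// set can still form a majority (2 of 3) when one member goes down.
console.log(conf.members.length); // 3
```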
This is an option whose use is enforced by other options as well, though it can be used on its own in some cases. The options that enforce its usage are hidden, slaveDelay, and buildIndexes, where we don't want a member with one of these three options to ever be made primary. We will look at these options soon.
Some more possible use cases, where we never want a replica set member to become a primary, are as follows: a member kept purely as a standby copy of the data, and a member set up in a second data center for disaster recovery. For the latter, we set the priority to 0 for the server in the second data center, and a manual cutover will be needed by the administrator to fail over to the other data center, should an emergency arise. In both these scenarios, we can also have the respective members hidden, so that the application client doesn't have a view of these members in the first place.
Just as we set the priority to 0 to prevent a member from ever becoming the primary, we can also bias the election towards one member becoming the primary, whenever it is available, by setting its priority to a value greater than 1, because the default value of the priority field is 1.
Suppose we have a scenario where, for budget reasons, one of the members stores data on SSDs and the remaining members store data on spinning disks. We will ideally want the member with SSDs to be the primary whenever it is up and running. Only when it is unavailable do we want another member to become the primary. In such scenarios, we can set the priority of the member running on SSDs to a value greater than 1. The exact value doesn't really matter as long as it is greater than the rest; that is, setting it to 1.5 or 2 makes no difference, as long as the priority of the other members is less.
The term hidden for a replica set node is for an application client that is connected to the replica set and not for an administrator. For an administrator, it is equally important for the hidden members to be monitored and thus, their state is seen in the rs.status()
response. Hidden members participate in elections too, just like all other members.
Though votes is an option that is not a recommended solution to any problem, there is an interesting behavior that needs to be mentioned. Suppose you have a three-member replica set. With each instance of the replica set having one vote by default, we have a total of three votes in the replica set. For a replica set to allow writes, a majority of the voting members should be up. However, the calculation of a majority doesn't happen using the number of members that are up, but by the total number of votes. Let us see how.
By default, with one vote each, if one of the members goes down, we still have two out of a total of three votes available, and thus the replica set continues to operate. However, if we have one member with the number of votes set to 2, we now have a total of four votes (1 + 1 + 2) in the replica set. If this member goes down, even though it is a secondary, the primary will automatically step down, and the replica set will be left with no primary, thus not allowing writes. This happens because two out of four possible votes are now gone, and we no longer have a majority of the votes available. If this member with two votes is the primary and goes down, then again no majority can be formed, as a maximum of two votes out of four are available, and a new primary won't be elected. Thus, as a rule of thumb, if you are tempted to use the votes configuration option for your use case, think again; you can most likely use other options, such as priority and arbiterOnly, to address the same use case.
From version 2.6 of MongoDB, giving a member more than one vote is deprecated, and the following message gets printed in the logs:
[rsMgr] WARNING: Having more than 1 vote on a single replicaset member is
[rsMgr] deprecated, as it causes issues with majority write concern. For
[rsMgr] more information, see http://dochub.mongodb.org/core/replica-set-votes-deprecated
Thus, it is recommended not to use this option and prefer an alternative configuration option; in some future version of MongoDB, it might not even be supported.
For the slaveDelay option, the most common use case is to ensure that the data in a member, at any given point of time, lags behind the primary by the specified number of seconds. If some unforeseen error happens, say, a human erroneously updating some data, the delayed member still holds an older copy from which the data can be restored. Remember, the longer the time delay, the longer the time we get to recover, but at the cost of possibly staler data on the delayed member.
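A delayed member is typically configured along these lines (a sketch; the one-hour delay is an arbitrary choice for illustration, and hiding the member is optional but common):

```javascript
// Sketch of a delayed member: lags an hour behind the primary, can never
// become primary, and is hidden from application clients.
var delayedMember = {
  _id: 3,
  host: "Amol-PC:27003",
  slaveDelay: 3600,  // seconds to lag behind the primary
  priority: 0,       // must be set explicitly for delayed members
  hidden: true
};
console.log(delayedMember.slaveDelay / 60); // 60 minutes of recovery window
```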
Finally, we'll see the buildIndexes option. This is useful in cases where we have a replica set member on non-production-standard hardware and the cost of maintaining the indexes is not worth it. You may choose to set this option to false for members on which no queries are executed. Obviously, if you set this option, such a member can never become a primary, and thus the priority option is enforced to be set to 0.
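Such a member could be described with a document like this sketch (the host name is illustrative; recall that buildIndexes must be set when the member is added, not afterwards):

```javascript
// Sketch of a non-indexing member on modest hardware. buildIndexes: false
// requires priority 0, which must be set explicitly.
var backupMember = {
  _id: 4,
  host: "Amol-PC:27005",
  buildIndexes: false,
  priority: 0
};
console.log(backupMember.buildIndexes === false && backupMember.priority === 0); // true
```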