In the Starting multiple instances as part of a replica set recipe in Chapter 1, Installing and Starting the MongoDB Server, we saw how to set up a simple replica set and what the purpose of a replica set is. We also have a good deal of explanation in Appendix, Concepts for Reference, on what write concern is and why it is used. We saw that write concerns offer a minimum-level guarantee for a given write operation. However, by combining tags with write concerns, we can define a variety of rules and conditions that must be satisfied before a write operation is deemed successful and a response is sent to the user.
Consider some common use cases:

- Ensuring that a write operation is replicated to at least one member in each of two data centers before it is deemed successful
- Ensuring that a write operation is propagated to members in at least two distinct server racks
- Directing read operations meant for reporting purposes to a fixed set of secondary members

These are a few of the common use cases that arise and are not addressed by the simple write concerns we have seen earlier. We need a different mechanism to cater to these requirements; replica sets with tags are what we need.
Obviously, the next question is: what exactly are tags? Let us take the example of a blog. Various posts in a blog have different tags attached to them. These tags allow us to easily search, group, and relate posts. Tags are user-defined strings with some meaning attached to them. If we draw an analogy between a blog post and the replica set members, just as we attach tags to a post, we can attach tags to each replica set member. For example, in a multi-data-center scenario with two replica set members in data center 1 (dc1) and one member in data center 2 (dc2), we can have the following tags assigned to the members. The name of the key and the value assigned to the tag are arbitrary; they are chosen while designing the application.
You may choose to assign any tags you find useful for your use case, for example, a tag naming the administrator who set up the server.
Replica set member | Tag
---|---
Replica set member 1 | {'datacentre': 'dc1', 'rack': 'rack-dc1-1'}
Replica set member 2 | {'datacentre': 'dc1', 'rack': 'rack-dc1-2'}
Replica set member 3 | {'datacentre': 'dc2', 'rack': 'rack-dc2-1'}
This is good enough to lay the foundation of what replica set tags are. In this recipe, we will see how to assign tags to replica set members and, more importantly, how to make use of them to address some of the sample use cases we saw earlier.
Refer to the Starting multiple instances as part of a replica set recipe in Chapter 1, Installing and Starting the MongoDB Server, for the prerequisites and to know about replica set basics. Go ahead and set up a simple three-node replica set on your computer, as mentioned in the recipe. Open a shell and connect to the primary member of the replica set.
If you need to know about write concerns, refer to the overview of write concerns in Appendix, Concepts for Reference.
For the purpose of inserting into the database, we will use Python, as it gives us an interactive interface similar to the Mongo shell. Refer to the Installing PyMongo recipe in Chapter 3, Programming Language Drivers, for the steps to install PyMongo. The Mongo shell would have been the ideal candidate to demonstrate the insert operations, but there are certain limitations around using our custom write concern from the shell. Technically, any programming language driver that supports passing the write concerns mentioned in this recipe with insert operations will work fine.
```javascript
> var conf = rs.conf()
> conf.members[0].tags = {'datacentre': 'dc1', 'rack': 'rack-dc1-1'}
> conf.members[1].tags = {'datacentre': 'dc1', 'rack': 'rack-dc1-2'}
> conf.members[2].priority = 0
> conf.members[2].tags = {'datacentre': 'dc2', 'rack': 'rack-dc2-1'}
> conf.settings = {'getLastErrorModes' : {'MultiDC': {'datacentre': 2}}}
> rs.reconfig(conf)
```
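Note that assigning conf.settings as a whole replaces any settings already present in the replica set configuration. A slightly more defensive variant (a sketch, assuming a Mongo shell session in which conf already holds the result of rs.conf()) preserves whatever settings exist:

```javascript
// Keep any existing settings instead of overwriting the whole subdocument
> conf.settings = conf.settings || {}
> conf.settings['getLastErrorModes'] = {'MultiDC': {'datacentre': 2}}
> rs.reconfig(conf)
```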
```python
>>> import pymongo
>>> client = pymongo.MongoReplicaSetClient('localhost:27000,localhost:27001', replicaSet='replSetTest')
>>> db = client.test
```
Insert a document into a test collection with the following insert query:

```python
>>> db.multiDCTest.insert({'i': 1}, w='MultiDC', wtimeout=5000)
```
The insert query goes through successfully, and an ObjectId will be printed out. You may query the collection from either the Mongo shell or the Python shell to confirm. Next, stop the mongod process listening on port 27002, which is the one with priority 0 and tagged to be in a different data center. Once the server is down (confirm it using the rs.status() helper function from the Mongo shell), execute the following insert query again; this insert should throw an error for timeout:

```python
>>> db.multiDCTest.insert({'i': 2}, w='MultiDC', wtimeout=5000)
```
Similarly, we can define one more write concern that ensures a write operation is propagated to members in at least two distinct racks, as {'MultiRack': {'rack': 2}}. The settings value of the conf object will then be as follows. Once set, reconfigure the replica set again using rs.reconfig(conf) from the Mongo shell:

```javascript
{
  'getLastErrorModes' : {
    'MultiDC': {'datacentre': 2},
    'MultiRack': {'rack': 2}
  }
}
```
We saw WriteConcern used with replica set tags to achieve functionality such as data center and rack awareness. Let us now see how we can use replica set tags with read operations.
```javascript
> var conf = rs.conf()
> conf.members[2].tags.type = 'reports'
> rs.reconfig(conf)
```
This tags the member listening on port 27002, the one in a different data center than members 0 and 1, with an additional tag called type with the value reports. Now, execute the following query from the Python shell:

```python
>>> curs = db.multiDCTest.find(read_preference=pymongo.ReadPreference.SECONDARY, tag_sets=[{'type': 'reports'}])
>>> curs.next()
```
This query should return a document from the member tagged for reporting purposes. Now, stop that server, the one listening on port 27002, and execute the following command from the Python shell again:

```python
>>> curs = db.multiDCTest.find(read_preference=pymongo.ReadPreference.SECONDARY, tag_sets=[{'type': 'reports'}])
>>> curs.next()
```
This time around, the execution should fail and state that no secondary was found with the required tag sets.
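The member selection that produces this failure can be made concrete with a small sketch. The following is illustrative Python mimicking the documented tag-set matching behavior, not PyMongo's internal code, and select_secondary is a made-up name: tag sets are tried in order, a member qualifies only if its tags contain every key/value pair of the tag set, and an empty tag set {} matches any member, which is how a driver can be told to fall back to any available secondary.

```python
# Illustrative sketch of read-preference tag-set matching (not PyMongo code).

def select_secondary(secondaries, tag_sets):
    """Try each tag set in order; return the members matching the first
    tag set that matches anyone, or an empty list if none match."""
    for tag_set in tag_sets:
        matches = [m for m in secondaries
                   if all(m['tags'].get(k) == v for k, v in tag_set.items())]
        if matches:
            return matches
    return []  # no secondary matched any tag set; the read fails

# With the reports-tagged member (port 27002) down, only a dc1 member
# remains available as a secondary:
secondaries = [
    {'host': 'localhost:27001',
     'tags': {'datacentre': 'dc1', 'rack': 'rack-dc1-2'}},
]

# No remaining member carries type='reports', so no candidate is found:
print(select_secondary(secondaries, [{'type': 'reports'}]))  # []

# An empty tag set {} matches any member, so appending it as a fallback
# lets the read go to whichever secondary is still up:
print(select_secondary(secondaries, [{'type': 'reports'}, {}]))
```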
In this recipe, we did a lot of operations on tagged replica sets and saw how they can affect write operations using WriteConcern and read operations using ReadPreference. Let us now look at them in some detail.
We set up a running replica set and reconfigured it to add tags. We tagged the first two servers as being in data center 1, in different racks (the servers listening on ports 27000 and 27001 for client connections), and the third one as being in data center 2 (the server listening on port 27002 for client connections). We also ensured that the member in data center 2 doesn't become a primary by setting its priority to 0.
Our first objective is to ensure that write operations to the replica set get replicated to at least one member in the two data centers. To ensure this, we define a write concern as follows:
{'MultiDC': {'datacentre': 2}}
Here, we first define the name of the write concern as MultiDC. Its value, a JSON object, has one key named datacentre, which is the same as the key used in the tags we attached to the replica set members, and the value is the number 2, which is the number of distinct values of the given tag that must acknowledge the write before it is deemed successful.
For instance, in our case, when the write comes to server 1 in data center 1, the number of distinct values of the datacentre tag is 1. If the write operation gets replicated to the second server, the number stays 1, as the value of its datacentre tag is the same as the first member's. Only when the third server acknowledges the write operation does the write satisfy the condition of replicating to two distinct values of the datacentre tag in the replica set. Note that the value can only be a number; a definition such as {datacentre : 'dc1'} is invalid, and an error will be thrown while reconfiguring the replica set.
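The counting rule just described can be sketched in a few lines of illustrative Python. This mimics the behavior explained above, it is not server code, and write_concern_satisfied is a made-up helper name; the tag documents are the ones assigned to the three members in this recipe.

```python
# Illustrative sketch of the "distinct tag values" rule (not server code).

def write_concern_satisfied(mode, acked_member_tags):
    """mode maps a tag key to the required number of distinct values,
    e.g. {'datacentre': 2}; acked_member_tags holds the tag documents
    of the members that have acknowledged the write so far."""
    for key, required in mode.items():
        distinct = {tags[key] for tags in acked_member_tags if key in tags}
        if len(distinct) < required:
            return False
    return True

# The tag documents assigned to the three members in this recipe:
member1 = {'datacentre': 'dc1', 'rack': 'rack-dc1-1'}
member2 = {'datacentre': 'dc1', 'rack': 'rack-dc1-2'}
member3 = {'datacentre': 'dc2', 'rack': 'rack-dc2-1'}

# Both dc1 members acknowledging yields only one distinct datacentre value:
print(write_concern_satisfied({'datacentre': 2}, [member1, member2]))  # False
# The same two members sit in distinct racks, so MultiRack is satisfied:
print(write_concern_satisfied({'rack': 2}, [member1, member2]))        # True
# Once the dc2 member acknowledges, MultiDC is satisfied too:
print(write_concern_satisfied({'datacentre': 2}, [member1, member3]))  # True
```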
However, we need to register this write concern with the server. This is done in the final step of the configuration by setting the settings value in the configuration JSON. The value to set within it is getLastErrorModes, a JSON document with all the possible write concerns defined in it. We later define one more write concern for writes propagated to at least two racks. This is conceptually in line with the MultiDC write concern, and thus we will not discuss it in detail here. After setting all the required tags and settings, we reconfigure the replica set for the changes to take effect.
Once reconfigured, we perform some write operations using the MultiDC write concern. When members in two distinct data centers are available, the writes go through successfully. However, when the server in the second data center goes down, the write operation times out and throws an exception to the client that initiated the write. This demonstrates that the write operation succeeds or fails just as we intended.
We just saw how these custom tags can be used to address some interesting use cases that are not supported by the product implicitly, as far as write operations are concerned. Similar to write operations, read operations can take full advantage of these tags to address some use cases, such as reading from a fixed set of secondary members that are tagged with a particular value.
We added another custom tag annotating a member to be used for reporting purposes. We then fired a query operation with the read preference to query a secondary and provided the tag sets that should be looked for before considering a member as a candidate for the read operation. Remember that when using a primary as the read preference, we cannot use tags, and that is the reason we explicitly specified the value of read_preference to SECONDARY.