In Chapter 1, Installing and Starting the Server, we saw how to set up a simple replica set in the recipe Starting multiple instances as part of a replica set, along with the purpose a replica set serves. The Appendix of the book also explains in detail what WriteConcern is and why it is used. We saw that a write concern offers a minimum level of guarantee for a given write operation. However, by combining tags with write concerns, we can define a variety of rules and conditions that must be satisfied before a write operation is deemed successful and a response is sent to the user.
Consider some common use cases such as the following:

- Ensuring that a write operation is replicated to at least one member in each of our data centers
- Ensuring that a write operation is propagated to servers on at least two different racks
- Directing read operations meant for reporting to a fixed set of secondary members dedicated to that purpose

The preceding use cases are a few of the common ones that arise and are not addressed using the simple write concerns we have seen earlier. We need a different mechanism to cater to these requirements, and replica sets with tags are what we need.
Obviously, the next question is: what exactly are tags? Let's take the example of a blog. The various posts in a blog have different tags attached to them, which allow us to easily search, group, and relate posts together. A tag is some user-defined text with a meaning attached to it. Drawing an analogy between blog posts and replica set members, just as we attach tags to a post, we can attach tags to each replica set member. For example, in a multiple data center scenario with two replica set members in data center 1 (dc1) and one member in data center 2 (dc2), we can have the following tags assigned to the members. The name of the key and the value assigned to a tag are arbitrary and are chosen while designing the application; you may even choose to assign tags such as the name of the administrator who set up the server, if that addresses your use case:
| Replica set member | Tags |
|---|---|
| Replica set member 1 | {'datacentre': 'dc1', 'rack': 'rack-dc1-1'} |
| Replica set member 2 | {'datacentre': 'dc1', 'rack': 'rack-dc1-2'} |
| Replica set member 3 | {'datacentre': 'dc2', 'rack': 'rack-dc2-1'} |
That is good enough to lay the foundation of what replica set tags are. In this recipe, we will see how to assign tags to replica set members and, more importantly, how to make use of them to address some of the sample use cases we saw earlier.
Refer to the recipe Starting multiple instances as part of a replica set in Chapter 1, Installing and Starting the Server for the prerequisites and replica set basics. Go ahead and set up a simple three-node replica set on your computer, as mentioned in that recipe. Open a shell and connect to the primary member of the replica set.
If you need to know about write concerns, refer to the overview of write concerns in the Appendix of the book.
For inserting documents into the database, we will use Python, as it gives us an interactive interface similar to the mongo shell. Refer to the recipe Connecting to a single node using a Python client in Chapter 1, Installing and Starting the Server for steps on how to install pymongo. The mongo shell would have been the most ideal candidate for demonstrating the insert operations, but there are certain limitations around using the shell with our custom write concern. Technically, any programming language that supports the write concerns mentioned in this recipe would work fine for the insert operations.
> var conf = rs.conf()
> conf.members[0].tags = {'datacentre': 'dc1', 'rack': 'rack-dc1-1'}
> conf.members[1].tags = {'datacentre': 'dc1', 'rack': 'rack-dc1-2'}
> conf.members[2].priority = 0
> conf.members[2].tags = {'datacentre': 'dc2', 'rack': 'rack-dc2-1'}
> conf.settings = {'getLastErrorModes' : {'MultiDC':{datacentre : 2}}}
> rs.reconfig(conf)
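For readers who prefer to see the whole tagged configuration in one place, the following minimal Python sketch builds an equivalent configuration document to the one assembled in the shell above. It makes no server connection; the hosts, _id values, and version number are assumptions for illustration only:

```python
# Sketch: a replica set configuration document equivalent to the one
# built interactively in the mongo shell. Hosts, _id values, and the
# version number are hypothetical placeholders.
conf = {
    '_id': 'replSetTest',
    'version': 2,
    'members': [
        {'_id': 0, 'host': 'localhost:27000',
         'tags': {'datacentre': 'dc1', 'rack': 'rack-dc1-1'}},
        {'_id': 1, 'host': 'localhost:27001',
         'tags': {'datacentre': 'dc1', 'rack': 'rack-dc1-2'}},
        # Priority 0 keeps the dc2 member from ever becoming primary.
        {'_id': 2, 'host': 'localhost:27002', 'priority': 0,
         'tags': {'datacentre': 'dc2', 'rack': 'rack-dc2-1'}},
    ],
    # Custom write concern: the write must be acknowledged by members
    # covering two distinct values of the 'datacentre' tag.
    'settings': {'getLastErrorModes': {'MultiDC': {'datacentre': 2}}},
}
```

Laying the document out this way makes it easy to review the tags and the custom write concern together before applying them with rs.reconfig().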
>>> import pymongo
>>> client = pymongo.MongoClient('localhost:27000,localhost:27001', replicaSet='replSetTest')
>>> db = client.test
>>> db.multiDCTest.insert({'i':1}, w='MultiDC', wtimeout=5000)
On a successful insert, an ObjectId would be printed out; you may query the collection to confirm this, from either the mongo shell or the Python shell. We will now stop the server listening on port 27002, which is the one with priority 0 and tagged to be in a different data center. Once the server is down (which you can confirm using the rs.status() helper function from the mongo shell), execute the following insert again; this insert should error out:

>>> db.multiDCTest.insert({'i':2}, w='MultiDC', wtimeout=5000)
Similarly, we can define another write concern that requires a write to propagate to at least two racks: {'MultiRack':{rack : 2}}. Add it to the getLastErrorModes settings and apply the change with rs.reconfig(conf) from the mongo shell; the settings would then look as follows:

{ 'getLastErrorModes' : { 'MultiDC':{datacentre : 2}, 'MultiRack':{rack : 2} } }
So far, we have seen WriteConcern used with replica set tags to achieve functionality such as data center and rack awareness. Let's now see how we can use replica set tags with read operations. Execute the following from the mongo shell:

> var conf = rs.conf()
> conf.members[2].tags.type = 'reports'
> rs.reconfig(conf)

This adds an additional tag called type with the value reports to the member with priority 0, the one in a different data center. From the Python shell, execute the following:

>>> curs = db.multiDCTest.find(read_preference=pymongo.ReadPreference.SECONDARY, tag_sets=[{'type':'reports'}])
>>> curs.next()

Now stop the server listening on port 27002 and execute the following on the Python shell again:

>>> curs = db.multiDCTest.find(read_preference=pymongo.ReadPreference.SECONDARY, tag_sets=[{'type':'reports'}])
>>> curs.next()
In this recipe, we performed a number of operations on tagged replica sets and saw how tags can affect write operations using WriteConcern and read operations using ReadPreference. Let's look at them in some detail now.
We set up a replica set that was up and running, which we then reconfigured to add tags. We tagged the first two servers as being in data center 1 on different racks (the servers listening on ports 27000 and 27001 for client connections) and the third one as being in data center 2 (the server listening on port 27002 for client connections). We also ensured that the member in data center 2 doesn't become a primary by setting its priority to 0.
Our first objective was to ensure that write operations to the replica set get replicated to at least one member in each of the two data centers. To ensure this, we defined a write concern as follows: {'MultiDC':{datacentre : 2}}. Here, we first define the name of the write concern as MultiDC. Its value, a JSON object, has one key named datacentre, which is the same as the key used for the tag we attached to the replica set members, and the value is the number 2, which is interpreted as the number of distinct values of the given tag that must acknowledge the write before it is deemed successful.

For instance, in our case, when the write comes to server 1 in data center 1, the number of distinct values of the datacentre tag is 1. If the write operation gets replicated to the second server, the number still stays at one, as the value of its datacentre tag is the same as the first member's. It is only when the third server acknowledges the write operation that the write satisfies the defined condition of replicating to two distinct values of the datacentre tag in the replica set. Note that the value can only be a number; a definition like {datacentre : 'dc1'} is invalid, and an error will be thrown while reconfiguring the replica set.
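This distinct-value counting rule can be illustrated with a small, self-contained Python sketch. This is only an illustration of the semantics described above, not server code; the member documents mirror the tags we assigned earlier:

```python
def satisfies_mode(ack_members, mode):
    """Check whether the acknowledging members satisfy a
    getLastErrorModes-style rule, e.g. {'datacentre': 2} meaning
    'acknowledged across at least 2 distinct datacentre tag values'."""
    for tag, required in mode.items():
        # Count distinct values of the tag among acknowledging members.
        distinct = {m['tags'][tag] for m in ack_members if tag in m['tags']}
        if len(distinct) < required:
            return False
    return True

members = [
    {'host': 'localhost:27000', 'tags': {'datacentre': 'dc1', 'rack': 'rack-dc1-1'}},
    {'host': 'localhost:27001', 'tags': {'datacentre': 'dc1', 'rack': 'rack-dc1-2'}},
    {'host': 'localhost:27002', 'tags': {'datacentre': 'dc2', 'rack': 'rack-dc2-1'}},
]

# Both dc1 members acknowledged: still only one distinct datacentre value.
print(satisfies_mode(members[:2], {'datacentre': 2}))  # False
# Once the dc2 member also acknowledges, the MultiDC rule is satisfied.
print(satisfies_mode(members, {'datacentre': 2}))      # True
```

The same function also captures the MultiRack rule: passing {'rack': 2} succeeds as soon as any two members on different racks have acknowledged.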
However, we need to register this write concern somewhere with the server. This is done in the final step of the configuration, by setting the settings value in the configuration JSON. The value to set is getLastErrorModes, which is a JSON document with all the possible write concerns defined in it. We later defined one more write concern, for writes propagated to at least two racks. This is conceptually in line with the MultiDC write concern, so we will not discuss it in detail here. After setting all the required tags and the settings, we reconfigured the replica set for the changes to take effect.
Once reconfigured, we performed some write operations using the MultiDC write concern. When members in the two distinct data centers were available, the write went through successfully. However, when the server in the second data center went down, the write operation timed out and threw an exception to the client that initiated the write. This demonstrates that the write operation succeeds or fails exactly as we intended.
We just saw how these custom tags can be used to address some interesting use cases for write operations that the product does not support out of the box. Similar to write operations, read operations can take full advantage of these tags to address use cases such as reading from a fixed set of secondary members that are tagged with a particular value.
We added another custom tag annotating a member to be used for reporting purposes. We then fired a query operation with a read preference to query a secondary, and provided the tag sets that must match before a member is considered as a candidate for the read operation. Remember that when using primary as the read preference, we cannot use tags, which is the reason we explicitly set the value of read_preference to SECONDARY.
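The member selection by tag sets can likewise be sketched in plain Python. This is an illustration of the matching semantics only, not the actual driver code; the hosts and tags mirror our setup:

```python
def match_tag_sets(secondaries, tag_sets):
    """Return the secondaries matching the first tag set that yields
    any candidates, mimicking how tag sets are tried in order."""
    for tag_set in tag_sets:
        matched = [s for s in secondaries
                   if all(s['tags'].get(k) == v for k, v in tag_set.items())]
        if matched:
            return matched
    return []  # no eligible secondary: the read cannot be served

secondaries = [
    {'host': 'localhost:27001',
     'tags': {'datacentre': 'dc1', 'rack': 'rack-dc1-2'}},
    {'host': 'localhost:27002',
     'tags': {'datacentre': 'dc2', 'rack': 'rack-dc2-1', 'type': 'reports'}},
]

# Only the member tagged type: reports qualifies for the read.
print(match_tag_sets(secondaries, [{'type': 'reports'}]))
```

This also explains why the second query in the recipe fails after we stop the server on port 27002: with the only reports-tagged member gone, no candidate matches the tag set, and a read with that preference cannot be served.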