In the previous recipe, we started a replica set of three mongod
processes. In this recipe, we will be working on top of it and will connect to it from the client application, perform querying, insert data, and take a look at some of the interesting aspects of the replica set from a client's perspective.
The prerequisite for this recipe is that the replica set should be set up, and it should be up and running. For details on how to start the replica set, refer to the Starting multiple instances as part of a replica set recipe.
Let's take a look at the steps in detail:
1. Ensure that the /data/n1, /data/n2, /data/n3, and /logs directories exist for the data and logs of the three nodes, respectively.
2. Start the mongo shell and connect to the first node as follows:
mongo localhost:27000
3. The prompt shows the name of the replica set followed by the server's state. In this case, if the replica set is initialized and is up and running, we will see either repSetTest:PRIMARY> or repSetTest:SECONDARY>.
4. Execute the rs.status() command in the shell and look for the stateStr field. This should give us the primary server. Use the mongo shell to connect to this server. At this point, we should have two shells running: one connected to the primary node and the other connected to a secondary node.
5. From the shell connected to the primary, insert a test document:
repSetTest:PRIMARY> db.replTest.insert({_id:1, value:'abc'})
There is nothing special about it. We have just inserted a small document into a collection that we use for the replication test.
6. Query the document back on the primary:
repSetTest:PRIMARY> db.replTest.findOne()
{ "_id" : 1, "value" : "abc" }
7. Execute the same query from the shell connected to the secondary:
repSetTest:SECONDARY> db.replTest.findOne()
{ "$err" : "not master and slaveOk=false", "code" : 13435 }
8. Enable reads from the secondary and retry the query:
repSetTest:SECONDARY> rs.slaveOk(true)
repSetTest:SECONDARY> db.replTest.findOne()
{ "_id" : 1, "value" : "abc" }
9. Finally, attempt an insert from the shell connected to the secondary:
repSetTest:SECONDARY> db.replTest.insert({_id:1, value:'abc'})
not master
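For reference, the rs.status() output consulted earlier to locate the primary looks roughly like this. This is a heavily trimmed sketch: the host names and port numbers assume the local three-node setup from the previous recipe, and the remaining field values are illustrative only.

```
repSetTest:PRIMARY> rs.status()
{
        "set" : "repSetTest",
        "members" : [
                {
                        "_id" : 0,
                        "name" : "localhost:27000",
                        "stateStr" : "PRIMARY",
                        ...
                },
                {
                        "_id" : 1,
                        "name" : "localhost:27001",
                        "stateStr" : "SECONDARY",
                        ...
                },
                ...
        ],
        "ok" : 1
}
```

The member whose stateStr reads PRIMARY is the one to connect the second shell to.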
We did quite a few things in this recipe, so let's shed some light on the important concepts to remember.
We basically connected to the primary and a secondary node from the shell and performed (or, rather, tried to perform) select and insert operations. The architecture of a Mongo replica set is made up of one primary (just one; no more, no less) and multiple secondary nodes. All writes happen on the primary node only. Note that replication is not a mechanism to distribute a read-request load and thereby scale the system; its primary intent is to ensure high availability of data. By default, we are not permitted to read data from the secondary nodes. We first inserted data from the primary node and then executed a query to get back the document that we inserted. This is straightforward, and there is nothing related to clustering here; just note that we inserted the document from the primary node and then queried it back.
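A quick way to check which node a given shell is connected to, without relying on the prompt, is the db.isMaster() shell helper. The output below is a trimmed sketch with an illustrative host name:

```
repSetTest:PRIMARY> db.isMaster()
{
        "setName" : "repSetTest",
        "ismaster" : true,
        "secondary" : false,
        "primary" : "localhost:27000",
        ...
}
```

On a secondary, ismaster is false and secondary is true, while the primary field still names the current primary; this is how client drivers know where to route writes.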
In the next step, we executed the same query, but this time from the secondary node's shell. By default, querying is not enabled on the secondary node. There might be a small lag in replicating the data, possibly due to heavy data volumes, network latency, or hardware capacity, to name a few of the causes; thus, querying the secondary node might not reflect the latest inserts or updates made on the primary node. If, however, we are OK with this and can live with a slight lag in the replicated data, all we need to do is enable querying on the secondary node explicitly by executing one command, rs.slaveOk() or rs.slaveOk(true). Once this is done, we are free to execute queries on the secondary nodes too.
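Note that rs.slaveOk() applies only to the current shell connection. An equivalent, slightly more explicit form is the connection-level helper shown below (a sketch; the behavior is the same):

```
repSetTest:SECONDARY> db.getMongo().setSlaveOk()
repSetTest:SECONDARY> db.replTest.findOne()
{ "_id" : 1, "value" : "abc" }
```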
Finally, we tried to insert data into a collection from the secondary node's shell. Under no circumstances is this permitted, regardless of whether we have executed rs.slaveOk(). Invoking rs.slaveOk() only permits data to be queried from the secondary node; all write operations still have to go to the primary node and then flow down to the secondary nodes. The internals of replication will be covered in the Understanding and analyzing oplogs recipe in Chapter 4, Administration.
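As a small preview of that recipe: replication rests on the oplog, a capped collection in the local database on the primary that records every write; the secondaries tail it and replay the operations. We can peek at the most recent entry from the primary's shell (the exact output fields are covered in Chapter 4):

```
repSetTest:PRIMARY> use local
switched to db local
repSetTest:PRIMARY> db.oplog.rs.find().sort({$natural:-1}).limit(1)
```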