Exploring the config database in a sharded setup

Config database is the backbone of a sharded setup in Mongo. It stores all the metadata of the shard setup and has a dedicated mongod process running for it. When a mongos process is started we provide it with the config servers' URL. In this recipe, we will take a look at some collections in the config database and dive deep into their content and significance.

Getting ready

We need a sharded setup for this recipe. Refer to the recipe Starting a simple sharded environment of two shard in Chapter 1, Installing and Starting the Server for information on how to start a simple shard. Additionally, connect to the mongos process from a shell.

How to do it…

  1. From the console connected to the mongos process, switch to the config database and execute the following:
    mongos> use config
    mongos>show collections
    
  2. From the list of all collections, we will visit a few. We start with the databases collection. This keeps a track of all the databases on this shard. Execute the following from the shell:
    mongos> db.databases.find()
    
  3. The content of the result is pretty straightforward, the value of the field _id is for the database. The value of field partitioned tells us whether sharding is enabled for the database or not; true indicates it is enabled and the field primary gives the primary shard where the data of non-sharded collections reside upon.
  4. Next, we will visit the collections collection. Execute the following from the shell:
    mongos> db.collections.find().pretty()
    

    This collection, unlike the databases collection we saw earlier, contains only those collections for which we have enabled sharding. The field _id gives the namespace of the collection in the <database>.<collection name> format, the field key gives the shard key and the field unique, indicates whether the shard key is unique or not. These three fields come as the three parameters of the sh.shardCollection function in that very order.

  5. Next, we look at the chunks collection. Execute the following on the shell. If the database was clean when we started this recipe, we won't have a lot of data in this:
    mongos> db.chunks.find().pretty()
    
  6. We then look at the tags collection and execute the following query:
    mongos> db.tags.find().pretty()
    
  7. Let's query the mongos collection as follows.
    mongos> db.mongos.find().pretty()
    

    This is a simple collection that gives the list of all mongos instances connected to the shard with the details like the host and port on which the mongos instance is running, which forms the _id field. The version and figures like for how much time the process is up and running in seconds.

  8. Finally, we look at the version collection. Execute the following query. Note that is not similar to other queries we execute:
    mongos>db.getCollection('version').findOne()
    

How it works…

We saw the collections and databases collection while we queried them and they are pretty simple. Let's look at the collection called chunks. Here is a sample document from this collection:

{
        "_id" : "test.userAddress-pincode_400001.0",
        "lastmod" : Timestamp(1, 3),
        "lastmodEpoch" : ObjectId("53026514c902396300fd4812"),
        "ns" : "test.userAddress",
        "min" : {
                "pincode" : 400001
        },
        "max" : {
                "pincode" : 411001
        },
        "shard" : "shard0000"
}

The fields of interest are ns, min, max, and shard, which are the namespace of the collection, the minimum value present in the chunk, the maximum value present in the chunk, and the shard on which this chunk lies, respectively. The value of the chunk size is 64 MB by default. This can be seen in the settings collection. Execute db.settings.find() from the shell and look at the value of the field value, which is the size of the chunk in MB. Chunks are restricted to this small size to ease the migration process across shards, if needed. When the size of the chunk exceeds this threshold, mongo server finds a suitable point in the existing chunk to break it into two and adds a new entry in this chunks collection. This operation is called splitting, which is inexpensive as the data stays where it is; it is just logically split into multiple chunks. The balancer on mongo tries to keep the chunks across shards balanced and the moment it sees some imbalance, it migrates these chunks to a different shard. This is expensive and also depends largely on the network bandwidth. If we use sh.status(), the implementation actually queries the collections we saw and prints the pretty formatted result.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset