Manual split and migration of chunks

Though MongoDB does a good job of splitting and migrating chunks across shards to maintain the balance, under some circumstances such as a small number of documents or relatively large number of small documents where the automatic balancer doesn't split the collection, an administrator might want to split and migrate the chunks manually. In this recipe, we will see how to split and migrate the collection manually across shards. For this recipe, we will set up a simple shard as we saw in Chapter 1, Installing and Starting the Server.

Getting ready

Refer to the recipe Starting a simple sharded environment of two shards in Chapter 1, Installing and Starting the Server to set up and start a sharded environment. It is preferred to start a clean environment without any data in it. From the shell, connect to the started mongos process.

How to do it…

  1. Connect to the mongos process from the mongo shell and enable sharding on the test database and the splitAndMoveTest collection as follows:
    > sh.enableSharding('test')
    > sh.shardCollection('test.splitAndMoveTest', {_id:1}, false)
    
  2. Let's load the data in the collection as follows:
    > for(i = 1; i <= 10000 ; i++) db.splitAndMoveTest.insert({_id : i})
    
  3. Once the data is loaded, execute the following:
    > db. splitAndMoveTest.find().explain()
    

    Note the number of documents in two shards in the plan. The value to lookout for is in the two documents under the shards key in the result of explain plan. Within these two documents the field to lookout for is n.

  4. Execute the following to see the splits of the collection:
    > config = db.getSisterDB('config')
    > config.chunks.find({ns:'test.splitAndMoveTest'}).pretty()
    
  5. Split the chunk into two at 5000 as follows:
    > sh.splitAt('test.splitAndMoveTest', {_id:5000})
    
  6. Splitting it doesn't migrate it to the second server. See what exactly happened with the chunks by executing the following query again:
    > config.chunks.find({ns:'test.splitAndMoveTest'}).pretty()
    
  7. We will now move the second chunk to the second shard:
    > sh.moveChunk('test.splitAndMoveTest', {_id:5001}, 'shard0001')
    
  8. Execute the following query again and confirm the migration:
    > config.chunks.find({ns:'test.splitAndMoveTest'}).pretty()
    
  9. Alternatively, the following explain plan will show a split of about 50-50:
    > db. splitAndMoveTest.find().explain()
    

How it works…

We simulate a small data load by adding monotonically increasing numbers and discover that the numbers are not split across two shards evenly by viewing the query plan. It is not a problem as the chunk size needs to reach a particular threshold, 64 MB by default, before the balancer decides to migrate the chunks across the shards to maintain balance. This is pretty perfect as in real world, when the data size gets huge we will see that eventually over a period of time the shards are well balanced.

However, if the administration does decide to split and migrate the chunks, it is possible to do it manually. The two helper functions sh.splitAt and sh.moveChunk are there to do this work. Let's look at their signatures and see what they do.

The function sh.splitAt takes two arguments, first is the namespace, which has the format <database>.<collection name> and the second parameter is the query that acts as the split point to split the chunk into two, possibly two uneven portions depending on where the given document is in the chunk. There is another method, sh.splitFind, which will try and split the chunk in two equal portions.

Splitting doesn't mean the chunk moves to another shard, it just breaks one big chunk into two, but the data stays on the same shard. It is an inexpensive operation which involves updating the config DB.

Next, we executed was to migrate the chunk to a different shard after we split it into two. The operation sh.MoveChunk is used just to do that. This function takes three parameters, first one is again the namespace of the collection that has the format <database>.<collection name>, second parameter is a query a document whose chunk would be migrated, and the third parameter is the destination chunk.

Once the migration is done, the query's plan shows us that the data is split in two chunks.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset