Though MongoDB does a good job of splitting and migrating chunks across shards to keep the data balanced, under some circumstances, such as a small total volume of data or a large number of small documents, the automatic balancer will not split the collection. In such cases, an administrator might want to split and migrate the chunks manually. In this recipe, we will see how to split and migrate a collection manually across shards. For this recipe, we will set up a simple shard as we saw in Chapter 1, Installing and Starting the Server.
Refer to the recipe Starting a simple sharded environment of two shards in Chapter 1, Installing and Starting the Server, to set up and start a sharded environment. It is preferable to start with a clean environment without any data in it. From the shell, connect to the started mongos process.
Shard the test database and the splitAndMoveTest collection as follows:
> sh.enableSharding('test')
> sh.shardCollection('test.splitAndMoveTest', {_id:1}, false)
> for(i = 1; i <= 10000 ; i++) db.splitAndMoveTest.insert({_id : i})
> db.splitAndMoveTest.find().explain()
Note the number of documents in the two shards in the plan. The values to look out for are in the two documents under the shards key in the result of the explain plan; within each of these documents, the field to look for is n.
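The per-shard counts can also be pulled out of the plan programmatically. The following is an illustrative sketch only: the shape of the explain() output differs across MongoDB versions, so the field names assumed here (a shards document keyed by shard host, each entry an array of plans carrying an n count) may not match your version exactly.

```javascript
// Sketch, run from the mongos shell. Assumes the older explain() format
// where the sharded plan has a 'shards' document keyed by shard host.
var plan = db.splitAndMoveTest.find().explain();
for (var shardHost in plan.shards) {
  // 'n' is the number of documents returned by that shard's plan.
  print(shardHost + ' -> ' + plan.shards[shardHost][0].n);
}
```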
> config = db.getSisterDB('config')
> config.chunks.find({ns:'test.splitAndMoveTest'}).pretty()
Split the chunk at _id 5000 as follows:
> sh.splitAt('test.splitAndMoveTest', {_id:5000})
> config.chunks.find({ns:'test.splitAndMoveTest'}).pretty()
> sh.moveChunk('test.splitAndMoveTest', {_id:5001}, 'shard0001')
> config.chunks.find({ns:'test.splitAndMoveTest'}).pretty()
> db.splitAndMoveTest.find().explain()
We simulate a small data load by inserting monotonically increasing numbers, and by viewing the query plan we discover that the documents are not split evenly across the two shards. This is not a problem: a chunk needs to reach a particular size threshold, 64 MB by default, before the balancer decides to migrate chunks across the shards to maintain balance. This works well in the real world: as the data size grows, we will see that over a period of time the shards become well balanced.
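The 64 MB threshold itself is configurable. As a hedged sketch (useful for testing only, not production), the chunk size is stored in the config database's settings collection, with the value expressed in megabytes; lowering it makes the balancer split and migrate much sooner:

```javascript
// Sketch, run from the mongos shell: lower the chunk size to 32 MB.
// The default is 64 MB; small values are only appropriate for testing.
config = db.getSisterDB('config')
config.settings.save({_id: 'chunksize', value: 32})
```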
However, if the administrator does decide to split and migrate the chunks, it is possible to do so manually. The two helper functions, sh.splitAt and sh.moveChunk, are there to do this work. Let's look at their signatures and see what they do.
The function sh.splitAt takes two arguments: the first is the namespace, which has the format <database>.<collection name>, and the second is a query that acts as the split point, breaking the chunk into two, possibly uneven, portions depending on where the given document lies within the chunk. There is another method, sh.splitFind, which will try to split the chunk into two equal portions.
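To sketch the difference, sh.splitFind takes the same namespace, but instead of splitting at the given key, it locates the chunk containing the matching document and splits it at that chunk's median point:

```javascript
// Splits the chunk containing {_id: 2500} roughly in half at its median,
// rather than at an explicit key as sh.splitAt does.
sh.splitFind('test.splitAndMoveTest', {_id: 2500})
```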
Splitting doesn't mean the chunk moves to another shard; it just breaks one big chunk into two, while the data stays on the same shard. It is an inexpensive operation that involves updating the config database.
Next, after splitting the chunk in two, we migrated one of the chunks to a different shard. The function sh.moveChunk is used to do just that. It takes three parameters: the first is again the namespace of the collection, in the format <database>.<collection name>; the second is a query matching a document whose containing chunk is to be migrated; and the third is the name of the destination shard.
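Putting the three parameters together, a minimal sketch looks as follows; note that 'shard0000' here is just an assumed shard name, and the actual names in your deployment can be listed with sh.status():

```javascript
// Moves the chunk containing {_id: 1} to the shard named 'shard0000'.
// The shard name is an assumption; check sh.status() for real names.
sh.moveChunk('test.splitAndMoveTest', {_id: 1}, 'shard0000')
```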
Once the migration is done, the query plan shows us that the data is now spread across the two shards.