Though MongoDB does a good job by default of splitting and migrating chunks across shards to keep them balanced, under some circumstances, such as a small number of documents or a relatively large number of small documents where the automatic balancer doesn't split the collection, an administrator might want to split and migrate the chunks manually. In this recipe, we will see how to split and migrate a collection manually across shards. Again, for this recipe, we will set up a simple shard as we saw in Chapter 1, Installing and Starting the MongoDB Server.
Refer to the Starting a simple sharded environment of two shards recipe in Chapter 1, Installing and Starting the MongoDB Server, to set up and start a sharded environment. It is preferred to start with a clean environment without any data in it. Connect to the started mongos process from the Mongo shell and enable sharding on the test database and the splitAndMoveTest collection as follows:
> sh.enableSharding('test')
> sh.shardCollection('test.splitAndMoveTest', {_id:1}, false)
> for(i = 1; i <= 10000 ; i++) db.splitAndMoveTest.insert({_id : i})
> db.splitAndMoveTest.find().explain()
Note the number of documents on each of the two shards in the plan. The result of the explain plan contains one document per shard under the shards key; within each of these documents, the field to look for is n.
> config = db.getSisterDB('config')
> config.chunks.find({ns:'test.splitAndMoveTest'}).pretty()
Split the chunk into two at the document with _id 5000, as follows:
> sh.splitAt('test.splitAndMoveTest', {_id:5000})
> config.chunks.find({ns:'test.splitAndMoveTest'}).pretty()
> sh.moveChunk('test.splitAndMoveTest', {_id:5001}, 'shard0001')
> config.chunks.find({ns:'test.splitAndMoveTest'}).pretty()
> db.splitAndMoveTest.find().explain()
We simulate a small data load by inserting monotonically increasing numbers and, by viewing the query plan, discover that the documents are not split evenly across the two shards. This is not a problem: a chunk needs to reach a particular size threshold, 64 MB by default, before the balancer decides to migrate chunks across the shards to maintain balance. This is usually exactly what we want; in the real world, as the data size grows, we will see that the shards eventually become well balanced over a period of time.
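The balancer's decision not to migrate our tiny collection can be pictured with a toy model. This is only an illustrative sketch, not MongoDB's actual implementation; the threshold value and the idea of comparing chunk counts are simplifications for illustration.

```javascript
// Illustrative sketch only: a balancer might migrate when the difference in
// chunk counts between the most-loaded and least-loaded shards reaches some
// threshold. The threshold of 2 below is an assumption for this example.
function needsBalancing(chunkCounts, threshold) {
  const counts = Object.values(chunkCounts);
  return Math.max(...counts) - Math.min(...counts) >= threshold;
}

// Our freshly loaded collection is one small chunk on a single shard,
// so the imbalance never reaches the threshold and nothing moves:
console.log(needsBalancing({ shard0000: 1, shard0001: 0 }, 2)); // → false

// With a big imbalance, migration would kick in:
console.log(needsBalancing({ shard0000: 5, shard0001: 1 }, 2)); // → true
```

This is why small data loads can stay on one shard indefinitely, and why a manual split and move is sometimes the pragmatic choice.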
However, under some circumstances, when the administrator decides to split and migrate the chunks, it is possible to do so manually. The two helper functions, sh.splitAt and sh.moveChunk, are there to do this work. Let us look at their signatures and see what they do.
The sh.splitAt function takes two parameters. The first parameter is the namespace, which has the format <database>.<collection name>, and the second parameter is a query that acts as the split point, splitting the chunk into two, possibly uneven, portions depending on where the given document lies in the chunk. There is another method, sh.splitFind, that will try to split the chunk into two equal portions.
However, splitting doesn't mean the chunk moves to another shard. It just breaks one big chunk into two, but the data stays on the same shard. It is an inexpensive operation that involves updating the config DB.
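The fact that a split is a pure metadata change can be sketched with a toy model. This is a conceptual simplification, not MongoDB's internals; a chunk is represented here as just a key range plus the shard that owns it.

```javascript
// Conceptual sketch (not MongoDB internals): splitting replaces one chunk
// with two adjacent chunks on the SAME shard. No documents move; only the
// range metadata changes, which is why the operation is inexpensive.
function splitAt(chunk, splitKey) {
  return [
    { min: chunk.min, max: splitKey, shard: chunk.shard },
    { min: splitKey, max: chunk.max, shard: chunk.shard },
  ];
}

// One chunk covering the whole _id range, split at 5000 as in the recipe:
const [low, high] = splitAt(
  { min: -Infinity, max: Infinity, shard: 'shard0000' },
  5000
);
console.log(low);  // → { min: -Infinity, max: 5000, shard: 'shard0000' }
console.log(high); // → { min: 5000, max: Infinity, shard: 'shard0000' }
```

Note that both resulting chunks still live on shard0000, mirroring what config.chunks shows right after sh.splitAt and before any migration.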
The next operation we execute is migrating the chunk to a different shard after we split it into two. The sh.moveChunk function is used to do just that. This function takes three parameters. The first one is again the namespace of the collection, which has the format <database>.<collection name>; the second parameter is a query matching a document whose chunk is to be migrated; and the third parameter is the name of the destination shard.
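The effect of the migration on the chunk metadata can be sketched with the same toy model. Again, this is a simplification for illustration, not the actual implementation; the real command also physically copies the documents to the destination shard before updating the metadata.

```javascript
// Conceptual sketch (not MongoDB internals): moveChunk locates the chunk
// whose range contains the given key and reassigns it to the destination
// shard. The query document only identifies which chunk moves.
function moveChunk(chunks, key, toShard) {
  const chunk = chunks.find(c => c.min <= key && key < c.max);
  chunk.shard = toShard;
  return chunk;
}

// The two chunks produced by the split, both still on shard0000:
const chunkList = [
  { min: -Infinity, max: 5000, shard: 'shard0000' },
  { min: 5000, max: Infinity, shard: 'shard0000' },
];

// {_id: 5001} falls in the second chunk, so that chunk moves:
moveChunk(chunkList, 5001, 'shard0001');
console.log(chunkList[1].shard); // → 'shard0001'
console.log(chunkList[0].shard); // → 'shard0000' (unchanged)
```

This mirrors the recipe: the query {_id:5001} selects the upper chunk created by the split, and only that chunk ends up on shard0001.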
Once the migration is done, the query plan shows us that the data is now split across the two shards.