Manually splitting and migrating chunks

Though MongoDB generally does a good job of splitting and migrating chunks across shards to keep them balanced, under some circumstances, such as a small number of documents or a relatively large number of small documents, the automatic balancer will not split the collection, and an administrator might want to split and migrate the chunks manually. In this recipe, we will see how to split and migrate a collection manually across shards. Again, for this recipe, we will set up a simple shard as we did in Chapter 1, Installing and Starting the MongoDB Server.

Getting ready

Refer to the Starting a simple sharded environment of two shards recipe in Chapter 1, Installing and Starting the MongoDB Server, to set up and start a sharded environment. It is preferable to start with a clean environment without any data in it. From the shell, connect to the started mongos process.

How to do it…

  1. Connect to the mongos process from the Mongo shell and enable sharding on the test database and the splitAndMoveTest collection as follows:
    > sh.enableSharding('test')
    > sh.shardCollection('test.splitAndMoveTest', {_id:1}, false)
    
  2. Let us load the data in the collection as follows:
    > for(i = 1; i <= 10000 ; i++) db.splitAndMoveTest.insert({_id : i})
    
  3. Once the data is loaded, execute the following command:
    > db.splitAndMoveTest.find().explain()
    

    Note the number of documents on each shard in the plan. The result of the explain plan contains two documents under the shards key, one per shard; within each of these documents, the field to look out for is n, the number of documents returned by that shard.

  4. Execute the following commands to see the splits of the collection:
    > config = db.getSisterDB('config')
    > config.chunks.find({ns:'test.splitAndMoveTest'}).pretty()
    
  5. Split the chunk into two at 5000, as follows:
    > sh.splitAt('test.splitAndMoveTest', {_id:5000})
    
  6. Splitting the chunk doesn't migrate it to the second shard. To see exactly what happened to the chunks, execute the following query again:
    > config.chunks.find({ns:'test.splitAndMoveTest'}).pretty()
    
  7. We will now move the second chunk to the second shard:
    > sh.moveChunk('test.splitAndMoveTest', {_id:5001}, 'shard0001')
    
  8. Execute the following query again and confirm the migration:
    > config.chunks.find({ns:'test.splitAndMoveTest'}).pretty()
    
  9. Alternatively, the following explain plan will show a split of roughly 50-50 percent:
    > db.splitAndMoveTest.find().explain()
    

How it works…

We simulated a small data load by inserting monotonically increasing numbers and, by viewing the query plan, discovered that the documents are not split evenly across the two shards. This is not a problem: a chunk needs to reach a particular size threshold, 64 MB by default, before the balancer decides to migrate chunks across the shards to maintain balance. This works well in practice because, in the real world, as the data grows large, we will see that over a period of time the shards end up well balanced.
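The reasoning above can be sketched as a back-of-the-envelope check. This is a minimal illustration, not MongoDB's actual balancer code; the function name and document sizes are hypothetical:

```javascript
// Hypothetical sketch: why the balancer ignores small collections.
// A chunk only becomes a candidate for splitting once it approaches
// the configured chunk size (64 MB by default).
const CHUNK_SIZE_BYTES = 64 * 1024 * 1024; // default chunk size

function chunkNeedsSplit(docCount, avgDocSizeBytes) {
  return docCount * avgDocSizeBytes >= CHUNK_SIZE_BYTES;
}

// 10,000 tiny documents of ~32 bytes each stay far below the threshold,
// so the balancer has no reason to act on its own.
console.log(chunkNeedsSplit(10000, 32));   // false: well under 64 MB
console.log(chunkNeedsSplit(3000000, 32)); // true: roughly 91 MB
```

This is exactly the situation our recipe creates, which is why we stepped in manually.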

However, under some circumstances an administrator may decide to split and migrate the chunks, and it is possible to do this manually. The two helper functions sh.splitAt and sh.moveChunk exist to do this work. Let us look at their signatures and see what they do.

The sh.splitAt function takes two parameters. The first parameter is the namespace, which has the format <database>.<collection name>, and the second parameter is a document that acts as the split point, splitting the chunk into two possibly uneven portions, depending on where the given document falls within the chunk. There is another method, sh.splitFind, that will try to split the chunk into two roughly equal portions.
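The effect of sh.splitAt on the chunk metadata can be modeled in a few lines. This is a conceptual sketch with a hypothetical helper, not MongoDB's implementation; chunk ranges are simplified to plain numbers:

```javascript
// Model of sh.splitAt's bookkeeping: one range [min, max) becomes two
// ranges [min, splitKey) and [splitKey, max). The portions need not be
// equal, and both pieces remain on the original shard.
function splitAt(chunk, splitKey) {
  if (splitKey <= chunk.min || splitKey >= chunk.max) {
    throw new Error('split key must fall strictly inside the chunk range');
  }
  return [
    { min: chunk.min, max: splitKey, shard: chunk.shard },
    { min: splitKey, max: chunk.max, shard: chunk.shard }, // data stays put
  ];
}

// Mirrors step 5 of the recipe: split the single chunk at _id 5000.
const [low, high] = splitAt({ min: 1, max: 10001, shard: 'shard0000' }, 5000);
console.log(low.max, high.min); // both 5000: the ranges meet at the split key
```

Note that both resulting chunks still carry shard: 'shard0000', which is why a separate migration step is needed.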

However, splitting doesn't mean the chunk moves to another shard. It just breaks one big chunk into two, but the data stays on the same shard. It is an inexpensive operation that involves updating the config DB.

The next operation we execute is migrating the chunk to a different shard after splitting it in two. The sh.moveChunk function is used to do just that. This function takes three parameters. The first one is again the namespace of the collection, which has the format <database>.<collection name>; the second parameter is a query matching a document whose containing chunk will be migrated; and the third parameter is the destination shard.
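The key point is that the chunk is located by the document the query matches, not named directly. The following sketch, with hypothetical helpers and simplified numeric ranges, illustrates that lookup-and-reassign bookkeeping (the real command also physically migrates the documents between shards):

```javascript
// Model of sh.moveChunk's metadata change: find the chunk whose range
// contains the queried key, then reassign it to the target shard.
function moveChunk(chunks, key, toShard) {
  const chunk = chunks.find(c => key >= c.min && key < c.max);
  if (!chunk) throw new Error('no chunk contains key ' + key);
  chunk.shard = toShard; // the real operation also copies the documents over
  return chunk;
}

// State after the split at 5000, as in the recipe:
const chunks = [
  { min: 1, max: 5000, shard: 'shard0000' },
  { min: 5000, max: 10001, shard: 'shard0000' },
];

// Mirrors step 7: {_id: 5001} falls in the second chunk, so that chunk moves.
moveChunk(chunks, 5001, 'shard0001');
console.log(chunks.map(c => c.shard)); // first chunk stays, second has moved
```

Any _id between 5000 and 10000 would have selected the same chunk; 5001 is just a convenient representative.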

Once the migration is done, the query's plan shows us that the data is split into two chunks.
