Performing domain-driven sharding using tags

The Starting a simple sharded environment of two shards and Connecting to a shard from the Mongo shell and performing operations recipes in Chapter 1, Installing and Starting the MongoDB Server, explained how to start a simple two-server shard and then insert data in a collection after choosing a shard key. The data that gets sharded is more technical, where the data chunk is kept to a manageable size by Mongo, by splitting it into multiple chunks and migrating the chunks across shards to keep the chunk distribution even across shards.However, what if we want the sharding to be more domain-oriented? Suppose we have a database for storing postal addresses and we shard based on postal codes, where we know the postal code range of a city. What we can do is tag the shard servers according to the city name as the tag, add the shard range (postal codes), and associate this range with the tag.

This way, we can state which servers can contain the postal addresses of which cities. For instance, we know that for Mumbai, being the most populous city, the number of addresses would be huge and thus we add two shards for Mumbai. On the other hand, one shard should be enough to cope with the volumes of Pune; so for now we tag just one shard. In this recipe, we will see how to achieve this use case using tag-aware sharding. If the description is confusing, don't worry; we will see how to implement what we just discussed.

Getting ready

Refer to the Starting a simple sharded environment of two shards recipes in Chapter 1, Installing and Starting the MongoDB Server, for how to start a simple shard. However, for this recipe, we will add an additional shard. So, we will now start three MongoDB servers listening to ports 27000, 27001, and 27002. Again, it is recommended to start off with a clean database. For the purpose of this recipe, we will be using the userAddress collection to store the data.

How to do it…

  1. Assuming that we have three shards up and running; let us execute the following commands:
    mongos> sh.addShardTag('shard0000', 'Mumbai')
    mongos> sh.addShardTag('shard0001', 'Mumbai')
    mongos> sh.addShardTag('shard0002', 'Pune')
    
  2. With the tags defined, let us define the range of the pin codes that will map to a tag, as follows:
    mongos> sh.addTagRange('test.userAddress', {pincode:400001}, {pincode:400999}, 'Mumbai')
    mongos> sh.addTagRange('test.userAddress', {pincode:411001}, {pincode:411999}, 'Pune')
    
  3. Enable sharding for the test database and userAddress collection as follows:
    mongos> sh.enableSharding('test')
    mongos> sh.shardCollection('test.userAddress', {pincode:1})
    
  4. Insert the following documents in the userAddress collection:
    mongos> db.userAddress.insert({_id:1, name: 'Varad', city: 'Pune', pincode: 411001})
    mongos> db.userAddress.insert({_id:2, name: 'Rajesh', city: 'Mumbai', pincode: 400067})
    mongos> db.userAddress.insert({_id:3, name: 'Ashish', city: 'Mumbai', pincode: 400101})
    
  5. Execute the following explain plans:
    mongos> db.userAddress.find({city:'Pune'}).explain()
    mongos> db.userAddress.find({city:'Mumbai'}).explain()
    

How it works…

Suppose we want to partition data driven by domain in a shard; we can use tag-aware sharding. It is an excellent mechanism that lets us tag the shards and then split the data range across shards identified by the tags. We don't really have to bother about the actual machines and their addresses hosting the shard. Tags act as a good abstraction, in the way, we can tag a shard with multiple tags and one tag can be applied to multiple shards.

In our case, we have three shards and we apply tags to each of them using the sh.addShardTag method. This method takes the shard ID, which we can see in the sh.status call with the "shards" key. This sh.addShardTag can be used to keep adding tags to a shard. Similarly, there is a helper method sh.removeShardTag to remove an assignment of the tag from the shard. Both these methods take two parameters, the first one is the shard ID and the second one of the tag to remove.

Once the tagging is done, we assign the range of the values of the shard key to the tag. The sh.addTagRange method is used to do that. It accepts four parameters; the first one is the namespace (the fully qualified name of the collection), the second and third parameters are the start and end values of the range for this shard key, and the fourth parameter is the tag name of the shards hosting the range being added. For example, the call sh.addTagRange('test.userAddress', {pincode:400001}, {pincode:400999}, 'Mumbai') says we are adding the shard range from 400001 to 400999 for the test.userAddress collection and this range will be stored in the shards tagged as Mumbai.

Once the tagging and adding tag ranges are done, we enable sharding on the database and collection, and add data to it from Mumbai and Pune with the respective pin codes. We then query and explain the plan, to see that the data did indeed reside on the shards we have tagged for Pune and Mumbai city. We can also add new shards to this sharded setup and accordingly tag the new shard. The balancer will then accordingly balance the data based on the value it has tagged. For instance, if the addresses in Pune increase, thus overloading a shard, we can add a new shard with the tag as Pune. The postal addresses for Pune will then be sharded across these two server instances tagged for Pune city.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset