Background and foreground index creation from the shell

In the previous recipe, we looked at how to analyze queries, how to decide which index needs to be created, and how to create indexes. This, by itself, is straightforward. However, for large collections, index creation takes a long time, and there are some caveats that we need to keep in mind. The objective of this recipe is to shed some light on these concepts so that we avoid pitfalls while creating indexes, especially on large collections.

Getting ready

For the creation of indexes, we need to have a server up and running. A simple single node is what we will need. Refer to the Single node installation of MongoDB recipe in Chapter 1, Installing and Starting the MongoDB Server, for how to start the server.

Connect two shells to the server by typing mongo from the operating system shell in two separate terminals. Both of them will, by default, connect to the test database.

Our zip code test data is too small to demonstrate the problems faced during index creation on large collections. We need more data; thus, we will start by creating some to simulate these problems. The data has no practical meaning but is good enough to test the concepts. Copy the following piece of code into one of the started shells and execute it (it is a pretty easy snippet to type out too):

for (var i = 0; i < 5000000; i++) {
  var doc = {}
  doc._id = i
  doc.value = 'Some text with no meaning and number ' + i + ' in between'
  db.indexTest.insert(doc)
}

A document in this collection will be as follows:

{ _id:0, value:"Some text with no meaning and number 0 in between" }

Execution will take quite a lot of time, so we need to be patient. Once the execution is over, we are all set for the action.
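If the wait is too long, one option is to send the documents in batches rather than one at a time. The following is a minimal sketch and not part of the original recipe; it assumes that the shell's insert method accepts an array of documents, and both the loadInBatches helper name and the batch size of 1000 are hypothetical choices:

```javascript
// Sketch: build documents in batches and insert each batch with one call.
// `collection` is any object with an insert(arrayOfDocs) method; in the mongo
// shell this would be db.indexTest. loadInBatches is a hypothetical helper.
function loadInBatches(collection, total, batchSize) {
  var batch = [];
  var inserted = 0;
  for (var i = 0; i < total; i++) {
    batch.push({
      _id: i,
      value: 'Some text with no meaning and number ' + i + ' in between'
    });
    if (batch.length === batchSize) {
      collection.insert(batch);   // one call for the whole batch
      inserted += batch.length;
      batch = [];
    }
  }
  if (batch.length > 0) {         // flush the final, partial batch
    collection.insert(batch);
    inserted += batch.length;
  }
  return inserted;
}

// In the shell (assumed usage): loadInBatches(db.indexTest, 5000000, 1000)
```

The resulting documents are identical to those produced by the simple loop, so the rest of the recipe is unaffected by which approach we use.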

Note

If you are keen to know how many documents are currently loaded in the collection, evaluate the following command periodically from the second shell:

> db.indexTest.count()
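To turn the raw count into something easier to read, we could wrap it in a small function. This is only a sketch; loadProgress is a hypothetical helper name, and the target of 5000000 is taken from the loop that loads the data:

```javascript
// Sketch: report how far the data load has progressed.
// `collection` is any object with a count() method, such as db.indexTest.
function loadProgress(collection, target) {
  var loaded = collection.count();
  var percent = (100 * loaded / target).toFixed(1);
  return loaded + ' of ' + target + ' documents (' + percent + '%)';
}

// In the second shell (assumed usage): loadProgress(db.indexTest, 5000000)
```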

How to do it…

  1. Create an index on the value field of the document:
    > db.indexTest.ensureIndex({value:1})
    
  2. While the index creation is in progress, which will take quite some time, switch over to the second console and execute the following command:
    > db.indexTest.findOne()
    

    Both the index-creation shell and the one where we executed findOne will be blocked, and neither of them will show the prompt until the index creation is complete.

  3. This was foreground index creation, which is the default. We now want to see the behavior of background index creation. Drop the created index:
    > db.indexTest.dropIndex({value:1})
    
  4. Create the index again but, this time, in the background:
    > db.indexTest.ensureIndex({value:1}, {background:true})
    
  5. In the second Mongo shell, execute the findOne query this time around:
    > db.indexTest.findOne()
    

    This should return one document straight away, unlike the first instance where the operation was blocked until index creation completed in the foreground.

  6. In the second shell, also repeatedly execute the following explain operation, leaving about 4 to 5 seconds between invocations, until the index-creation process is complete:
    > db.indexTest.find({value:"Some text with no meaning and number 0 in between"}).explain()
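When repeating the explain call in step 6, the interesting detail is presumably whether the plan still scans the whole collection or has switched to the new index. The following sketch summarizes an explain document; describePlan is a hypothetical helper, and the field names it checks (a top-level cursor string on older servers, queryPlanner.winningPlan on version 3.0 and later) are assumptions about the explain output of the server in use:

```javascript
// Sketch: classify an explain() result as an index scan or a collection scan.
// The field names checked here are assumptions that depend on the server version.
function describePlan(plan) {
  // Older explain format: a top-level `cursor` string such as
  // "BasicCursor" (full scan) or "BtreeCursor value_1" (index scan).
  if (typeof plan.cursor === 'string') {
    return plan.cursor.indexOf('BtreeCursor') === 0
      ? 'index scan'
      : 'collection scan';
  }
  // Newer explain format: walk down the winning plan looking for IXSCAN.
  var stage = plan.queryPlanner && plan.queryPlanner.winningPlan;
  while (stage) {
    if (stage.stage === 'IXSCAN') {
      return 'index scan';
    }
    stage = stage.inputStage;
  }
  return 'collection scan';
}

// In the second shell (assumed usage):
// describePlan(db.indexTest.find({value: "Some text with no meaning and number 0 in between"}).explain())
```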
    

How it works…

Let's now analyze what we just did. We created 5 million documents of no practical importance; we just needed enough data for index building to take a significant amount of time.

Indexes can be built in two ways: in the foreground and in the background. In either case, the shell that issued the ensureIndex call blocks and doesn't show the prompt until the index is created. You might then be wondering what difference it makes to create an index in the background or foreground.

That is exactly where the second shell we started came into the picture. This is where we demonstrated the difference between a background and foreground index-creation process. We first created the index in the foreground, which is the default behavior. This index build didn't allow us to query the collection from the second shell until the index was constructed: the findOne operation stayed blocked, and returned its result only after the build started in the first shell had finished. On the other hand, the index that was built in the background didn't block the findOne operation. Inserting new documents into the collection while the background index build is in progress should work well too. Feel free to drop the index and recreate it in the background while simultaneously inserting a document in the indexTest collection; you will notice that it works smoothly.
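To put a number on how long an operation in the second shell is held up, we can time it. This is a minimal sketch, with timeIt being a hypothetical helper name; it simply measures the wall-clock time taken by whatever function we pass in:

```javascript
// Sketch: run an operation and report how long it took in milliseconds.
function timeIt(operation) {
  var start = new Date().getTime();
  var result = operation();
  var elapsedMs = new Date().getTime() - start;
  return { result: result, elapsedMs: elapsedMs };
}

// In the second shell (assumed usage):
// timeIt(function () { return db.indexTest.findOne(); }).elapsedMs
```

With a foreground build in progress, we would expect the elapsed time to be roughly the time remaining for the build; with a background build, it should be close to that of a normal query.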

So, what is the difference between the two approaches, and why not always build the index in the background? Apart from the extra parameter, {background:true} (which can also be written as {background:1}), passed as the second parameter to the ensureIndex call, there are few differences. Building an index in the background is slightly slower than building it in the foreground. Furthermore, internally, though it is not relevant to the end user, an index created in the foreground is more compact than one created in the background.

Other than that, there will be no significant difference. In fact, if a system is running and an index needs to be created while it is serving the end users (not recommended, but there can be a situation that demands index creation on a live system), then creating the index in the background is the only way we can do it. There are other strategies for performing such administrative activities that we will see in some recipes in Chapter 4, Administration.

To make things worse for foreground index creation, the lock acquired by Mongo during index creation is not at the collection level but at the database level. To explain what this means, we will have to drop the index on the indexTest collection and perform a small exercise as follows:

  1. Start by creating the index in the foreground from the shell by executing the following command:
    > db.indexTest.ensureIndex({value:1})
    
  2. Now, insert a document in the person collection, which might or might not exist at this point in the test database:
    > db.person.insert({name:'Amol'})
    

We will see that this insert operation on the person collection blocks while the index creation on the indexTest collection is in progress. If, however, the insert is done on a collection in a different database during the index build (you can try this out too), it will execute normally without blocking. This clearly shows that the lock is acquired at the database level and not at the collection level or global level.
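To try the insert against a different database without leaving the current one, the shell's getSiblingDB method can be used. The following is only a sketch; insertIntoOtherDb is a hypothetical helper, and lockTest is an arbitrary database name chosen for illustration:

```javascript
// Sketch: insert a document into a collection of another database.
// `db` is the shell's current database object; getSiblingDB returns a handle
// to another database on the same server without switching to it.
function insertIntoOtherDb(db, dbName, collectionName, doc) {
  var otherDb = db.getSiblingDB(dbName);
  return otherDb.getCollection(collectionName).insert(doc);
}

// In the second shell, while the foreground build runs (assumed usage):
// insertIntoOtherDb(db, 'lockTest', 'person', {name: 'Amol'})
```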

Note

Prior to version 2.2 of Mongo, locks were at the global level, that is, at the mongod process level, and not at the database level as we just saw. You need to remember this fact when dealing with a distribution of Mongo older than version 2.2.
