Creating a background and foreground index in the shell

In the previous recipe, we looked at how to analyze queries, how to decide which index needs to be created, and how to create it. This, by itself, is straightforward. However, for large collections, index creation takes a long time, and that is where problems start. The objective of this recipe is to shed some light on these problems and help you avoid the pitfalls of creating indexes, especially on large collections.

Getting ready

For the creation of indexes, we need to have a server up and running. A simple single node is what we need. Refer to the Installing single node MongoDB recipe from Chapter 1, Installing and Starting the Server for instructions on how to start the server.

Connect two shells to the server by typing mongo from the operating system shell in two separate terminals. Both of them will, by default, connect to the test database.

Our test data for zip codes is too small to demonstrate the problems of index creation on large collections. We need more data, so we will start by generating some. The data has no practical meaning but is good enough to test the concepts. Copy the following snippet into one of the started shells and execute it (it is also easy to type out):

// Load five million throwaway documents into the indexTest collection
for (var i = 0; i < 5000000; i++) {
  var doc = {};
  doc._id = i;
  doc.value = 'Some text with no meaning and number ' + i + ' in between';
  db.indexTest.insert(doc);
}

A document in this collection will look something like the following:

{ _id:0, value:"Some text with no meaning and number 0 in between" }

The execution will take quite a lot of time, so we need to be patient. Once the execution is over, we are all set for the action.
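If the one-document-at-a-time load is too slow, the same data can be loaded in batches. The following is a minimal sketch, not part of the original recipe: it assumes a mongo shell that provides insertMany (version 3.2 or later), and the helper names makeDocs and loadInBatches as well as the batch size of 10000 are hypothetical choices made for illustration.

```javascript
// Build `count` test documents starting at _id `start`.
// Pure function: it needs no server connection.
function makeDocs(start, count) {
  var docs = [];
  for (var i = start; i < start + count; i++) {
    docs.push({
      _id: i,
      value: 'Some text with no meaning and number ' + i + ' in between'
    });
  }
  return docs;
}

// Insert `total` documents in batches of `batchSize`.
// Assumption: `collection` exposes insertMany (mongo shell 3.2 or later).
function loadInBatches(collection, total, batchSize) {
  for (var start = 0; start < total; start += batchSize) {
    var size = Math.min(batchSize, total - start);
    collection.insertMany(makeDocs(start, size), { ordered: false });
  }
}

// Runs only inside the mongo shell, where the global `db` exists.
if (typeof db !== 'undefined') {
  loadInBatches(db.indexTest, 5000000, 10000);
}
```

The documents produced are identical to those of the simple loop, so the rest of the recipe is unaffected by which approach is used.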

Note

To check how many documents have been loaded into the collection so far, evaluate the following from the second shell periodically:

db.indexTest.count()

How to do it…

  1. Create an index on the value field of the document as follows:
    > db.indexTest.createIndex({value:1})
    
  2. While the index creation is in progress, which should take quite some time, switch over to the second console and execute the following:
    > db.indexTest.findOne()
    
  3. Both the index creation shell and the one where we executed findOne will be blocked, and the prompt will not be shown on either of them until the index creation is complete.
  4. This was foreground index creation, which is the default. We now want to see the behavior of background index creation. Drop the created index as follows:
    > db.indexTest.dropIndex({value:1})
    
  5. Create the index again, but this time in the background, as follows:
    > db.indexTest.createIndex({value:1}, {background:true})
    
  6. In the second mongo shell, execute findOne as follows:
    > db.indexTest.findOne()
    
  7. This should return one document, which is unlike the first instance, where the operation was blocked until the index creation completed in the foreground.
  8. In the second shell, repeatedly execute the following explain operation with a four-to-five second interval between each explain plan invocation until the index creation process is complete:
    > db.indexTest.find({value:"Some text with no meaning and number 0 in between"}).explain()
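
Reading the full explain output every few seconds is tedious, so the check in the last step can be reduced to a single true or false answer. The following is a sketch under assumptions: it expects the explain layout of MongoDB 3.0 and later (queryPlanner.winningPlan with nested inputStage entries), and the helper names planStages and usesIndexScan are hypothetical. Older servers report a cursor field instead and would need a different check.

```javascript
// Collect the stage names found in an explain() plan tree.
// Assumption: MongoDB 3.0+ layout, where each stage may carry
// an `inputStage` or an `inputStages` array.
function planStages(stage) {
  if (!stage) {
    return [];
  }
  var names = [stage.stage];
  if (stage.inputStage) {
    names = names.concat(planStages(stage.inputStage));
  }
  (stage.inputStages || []).forEach(function (child) {
    names = names.concat(planStages(child));
  });
  return names;
}

// True when the winning plan contains an index scan (IXSCAN) stage.
function usesIndexScan(explainOutput) {
  var plan = explainOutput.queryPlanner && explainOutput.queryPlanner.winningPlan;
  return planStages(plan).indexOf('IXSCAN') !== -1;
}

// Runs only inside the mongo shell, where the global `db` exists.
if (typeof db !== 'undefined') {
  var plan = db.indexTest.find(
    { value: 'Some text with no meaning and number 0 in between' }
  ).explain();
  print('Uses index: ' + usesIndexScan(plan));
}
```

Because an index that is still being built cannot serve queries, the result should change from false to true once the background build finishes.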
    

How it works…

Let's now analyze what we just did. We created about five million documents with no practical importance; all we needed was enough data for the index build to take a significant amount of time.

An index can be built in two ways: in the foreground or in the background. In either case, the shell that issued createIndex doesn't show the prompt until the operation has completed. A foreground build, in addition, blocks all other operations on the database until the index is created, whereas a background build does not. To illustrate the difference between foreground and background index creation, we used a second mongo shell.

We first created the index in the foreground, which is the default behavior. This index build didn't allow us to query the collection from the second shell until the index was constructed: the findOne operation was blocked until the entire index had been built in the first shell, and only then returned its result. On the other hand, the index that was built in the background didn't block the findOne operation. Inserting new documents into the collection while a background index build is in progress works as well. Feel free to drop the index and recreate it in the background while simultaneously inserting a document into the indexTest collection, and you will notice that it works smoothly.
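
The insert experiment suggested above can be sketched as follows, to be run from the second shell while the background build is in progress in the first. This is an illustration only: the helper name insertExtraDoc is hypothetical, and the _id of 5000000 is an assumed value chosen to fall just outside the range loaded earlier.

```javascript
// Insert one extra document into the given collection.
// `collection` is any object with an insert() method, such as
// db.indexTest in the mongo shell.
function insertExtraDoc(collection) {
  var id = 5000000; // assumed to be unused: earlier _ids run from 0 to 4999999
  return collection.insert({
    _id: id,
    value: 'Some text with no meaning and number ' + id + ' in between'
  });
}

// Runs only inside the mongo shell, where the global `db` exists.
if (typeof db !== 'undefined') {
  printjson(insertExtraDoc(db.indexTest));
}
```

With a background build, the call should return promptly; with a foreground build, it would wait until the index is complete.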

So what is the difference between the two approaches, and why not always build the index in the background? Apart from the extra parameter, {background:true} (which can also be {background:1}), passed as the second parameter to the createIndex call, there are a few differences. Index creation in the background will be slightly slower than in the foreground. Furthermore, internally (though this is not relevant to the end user), an index created in the foreground will be more compact than one created in the background.

Other than this, there will be no significant difference. In fact, if a system is running and an index needs to be created while it is serving the end users (not recommended, but a situation can come up at times that demands index creation in a live system), then creating an index in the background is the only way you can do it. There are other strategies to perform such administrative activities, which we will see in some recipes in the administration section.

To make things worse for foreground index creation, the lock acquired by mongo during index creation is not at the collection level but is at the database level. To explain what this means, we will have to drop the index on the indexTest collection and perform the following small exercise:

  1. Start by creating the index in the foreground from the shell by executing the following command:
    > db.indexTest.createIndex({value:1})
    
  2. Now, insert a document into the person collection, which may or may not exist at this point in the test database, as follows:
    > db.person.insert({name:'Amol'})
    

We will see that this insert operation on the person collection is blocked while index creation on the indexTest collection is in progress. However, if this insert operation were done on a collection in a different database during the index build, it would execute normally without blocking. (You can try this out as well.) This clearly shows that the lock is acquired at the database level and not at the collection or global level.
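
The variation with a different database can be tried from the second shell without reconnecting, using getSiblingDB. The following is a minimal sketch: the helper name insertIntoOtherDb and the database name lockTest are hypothetical and were picked only for this illustration.

```javascript
// Insert the same person document into a collection of another database.
// `database` is a shell database handle such as the global `db`;
// getSiblingDB returns a handle to a different database on the same server.
function insertIntoOtherDb(database, otherDbName) {
  return database.getSiblingDB(otherDbName).person.insert({ name: 'Amol' });
}

// Runs only inside the mongo shell, where the global `db` exists.
// 'lockTest' is a hypothetical database name.
if (typeof db !== 'undefined') {
  printjson(insertIntoOtherDb(db, 'lockTest'));
}
```

If the lock is indeed held at the database level, this insert should return immediately even while the foreground index build on test.indexTest is still running.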

Note

Prior to version 2.2 of mongo, locks were at the global level, which is at the mongod process level, and not at the database level as we saw previously. You need to remember this fact when dealing with a distribution of mongo older than version 2.2.
