In our previous recipe, we looked at how to analyze queries, how to decide which indexes need to be created, and how to create them. By itself, this is straightforward. For large collections, however, things get harder because building an index takes a long time. The objective of this recipe is to shed some light on these concepts and to help you avoid the pitfalls of creating indexes, especially on large collections.
To create indexes, we need a server up and running; a simple single node is all we need. Refer to the Installing single node MongoDB recipe from Chapter 1, Installing and Starting the Server, for instructions on how to start the server.
Start by connecting two shells to the server, simply by typing mongo from the operating system shell. By default, both of them will connect to the test database.
Our test data for zip codes is too small to demonstrate the problems faced when creating indexes on large collections. We need more data, so we will start by generating some to simulate those problems. The data has no practical meaning, but it is good enough to test the concepts. Copy the following snippet into one of the shells and execute it (it is also easy enough to type out):
for (var i = 0; i < 5000000; i++) {
  var doc = {};
  doc._id = i;
  doc.value = 'Some text with no meaning and number ' + i + ' in between';
  db.indexTest.insert(doc);
}
A document in this collection will look like the following:
{ _id:0, value:"Some text with no meaning and number 0 in between" }
The execution will take quite a lot of time, so we need to be patient. Once the execution is over, we are all set for the action.
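The loop above sends one insert per document, which contributes to the long load time. A possible alternative, assuming MongoDB 3.2 or later (where the collection method insertMany is available), is to insert the same documents in batches. The helper names makeTestDoc and loadTestData below are hypothetical; this is a sketch, not part of the original recipe.

```javascript
// Hypothetical helper: builds one test document with the same shape as the
// documents created by the loop above.
function makeTestDoc(i) {
  return { _id: i, value: 'Some text with no meaning and number ' + i + ' in between' };
}

// Hypothetical helper: inserts `total` documents into `coll` in batches of
// `batchSize` using insertMany, and returns the number of insertMany calls.
function loadTestData(coll, total, batchSize) {
  var batch = [];
  var calls = 0;
  for (var i = 0; i < total; i++) {
    batch.push(makeTestDoc(i));
    if (batch.length === batchSize) {
      coll.insertMany(batch);
      calls++;
      batch = [];
    }
  }
  // Flush the final, possibly partial, batch.
  if (batch.length > 0) {
    coll.insertMany(batch);
    calls++;
  }
  return calls;
}

// Usage in the mongo shell (assumes MongoDB 3.2+):
// loadTestData(db.indexTest, 5000000, 1000);
```

Batching reduces the number of round trips to the server; the resulting documents are identical to those produced by the original loop.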
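In the first shell, create an index on the value field of the document as follows:
> db.indexTest.createIndex({value:1})
While the index creation is in progress, which should take quite some time, switch to the second shell and execute the following:
> db.indexTest.findOne()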
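Both the shell creating the index and the one where we executed findOne will be blocked, and neither will show the prompt until the index creation is complete.
This was a foreground index build, which is the default. To see how a background build behaves, drop the index from the first shell as follows:
> db.indexTest.dropIndex({value:1})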
Create the index again, but this time in the background, as follows:
> db.indexTest.createIndex({value:1}, {background:true})
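While the index is being built, execute findOne in the second shell as follows:
> db.indexTest.findOne()
This time, a document should be returned without waiting for the index build to finish. In the second shell, also run the following explain operation a few times while the index is being built, and once more after the build completes:
> db.indexTest.find({value:"Some text with no meaning and number 0 in between"}).explain()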
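Reading the raw explain() output by eye can be tedious. While the background build is still running, the query planner cannot use the unfinished index, so one would expect a COLLSCAN stage; once the build completes, an IXSCAN stage should appear. The sketch below checks for that, assuming the explain layout used since MongoDB 3.0 (queryPlanner.winningPlan with nested inputStage/inputStages). The function names are hypothetical, and the queryPlan fallback for newer servers is an assumption.

```javascript
// Hypothetical helper: collects the stage names of a plan tree by walking
// the nested inputStage / inputStages fields of explain() output.
function planStages(plan) {
  var stages = [];
  (function walk(node) {
    if (!node) return;
    stages.push(node.stage);
    walk(node.inputStage);
    (node.inputStages || []).forEach(function (child) { walk(child); });
  })(plan);
  return stages;
}

// Hypothetical helper: true if the winning plan contains an index scan.
function usesIndexScan(explainResult) {
  var wp = explainResult.queryPlanner.winningPlan;
  // Assumption: some newer servers nest the plan tree under `queryPlan`.
  return planStages(wp.queryPlan || wp).indexOf('IXSCAN') !== -1;
}

// Usage in the second shell:
// usesIndexScan(db.indexTest.find({value: "Some text with no meaning and number 0 in between"}).explain());
```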
Let's now analyze what we just did. We created about five million documents with no practical importance; we just needed enough data that building an index on it takes a significant amount of time.
An index can be built in two ways: in the foreground or in the background. In either case, the shell that issues createIndex does not show the prompt until the operation has completed. The difference lies in whether other operations have to wait for the build. To illustrate this difference, we used a second mongo shell.
We first created the index in the foreground, which is the default behavior. This build did not allow us to query the collection from the second shell until the index was constructed: the findOne operation was blocked until the entire index (started from the first shell) had been built, and only then returned its result. On the other hand, the index built in the background did not block the findOne operation. Inserting new documents into the collection while the background build is running should work as well. Feel free to drop the index and recreate it in the background while simultaneously inserting a document into the indexTest collection; you will notice that it works smoothly.
So, what is the difference between the two approaches, and why not always build the index in the background? Apart from the extra parameter {background:true} (which can also be written as {background:1}) passed as the second argument to the createIndex call, there are a few differences. Building an index in the background is slower than building it in the foreground. Furthermore, internally (though this is not visible to the end user), an index built in the foreground is more compact than one built in the background.
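Note that this foreground/background distinction applies to servers older than MongoDB 4.2. From 4.2 onward, the server ignores the background option and uses a single optimized build process that holds an exclusive lock only at the beginning and end of the build. A minimal sketch of choosing the options by server version follows; the helper name indexBuildOptions is hypothetical, while db.version() is the standard shell helper that returns the server version string.

```javascript
// Hypothetical helper: returns createIndex options for a server version
// string such as "3.6.3". Before 4.2, {background: true} avoids blocking the
// database during the build; from 4.2 on, the option is ignored, so omit it.
function indexBuildOptions(versionString) {
  var parts = versionString.split('.');
  var major = parseInt(parts[0], 10);
  var minor = parseInt(parts[1], 10);
  var before42 = major < 4 || (major === 4 && minor < 2);
  return before42 ? { background: true } : {};
}

// Usage in the mongo shell:
// db.indexTest.createIndex({value: 1}, indexBuildOptions(db.version()));
```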
Other than this, there is no significant difference. In fact, if an index needs to be created on a running system while it is serving end users (not recommended, but situations that demand it do come up), creating it in the background is the only option. There are other strategies for performing such administrative activities, which we will see in some recipes in the administration section.
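When an index has to be built on a live system, it helps to watch its progress from another shell. The sketch below extracts build progress from the result of db.currentOp(). It assumes that index builds appear in the inprog array with a msg containing "Index Build" and a progress subdocument with done and total counts, as in MongoDB 3.x/4.x; these field names may differ in other versions, and the helper name is hypothetical.

```javascript
// Hypothetical helper: summarizes in-progress index builds from the
// document returned by db.currentOp().
function indexBuildProgress(currentOpResult) {
  return (currentOpResult.inprog || [])
    .filter(function (op) {
      return typeof op.msg === 'string' &&
        op.msg.indexOf('Index Build') !== -1 &&
        Boolean(op.progress);
    })
    .map(function (op) {
      var pct = op.progress.total > 0 ? (100 * op.progress.done) / op.progress.total : 0;
      return { opid: op.opid, ns: op.ns, percent: Math.round(pct) };
    });
}

// Usage in the second shell:
// indexBuildProgress(db.currentOp());
```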
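To make things worse for foreground index creation, the lock acquired by mongod during the build is held not at the collection level but at the database level. To see what this means, drop the index on the indexTest collection and perform the following small exercise. In the first shell, create the index in the foreground; while it is being built, insert a document into another collection of the same database from the second shell: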
> db.indexTest.createIndex({value:1})
> db.person.insert({name:'Amol'})
This insert into the person collection will be blocked while the index creation on the indexTest collection is in progress. However, an insert into a collection in a different database during the index build would execute normally without blocking (you can try this out as well). This shows that the lock is acquired at the database level, not at the collection or global level.
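To make the comparison between databases measurable rather than judging it by how long the prompt takes to return, one could time each insert from the second shell. The sketch below is a minimal timing helper; timeMs and the database name otherdb are hypothetical, while db.getSiblingDB() is the standard shell method for reaching another database on the same connection.

```javascript
// Hypothetical helper: runs fn() and returns the elapsed wall-clock time in
// milliseconds. A shell call waiting on a lock does not return until the lock
// is released, so a large value suggests the operation was blocked.
function timeMs(fn) {
  var start = new Date().getTime();
  fn();
  return new Date().getTime() - start;
}

// Usage in the second shell while the foreground build runs in the first:
// timeMs(function () { db.person.insert({name: 'Amol'}); });                         // same database
// timeMs(function () { db.getSiblingDB('otherdb').person.insert({name: 'Amol'}); }); // different database
```

The same helper can also time the findOne call from earlier in the recipe, for comparing foreground and background builds.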