Expiring documents after a fixed interval using the TTL index

One of the interesting features in Mongo is the automatic expiry of data in a collection after a predetermined amount of time. This is a very useful tool when we want to purge data older than a particular timeframe. For a relational database, it is not uncommon for folks to set up a batch job that runs every night to perform this operation.

With the Time To Live (TTL) feature of Mongo, we need not worry about this as the database takes care of it out-of-the-box. Let's see how we can achieve this.

Getting ready

Let's create some data in Mongo to try out the TTL indexes. We will create a collection called ttlTest for this purpose. We will require a server to be up and running. Refer to the Single node installation of MongoDB recipe in Chapter 1, Installing and Starting the MongoDB Server, to learn how to start the server. Also, start the shell with the TTLData.js script loaded. This script will be available on the book's website for download. To know how to start the shell with a script preloaded, refer to the Connecting to a single node from the Mongo shell with a preloaded JavaScript recipe in Chapter 1, Installing and Starting the MongoDB Server.

How to do it…

  1. Assuming that the server is started and the script provided is loaded on the shell, invoke the following method from the Mongo shell:
    > addTTLTestData()
    
  2. Create a TTL index on the createDate field:
    > db.ttlTest.ensureIndex({createDate:1}, {expireAfterSeconds:300})
    
  3. Now, query the collection:
    > db.ttlTest.find()
    
  4. This should return three documents. Rerun the find query every 30 to 40 seconds and watch the documents disappear one after another until the collection is empty.

How it works…

Let's start by opening the TTLData.js file to see what is going on in it. The code is pretty simple: it gets the current date using new Date() and then creates three documents whose createDate values are 4, 3, and 2 minutes behind the current time. So, on execution of the addTTLTestData() method in this script, we will have three documents in the ttlTest collection, with creation times 1 minute apart from each other.
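
If you do not have the script at hand, the following is a rough sketch of what it might look like. It is a reconstruction from the description above, not the book's actual file: the buildTTLTestDocs helper and the _id values are assumptions made for illustration, and only addTTLTestData and the createDate field come from the recipe.

```javascript
// Hypothetical reconstruction of TTLData.js; the real script may differ.

// Build three documents whose createDate values are 4, 3, and 2 minutes
// behind the supplied reference time.
function buildTTLTestDocs(now) {
  var minute = 60 * 1000; // milliseconds in one minute
  return [4, 3, 2].map(function (minutesAgo, i) {
    return {
      _id: i + 1, // assumed _id values, chosen for readability only
      createDate: new Date(now.getTime() - minutesAgo * minute)
    };
  });
}

// Insert the documents into the ttlTest collection. `db` is the global
// database handle that the Mongo shell provides.
function addTTLTestData() {
  buildTTLTestDocs(new Date()).forEach(function (doc) {
    db.ttlTest.insert(doc);
  });
}
```

Keeping the date arithmetic in its own function makes it easy to check the generated timestamps without a running server.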

The next step is the core of the TTL feature: the creation of the TTL index. It is similar to the creation of any other index using the ensureIndex method, except that it also accepts a second parameter, a JSON object. Let's see what these two parameters are:

  • The first parameter is {createDate:1}; this will tell Mongo to create an index on the createDate field, and the order of the index is ascending as the value is 1 (-1 would have been descending)
  • The second parameter, {expireAfterSeconds:300}, is what makes this index a TTL index; it tells Mongo to automatically expire the documents after 300 seconds (5 minutes)

OK, but 5 minutes since when? Since the time they were inserted in the collection or is it some other timestamp? In this case, it considers the createDate field as the base, as this was the field on which we created the index.
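
The arithmetic can be sketched in plain JavaScript. This is only an illustration of the rule, since the server performs the calculation internally; the expiryTime helper is a hypothetical name and not part of the Mongo API.

```javascript
// Illustrative only: the server does this calculation itself.
// A document becomes eligible for deletion expireAfterSeconds after the
// date stored in the indexed field.
function expiryTime(createDate, expireAfterSeconds) {
  return new Date(createDate.getTime() + expireAfterSeconds * 1000);
}

var now = new Date();
// A document whose createDate is 4 minutes in the past, like the oldest
// document in this recipe.
var createDate = new Date(now.getTime() - 4 * 60 * 1000);
var expiresAt = expiryTime(createDate, 300);

// Seconds left before the document becomes eligible for deletion.
var secondsLeft = (expiresAt - now) / 1000; // 60
```

This is why the three test documents start disappearing about a minute after the index is created, rather than 5 minutes after they were inserted.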

This now raises a question: if a field is being used as the base for the computation of time, there has to be some restriction on its type. It just doesn't make sense to create a TTL index, as we created earlier, on a string field that holds, say, the name of a person.

As we guessed, the type of the field can be a BSON type date or an array of dates. What will happen in the case where an array has multiple dates? What will be considered in this case?

It turns out that Mongo uses the earliest (minimum) of the dates available in the array. Try out this scenario as an exercise.

Put two dates separated by about 5 minutes from each other in a document against the updateField field name and then create a TTL index on this field, as you did earlier, to expire the document after 10 minutes (600 seconds). Query the collection and see when the document gets deleted from the collection. It should get deleted after roughly 10 minutes have elapsed since the minimum time value present in the updateField array.
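
One possible way to set up this exercise is sketched below. The collection name ttlArrayTest and both function names are assumptions made for illustration; only the updateField field and the 600 second expiry come from the exercise. The function takes the database handle as a parameter, so in the shell you would call it as setUpArrayExercise(db, new Date()).

```javascript
// Returns the earliest date in an array of dates.
function earliestDate(dates) {
  var millis = dates.map(function (d) { return d.getTime(); });
  return new Date(Math.min.apply(null, millis));
}

// Inserts a document holding two dates 5 minutes apart and creates a TTL
// index on updateField. ttlArrayTest is a hypothetical collection name.
// Returns the time at which the document should become eligible for deletion.
function setUpArrayExercise(db, now) {
  var fiveMinutes = 5 * 60 * 1000;
  var doc = {
    updateField: [now, new Date(now.getTime() + fiveMinutes)]
  };
  db.ttlArrayTest.insert(doc);
  db.ttlArrayTest.ensureIndex({ updateField: 1 }, { expireAfterSeconds: 600 });
  // Expiry is measured from the earliest date in the array.
  return new Date(earliestDate(doc.updateField).getTime() + 600 * 1000);
}
```

After running it, query db.ttlArrayTest.find() every minute or so and compare the time at which the document disappears with the returned date.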

Apart from the constraint for the type of field, there are a few more constraints.

  • If a field already has an index on it, you cannot create a TTL index on it. As the _id field of the collection already has an index by default, it effectively means you cannot create a TTL index on the _id field.
  • A TTL index cannot be a compound index that involves multiple fields.
  • If the indexed field doesn't exist in a document, that document will never expire (this is pretty logical, I guess).
  • A TTL index cannot be created on capped collections. In case you are not aware of capped collections, they are special collections in Mongo with a size limit on them with the first in first out (FIFO) insertion order; they delete old documents to make place for new documents, if needed.

    Note

    TTL indexes are supported only on Mongo Version 2.2 and above. Also note that a document will not be deleted at exactly the moment it expires. The background task that removes expired documents runs about once every minute; each run deletes all the documents that have become eligible for deletion since the previous run.
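
To get a feel for the delay this can introduce, here is a deliberately simplified model of the deletion cycle. It is not how the server is implemented: it assumes that the cycle runs exactly every 60 seconds and that each run finishes instantly, and the function and variable names are hypothetical.

```javascript
// Simplified model, not the server's actual implementation: deletion runs
// are assumed to happen exactly every 60 seconds and to finish instantly.
var SWEEP_INTERVAL_MS = 60 * 1000;

// Returns the time of the first run at or after the document expires.
function firstSweepAtOrAfter(expiresAt, firstSweepAt) {
  var elapsed = expiresAt.getTime() - firstSweepAt.getTime();
  var sweepsNeeded = Math.max(0, Math.ceil(elapsed / SWEEP_INTERVAL_MS));
  return new Date(firstSweepAt.getTime() + sweepsNeeded * SWEEP_INTERVAL_MS);
}

var firstSweep = new Date(0);        // a run happens at t = 0 s
var expiresAt = new Date(70 * 1000); // the document expires at t = 70 s
var deletedAt = firstSweepAtOrAfter(expiresAt, firstSweep);

// The document outlives its expiry time until the run at t = 120 s.
var delaySeconds = (deletedAt - expiresAt) / 1000; // 50
```

Under these assumptions, a document can remain in the collection for up to a minute after it expires, so queries that must never see expired data should filter on the date field themselves.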

There's more…

A use case might not demand the deletion of all the documents after a fixed interval has elapsed. What if we want to customize the point until which a document stays in the collection? This too can be achieved, and will be demonstrated in the next recipe.
