Expiring documents after a fixed interval using the TTL index

One of the interesting features in Mongo is automatically expiring data in a collection after a predetermined amount of time. This is very useful when we want to purge data older than a particular timeframe. With a relational database, it is common for folks to set up a batch job that runs every night to perform this operation.

With the TTL feature of Mongo, you need not worry about this as the database takes care of it out of the box. Let's see how we can achieve this.

Getting ready

Let's create data in Mongo that we want to play with using the TTL indexes. We will create a collection called ttlTest for this purpose. We will require a server to be up and running. Refer to the Installing single node MongoDB recipe from Chapter 1, Installing and Starting the Server for instructions on how to start the server. Start the shell with the TTLData.js script loaded. This script is available on the Packt website for download. To know how to start the shell with a script preloaded, refer to the Connecting to a single node in the Mongo shell with JavaScript recipe from Chapter 1, Installing and Starting the Server.

How to do it…

  1. Assuming that the server has started and the script provided is loaded on the shell, invoke the following method from the mongo shell:
    > addTTLTestData()
    
  2. Create a TTL index on the createDate field as follows:
    > db.ttlTest.createIndex({createDate:1}, {expireAfterSeconds:300})
    
  3. Now, query the collection as follows:
    > db.ttlTest.find()
    
  4. This should give us three documents. Rerun the find query every 30-40 seconds to see the three documents get deleted one after another until the collection has no documents left in it.

How it works…

Let's start by opening the TTLData.js file and see what is going on inside it. The code is pretty simple: it gets the current date using new Date() and then creates three documents whose createDate values are four, three, and two minutes behind the current time, respectively. So, on executing the addTTLTestData() method in this script, we have three documents in the ttlTest collection whose creation times are one minute apart.
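To make this concrete, here is a minimal sketch of what such a script could look like. It is a hypothetical reconstruction based only on the preceding description, not the actual contents of TTLData.js. The helper name buildTTLTestDocs, the _id values, and the use of insertMany (which assumes a MongoDB 3.2 or later shell) are all assumptions.

```javascript
// Hypothetical reconstruction of TTLData.js; the real script may differ.

// Build the three test documents relative to a given "now".
function buildTTLTestDocs(now) {
  const MINUTE_MS = 60 * 1000;
  // createDate is four, three, and two minutes behind the current time.
  return [4, 3, 2].map(function (minutesAgo, i) {
    return {
      _id: i + 1, // assumed _id values, for illustration only
      createDate: new Date(now.getTime() - minutesAgo * MINUTE_MS)
    };
  });
}

// Intended to be invoked from the mongo shell, where `db` is defined.
function addTTLTestData() {
  // insertMany is assumed to be available (MongoDB 3.2+ shells).
  db.ttlTest.insertMany(buildTTLTestDocs(new Date()));
}
```

Keeping the document construction separate from the insert makes it easy to check the timestamps without a running server.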

The next step is the core of the TTL feature: the creation of the TTL index. It is similar to the creation of any other index using the createIndex method, except that it also accepts a second parameter that is a JSON object. These two parameters are as follows:

  • The first parameter is {createDate:1}; this tells Mongo to create an index on the createDate field, and the order of the index is ascending as the value is 1 (-1 would have been descending).
  • The second parameter, {expireAfterSeconds:300}, is what makes this index a TTL index, and it tells Mongo to automatically expire the documents after 300 seconds (five minutes).

Okay, but five minutes since when? Is it the time they were inserted in the collection or some other timestamp? In this case, it considers the value of the createDate field as the base because this was the field that we created the index on.
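The arithmetic for the three test documents can be worked through in plain JavaScript. This is only a sketch of the calculation, not how the server implements it, and the function name secondsUntilExpiry is made up for illustration.

```javascript
// Seconds until a document becomes eligible for TTL deletion.
// A value of zero or less means it is already eligible.
function secondsUntilExpiry(createDate, expireAfterSeconds, now) {
  const expiresAtMs = createDate.getTime() + expireAfterSeconds * 1000;
  return (expiresAtMs - now.getTime()) / 1000;
}

const now = new Date();
const minutesAgo = (m) => new Date(now.getTime() - m * 60 * 1000);

// The three documents from this recipe, with a 300-second TTL.
const remaining = [4, 3, 2].map((m) => secondsUntilExpiry(minutesAgo(m), 300, now));
console.log(remaining); // [ 60, 120, 180 ]
```

So the documents become eligible for deletion roughly one, two, and three minutes after they are inserted, which is why they disappear one at a time.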

This now raises a question: if a field is being used as a base for the computation of time, there has to be some restriction on its type. It just doesn't make sense to create a TTL index, as we created previously, on a string field holding, say, the name of a person.

Yes; as we guessed, the field must hold either a BSON date or an array of dates. What will happen in the case where an array has multiple dates? Which one will be considered in that case?

It turns out that Mongo uses the minimum (that is, the earliest) of the dates available in the array. Try this scenario out as an exercise.

Put two dates separated by about five minutes from each other in a document against the field name, updateField, and then create a TTL index on this field to expire the document after 10 minutes (600 seconds). Query the collection and see when the document gets deleted from the collection. It should get deleted after roughly 10 minutes have elapsed after the minimum time value present in the updateField array.
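Before trying the exercise, it may help to model the rule in plain JavaScript. The following is a sketch of the behavior described in this recipe, not the server's implementation; the function name ttlExpiryDate and the sample timestamps are made up for illustration.

```javascript
// Returns the time at which a document becomes eligible for deletion,
// or null if the field holds no date (such a document never expires).
function ttlExpiryDate(fieldValue, expireAfterSeconds) {
  const values = Array.isArray(fieldValue) ? fieldValue : [fieldValue];
  const dates = values.filter((v) => v instanceof Date);
  if (dates.length === 0) return null;
  // For an array, the earliest date is the base for the computation.
  const earliestMs = Math.min(...dates.map((d) => d.getTime()));
  return new Date(earliestMs + expireAfterSeconds * 1000);
}

// Two dates five minutes apart, as in the exercise, with a 600-second TTL.
const first = new Date("2024-01-01T10:00:00Z");
const second = new Date("2024-01-01T10:05:00Z");
const expiry = ttlExpiryDate([first, second], 600);
console.log(expiry.toISOString()); // 2024-01-01T10:10:00.000Z
```

The expiry is 10 minutes after the earlier of the two dates; the later date has no effect.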

Apart from the constraint for the type of field, there are a few more constraints:

  • If a field already has an index on it, you cannot create a TTL index on that field. As the _id field of the collection already has an index by default, it effectively means that you cannot create a TTL index on the _id field.
  • A TTL index cannot be a compound index involving multiple fields.
  • If the indexed field doesn't exist in a document, that document will never expire. (That's pretty logical, I guess.)
  • It cannot be created on capped collections. In case you are not aware of capped collections, they are special collections in Mongo with a size limit on them; they preserve insertion order and, FIFO style, delete the oldest documents to make room for new ones when needed.

    Note

    TTL indexes are supported only in Mongo version 2.2 and above. Note that a document will not be deleted at exactly the moment it expires. The deletion cycle runs with a granularity of one minute, and each run deletes all the documents that have become eligible for deletion since the last run.

See also

A use case might not demand deleting all the documents after a fixed interval has elapsed. What if we want to customize the point in time until which a document stays in the collection? This too can be achieved, as demonstrated in the next recipe, Expiring documents at a given time using the TTL index.
