One of the interesting features in Mongo is automatically expiring data in a collection after a predetermined amount of time. This is a very useful tool when we want to purge data older than a particular timeframe. With a relational database, it is common for folks to set up a batch job that runs every night to perform this operation.
With the TTL feature of Mongo, you need not worry about this as the database takes care of it out of the box. Let's see how we can achieve this.
Let's create some data in Mongo to experiment with TTL indexes. We will create a collection called ttlTest for this purpose. We will require a server to be up and running. Refer to the Installing single node MongoDB recipe from Chapter 1, Installing and Starting the Server, for instructions on how to start the server. Start the shell with the TTLData.js script loaded. This script is available on the Packt website for download. To know how to start the shell with a script preloaded, refer to the Connecting to a single node in the Mongo shell with JavaScript recipe from Chapter 1, Installing and Starting the Server.
1. With the server running and the script loaded in the shell, invoke the following method to create the test data:
> addTTLTestData()
2. Create a TTL index on the createDate field as follows:
> db.ttlTest.createIndex({createDate:1}, {expireAfterSeconds:300})
3. Query the collection as follows:
> db.ttlTest.find()
4. Repeat the find query every 30-40 seconds or so to see the three documents getting deleted, one after another, until the collection has zero documents left in it.

Let's start by opening the TTLData.js file to see what is going on inside it. The code is pretty simple: it gets the current date using new Date(), and then creates three documents whose createDate values are four, three, and two minutes behind the current time. So, on the execution of the addTTLTestData() method in this script, we have three documents in the ttlTest collection, with a difference of one minute between their creation times.
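The contents of TTLData.js are not reproduced here, so the following is only a minimal sketch of the logic just described. The function name buildTTLTestDocs and the _id values are hypothetical, and the real script may differ.

```javascript
// Hypothetical reconstruction of the data-building logic described above;
// the actual TTLData.js script may use different names and document shapes.
function buildTTLTestDocs(now) {
  const MINUTE_MS = 60 * 1000;
  // createDate is four, three, and two minutes behind the given time.
  return [4, 3, 2].map(function (minutesAgo, i) {
    return {
      _id: i + 1,
      createDate: new Date(now.getTime() - minutesAgo * MINUTE_MS)
    };
  });
}

const docs = buildTTLTestDocs(new Date());
console.log(docs.length); // 3
// In the mongo shell, the documents could then be stored with:
//   db.ttlTest.insertMany(docs)
```

Keeping the date arithmetic in a plain function, separate from the insert, makes it easy to check the one-minute spacing without a running server.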
The next step is the core of the TTL feature: the creation of the TTL index. It is similar to the creation of any other index using the createIndex method, except that we also pass a second parameter that is a JSON object. These two parameters are as follows:
The first parameter is {createDate:1}; this tells Mongo to create an index on the createDate field, and the order of the index is ascending as the value is 1 (-1 would have been descending).
The second parameter, {expireAfterSeconds:300}, is what makes this index a TTL index, and it tells Mongo to automatically expire the documents after 300 seconds (five minutes).
Okay, but five minutes since when? Is it the time they were inserted in the collection or some other timestamp? In this case, Mongo uses the createDate field as the base because this is the field that we created the index on.
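To make the arithmetic concrete, here is a minimal sketch of how the expiry moment follows from the indexed field and expireAfterSeconds. The helper names expiresAt and secondsUntilExpiry are made up for illustration and are not part of any Mongo API; the server performs the equivalent computation internally.

```javascript
// Illustrative helpers only; these are not MongoDB functions.
// Expiry moment = value of the indexed date field + expireAfterSeconds.
function expiresAt(doc, field, expireAfterSeconds) {
  return new Date(doc[field].getTime() + expireAfterSeconds * 1000);
}

function secondsUntilExpiry(doc, field, expireAfterSeconds, now) {
  return (expiresAt(doc, field, expireAfterSeconds).getTime() - now.getTime()) / 1000;
}

// Three documents created four, three, and two minutes ago, as in the recipe.
const now = new Date();
const remaining = [4, 3, 2].map(function (minutesAgo) {
  const doc = { createDate: new Date(now.getTime() - minutesAgo * 60 * 1000) };
  return secondsUntilExpiry(doc, "createDate", 300, now);
});
console.log(remaining); // [ 60, 120, 180 ]
```

This is why the three test documents become eligible for deletion roughly one, two, and three minutes after the data is created, rather than five minutes after the index is built.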
This now raises a question: if a field is being used as a base for the computation of time, there has to be some restriction on its type. It just doesn't make sense to create a TTL index, as we created previously, on a string field holding, say, the name of a person.
Yes; as we guessed, the field must be of the BSON date type or an array of dates. What will happen in the case where an array has multiple dates? Which one will be considered in that case?
It turns out that Mongo uses the minimum (that is, the earliest) of the dates available in the array. Try this scenario out as an exercise.
Put two dates separated by about five minutes from each other in a document against the field name updateField, and then create a TTL index on this field to expire the document after 10 minutes (600 seconds). Query the collection and see when the document gets deleted from the collection. It should get deleted roughly 10 minutes after the minimum time value present in the updateField array.
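Before trying the exercise against a server, the expected outcome can be worked out with a small sketch. The function ttlBaseDate is a hypothetical helper that mimics the rule stated above (earliest date in an array; no expiry when the field holds no date at all); it is not something Mongo exposes.

```javascript
// Illustrative helper only; it mimics the rule described in the text.
// Returns the date the TTL computation would start from, or null when the
// value holds no date (such a document would never expire).
function ttlBaseDate(value) {
  const candidates = Array.isArray(value) ? value : [value];
  const dates = candidates.filter(function (v) { return v instanceof Date; });
  if (dates.length === 0) {
    return null;
  }
  const earliest = Math.min.apply(null, dates.map(function (d) { return d.getTime(); }));
  return new Date(earliest);
}

// Two dates about five minutes apart, stored against updateField.
const first = new Date();
const second = new Date(first.getTime() + 5 * 60 * 1000);
const exerciseDoc = { updateField: [second, first] };

const baseDate = ttlBaseDate(exerciseDoc.updateField);
const expectedExpiry = new Date(baseDate.getTime() + 600 * 1000);
console.log(baseDate.getTime() === first.getTime()); // true
console.log((expectedExpiry.getTime() - first.getTime()) / 60000); // 10

// In the shell, the exercise itself could look like this (the collection
// name ttlArrayTest is made up for this sketch):
//   db.ttlArrayTest.insert(exerciseDoc)
//   db.ttlArrayTest.createIndex({updateField:1}, {expireAfterSeconds:600})
```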
Apart from the constraint on the type of the field, there are a few more constraints:
Because the _id field of the collection already has an index by default, it effectively means that you cannot create a TTL index on the _id field.
TTL indexes are supported only on Mongo version 2.2 and above.
Note that a document will not be deleted at exactly the moment its expiry time is reached. The deletion cycle runs with a granularity of one minute, and each run deletes all the documents that have become eligible for deletion since the last time the cycle was run.
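The effect of the one-minute cycle can be illustrated with a rough sketch. It assumes an idealized model in which the deletion cycle fires at exact fixed intervals starting from a known instant; a real server gives no such guarantee, and the name deletionTime is purely hypothetical.

```javascript
// Idealized model, not actual server behavior: assumes the deletion cycle
// fires at firstSweep, firstSweep + intervalMs, firstSweep + 2 * intervalMs, ...
// Returns the first cycle at or after the document's expiry moment.
function deletionTime(expiry, firstSweep, intervalMs) {
  if (expiry.getTime() <= firstSweep.getTime()) {
    return new Date(firstSweep.getTime());
  }
  const cycles = Math.ceil((expiry.getTime() - firstSweep.getTime()) / intervalMs);
  return new Date(firstSweep.getTime() + cycles * intervalMs);
}

const sweepStart = new Date();
// A document whose expiry falls one second after a cycle has just run.
const justMissed = new Date(sweepStart.getTime() + 61 * 1000);
const removedAt = deletionTime(justMissed, sweepStart, 60 * 1000);
console.log((removedAt.getTime() - justMissed.getTime()) / 1000); // 59
```

Under this model, a document that becomes eligible just after a cycle has run lingers for almost a full minute, which is why the find query in the recipe may still show a document shortly after its five minutes are up.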
A use case might not demand deleting all the documents after the same fixed interval has elapsed. What if we want to customize how long each individual document stays in the collection? This too can be achieved, and it is demonstrated in the next recipe, Expiring documents at a given time using the TTL index.