When it comes to storage usage, one of the more interesting statistics from an administrative point of view is the number of documents in a collection. Together with the other high-level statistics of the collection, it helps estimate future space and memory requirements as the data grows.
To find the stats of the collection, we need to have a server up and running, and a single node should be ok. Refer to the Single node installation of MongoDB recipe in Chapter 1, Installing and Starting the MongoDB Server, for how to start the server. The data on which we will be operating needs to be imported into the database. The steps to import the data are given in the Creating test data recipe in Chapter 2, Command-line Operations and Indexes. Once these steps are completed, we are all set to go ahead with this recipe.
We will be using the postalCodes collection to view the stats. Let's take a look at the steps in detail:
1. Start the mongo shell and connect to the server listening on port 27017 by executing the following command:
$ mongo
2. Create an index on the pincode field, if one doesn't exist, as follows:
> db.postalCodes.ensureIndex({'pincode':1})
3. Execute the following command and observe the output:
> db.postalCodes.stats()
4. Execute the same command again, this time passing 1024 as the parameter:
> db.postalCodes.stats(1024)
{
  "ns" : "test.postalCodes",
  "count" : 39732,
  "size" : 5561,
  "avgObjSize" : 0.1399627504278667,
  "storageSize" : 16380,
  "numExtents" : 6,
  "nindexes" : 2,
  "lastExtentSize" : 12288,
  "paddingFactor" : 1,
  "systemFlags" : 1,
  "userFlags" : 0,
  "totalIndexSize" : 2243,
  "indexSizes" : {
    "_id_" : 1261,
    "pincode_1" : 982
  },
  "ok" : 1
}
Again, observe the output. We will now see what these values mean to us in the next section.
If we observe the output of the db.postalCodes.stats() and db.postalCodes.stats(1024) commands, we see that the second one has all the figures in KB whereas the first one is in bytes. The parameter provided is known as scale, and all the figures indicating size are divided by this scale. In this case, as we gave the value as 1024, we get all the values in KB; if 1024 * 1024 is passed as the value of the scale, the sizes will be shown in MB. For our analysis, we will use the one that shows the sizes in KB:
> db.postalCodes.stats(1024)
{
  "ns" : "test.postalCodes",
  "count" : 39732,
  "size" : 5561,
  "avgObjSize" : 0.1399627504278667,
  "storageSize" : 16380,
  "numExtents" : 6,
  "nindexes" : 2,
  "lastExtentSize" : 12288,
  "paddingFactor" : 1,
  "systemFlags" : 1,
  "userFlags" : 0,
  "totalIndexSize" : 2243,
  "indexSizes" : {
    "_id_" : 1261,
    "pincode_1" : 982
  },
  "ok" : 1
}
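As a quick sanity check on these figures, the following sketch in plain JavaScript (runnable in Node.js, no server needed) shows how some of the values relate to each other. The numbers are copied from the output above; the statsKB object and the rescale helper are hypothetical names introduced only for this illustration and are not part of the MongoDB shell. The relationships shown are what this particular output suggests, not a statement about every MongoDB version.

```javascript
// Figures copied from the db.postalCodes.stats(1024) output; sizes are in KB.
const statsKB = {
  count: 39732,
  size: 5561,
  storageSize: 16380,
  totalIndexSize: 2243,
  indexSizes: { _id_: 1261, pincode_1: 982 }
};

// Hypothetical helper: re-express a figure in a larger unit,
// for example a factor of 1024 to go from KB to MB.
function rescale(value, factor) {
  return value / factor;
}

// In this output, avgObjSize equals size divided by count (so it is in KB too).
const avgObjSizeKB = statsKB.size / statsKB.count;

// In this output, totalIndexSize is the sum of the individual index sizes.
const indexSum = Object.values(statsKB.indexSizes).reduce((a, b) => a + b, 0);

console.log(avgObjSizeKB.toFixed(4));                        // 0.1400 (KB)
console.log((avgObjSizeKB * 1024).toFixed(0));               // 143 (bytes, approximately)
console.log(indexSum === statsKB.totalIndexSize);            // true
console.log(rescale(statsKB.storageSize, 1024).toFixed(2));  // 16.00 (MB)
```

Read this way, an average document in the collection takes roughly 143 bytes, and the storage allocated to the collection is about 16 MB, of which the data itself uses 5561 KB.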
The following table shows the meaning of the important fields:
Let's take a look at the paddingFactor field. Documents are placed on the storage device in contiguous locations. If an update causes a document to grow, MongoDB cannot grow it in place unless some buffer space was kept after the document. The only solution then is to copy the entire document, with the necessary updates made to it, towards the end of the collection. This operation is expensive and affects the performance of such update operations. If the paddingFactor field is 1, no padding or buffer space is kept between two consecutive documents, making it impossible for the first of these two documents to grow on updates. If the paddingFactor field is more than 1, some buffer space is kept, which accommodates small changes in the size of the documents. The paddingFactor field, however, is not set by the user; MongoDB calculates it for the collection over a period of time and then uses the calculated value to allocate space for newly inserted documents. To get a feel for how this padding factor changes, let us do a small exercise:
1. Insert ten small documents into a test collection:
> for (i = 0; i < 10; i++) { db.paddingFactorTest.insert({value:'Hello World'}) }
2. Query the stats and note the paddingFactor value (it would be 1):
> db.paddingFactorTest.stats()
3. Execute the following update a few times so that the documents grow in size:
> for (i = 0; i < 5; i++) { db.paddingFactorTest.update({value:'Hello World'}, {$push:{value1:'Value'}}, false, true) }
4. Query the stats again and observe that the value of paddingFactor has gone slightly over 1, which shows that the MongoDB server adjusts this value and uses it when allocating space for documents inserted at a later point in time:
> db.paddingFactorTest.stats()
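To make the effect of the padding factor concrete, here is a minimal sketch in plain JavaScript (runnable in Node.js). It assumes a deliberately simplified model in which the space reserved for a document is its size multiplied by paddingFactor; the real server also accounts for record headers and its own allocation strategy, which are ignored here. The function names and the document sizes are hypothetical and chosen only for illustration.

```javascript
// Simplified model (an assumption, not MongoDB's actual allocator):
// space reserved for a document = document size * paddingFactor, rounded up.
function allocatedBytes(docBytes, paddingFactor) {
  return Math.ceil(docBytes * paddingFactor);
}

// A grown document can be rewritten in place only if it still fits
// in the space originally reserved for it; otherwise it must be moved.
function fitsInPlace(originalBytes, grownBytes, paddingFactor) {
  return grownBytes <= allocatedBytes(originalBytes, paddingFactor);
}

const originalBytes = 200; // hypothetical document size at insert time
const grownBytes = 230;    // hypothetical document size after an update

console.log(allocatedBytes(originalBytes, 1));             // 200: no buffer space
console.log(fitsInPlace(originalBytes, grownBytes, 1));    // false: document must be moved
console.log(allocatedBytes(originalBytes, 1.25));          // 250: 50 bytes of buffer
console.log(fitsInPlace(originalBytes, grownBytes, 1.25)); // true: update fits in place
```

With a padding factor of 1, any growth forces the document to be moved, whereas a factor of 1.25 leaves enough room for this particular update.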
We saw how paddingFactor affects the storage allocated to a document. However, we have no control over this value, nor can we tell MongoDB beforehand how much additional buffer to allocate to each inserted document based on its anticipated growth. There is, however, a technique that lets us achieve this in a way, which we will see in the Manually padding a document recipe.