Viewing collection stats

When it comes to storage usage, one of the most interesting statistics from an administrative point of view is the number of documents in a collection, which helps in estimating future space and memory requirements based on the growth of the data. In this recipe, we will look at the high-level statistics of a collection.

Getting ready

To find the stats of the collection, we need to have a server up and running; a single node is sufficient. Refer to the Single node installation of MongoDB recipe in Chapter 1, Installing and Starting the MongoDB Server, for how to start the server. The data on which we will be operating needs to be imported into the database. The steps to import the data are given in the Creating test data recipe in Chapter 2, Command-line Operations and Indexes. Once these steps are completed, we are all set to go ahead with this recipe.

How to do it…

We will be using the postalCodes collection to view the stats. Let's take a look at the steps in detail:

  1. Open the Mongo shell and connect it to the running MongoDB instance. In this case, start Mongo on the default port 27017 and execute the following command:
    $ mongo
    
  2. With the data imported, create an index on the pincode field, if one doesn't exist, as follows:
    > db.postalCodes.ensureIndex({'pincode':1})
    
  3. On the Mongo terminal, execute the following command:
    > db.postalCodes.stats()
    
  4. Observe the output. Now execute the following command on the shell:
    > db.postalCodes.stats(1024)
    {
      "ns" : "test.postalCodes",
      "count" : 39732,
      "size" : 5561,
      "avgObjSize" : 0.1399627504278667,
      "storageSize" : 16380,
      "numExtents" : 6,
      "nindexes" : 2,
      "lastExtentSize" : 12288,
      "paddingFactor" : 1,
      "systemFlags" : 1,
      "userFlags" : 0,
      "totalIndexSize" : 2243,
      "indexSizes" : {
        "_id_" : 1261,
        "pincode_1" : 982
      },
      "ok" : 1
    }
    

Again, observe the output. We will now see what these values mean to us in the next section.

How it works…

If we observe the output for the db.postalCodes.stats() and db.postalCodes.stats(1024) commands, we see that the second one has all the figures in KB whereas the first one is in bytes. The parameter provided is known as scale and all the figures indicating size are divided by this scale. In this case, as we gave the value as 1024, we get all the values in KB; whereas if 1024 * 1024 is passed as the value of the scale, the size shown will be in MB. For our analysis, we will use the one that shows the sizes in KB:

> db.postalCodes.stats(1024)
{
  "ns" : "test.postalCodes",
  "count" : 39732,
  "size" : 5561,
  "avgObjSize" : 0.1399627504278667,
  "storageSize" : 16380,
  "numExtents" : 6,
  "nindexes" : 2,
  "lastExtentSize" : 12288,
  "paddingFactor" : 1,
  "systemFlags" : 1,
  "userFlags" : 0,
  "totalIndexSize" : 2243,
  "indexSizes" : {
    "_id_" : 1261,
    "pincode_1" : 982
  },
  "ok" : 1
}
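
To make the effect of the scale parameter concrete, here is a minimal sketch in plain JavaScript that divides byte figures by a scale, as described above. The function name scaleStats, the SIZE_FIELDS list, and the byte values in the sample document are assumptions made for illustration. This is not the server's implementation, and the server may round values differently.

```javascript
// Illustrative only: mimics how a scale divides the size figures.
// SIZE_FIELDS and scaleStats are hypothetical names, not MongoDB APIs.
const SIZE_FIELDS = ["size", "storageSize", "lastExtentSize", "totalIndexSize"];

function scaleStats(statsInBytes, scale) {
  // Copy the document so the original byte figures stay untouched.
  const scaled = Object.assign({}, statsInBytes);
  for (const field of SIZE_FIELDS) {
    if (typeof scaled[field] === "number") {
      scaled[field] = scaled[field] / scale;
    }
  }
  return scaled;
}

// Made-up byte figures, chosen as exact multiples of 1024 * 1024.
const sampleBytes = { count: 100, size: 2097152, storageSize: 4194304 };

const inKB = scaleStats(sampleBytes, 1024);          // sizes in KB
const inMB = scaleStats(sampleBytes, 1024 * 1024);   // sizes in MB
console.log(inKB.size, inMB.size);
```

Note that count is left alone in this sketch: it is a number of documents, not a size, so the scale does not apply to it.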

The important fields have the following meanings:

  • ns: This is the fully qualified name of the collection in the <database>.<collection name> format.

  • count: This is the number of documents in the collection.

  • size: This is the actual storage size occupied by the documents in the collection. Adding, deleting, or updating documents in the collection can change this figure. The scale parameter affects this field's value; in our case, the value is in KB as 1024 is the scale.

  • avgObjSize: This is the average size of a document in the collection. It is simply the size field divided by the count of documents in the collection. The scale parameter affects this field's value; in our case, the value is in KB as 1024 is the scale.

  • storageSize: Mongo preallocates space on the disk to ensure that the documents in the collection are kept in contiguous locations, which gives better performance in disk access. This preallocation fills up the files with zeros and then starts allocating space to the inserted documents. This field reveals the size of the storage used by this collection. The figure will generally be much more than the actual size of the collection. The scale parameter affects this field's value; in our case, the value is in KB as 1024 is the scale.

  • numExtents: As we saw, Mongo preallocates contiguous disk space to collections for performance purposes. However, as the collection grows, new space needs to be allocated. Each such contiguous chunk is called an extent, and this field gives the number of extents allocated.

  • nindexes: This field gives the number of indexes present on the collection. This value would be 1 even if we did not create an index on the collection, as Mongo implicitly creates an index on the _id field.

  • lastExtentSize: This is the size of the last extent allocated. The scale parameter affects this field's value; in our case, the value is in KB as 1024 is the scale.

  • paddingFactor: We can look at this factor as a multiplier applied to the actual document size in order to compute the storage size. For example, if the document to be inserted is 2 KB, with a paddingFactor field of 1, the size allocated to the document is 2 KB; that is, with no padding. On the other hand, if the paddingFactor field is 1.5, the space allocated to the document will be 3 KB (2 * 1.5), which gives a padding of 1 KB. In our case, the paddingFactor field is 1 because we did a mongoimport. We discuss padding and the padding factor after this list.

  • totalIndexSize: Indexes take up space to store. This field gives the total size taken up by the indexes on the disk. The scale parameter affects this field's value; in our case, the value is in KB as 1024 is the scale.

  • indexSizes: This field is itself a document, with the name of each index as the key and the size of that index as the value. In our case, we had created an index explicitly on the pincode field. Thus, we see the name of the index as the key and the size of the index on disk as the value. The total of these values for all the indexes is the same as the totalIndexSize value given earlier. The scale parameter affects this field's value; in our case, the value is in KB as 1024 is the scale.
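
The relationships between these fields can be checked against the sample output with a few lines of plain JavaScript. The object below is simply the KB-scaled output from this recipe copied into a literal; the variable names are chosen for illustration and nothing here talks to a server.

```javascript
// Figures copied from the db.postalCodes.stats(1024) output in this recipe.
const stats = {
  count: 39732,
  size: 5561,
  avgObjSize: 0.1399627504278667,
  totalIndexSize: 2243,
  indexSizes: { _id_: 1261, pincode_1: 982 }
};

// avgObjSize is described as size divided by count.
const computedAvg = stats.size / stats.count;

// totalIndexSize is described as the sum of the individual index sizes.
const computedIndexTotal = Object.values(stats.indexSizes)
  .reduce((sum, sizeKB) => sum + sizeKB, 0);

console.log(computedAvg);
console.log(computedIndexTotal === stats.totalIndexSize);
```

The computed average agrees with the reported avgObjSize to within rounding, and the two index sizes (1261 and 982) add up to the reported totalIndexSize of 2243.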

Let's take a look at the paddingFactor field. Documents are placed on the storage device in contiguous locations. If an update increases the size of a document and no buffer space was kept after it, Mongo cannot grow the document in place. The only solution is to copy the entire document, with the necessary updates made to it, to the end of the collection. This operation is expensive and affects the performance of such updates. If the paddingFactor field is 1, no padding or buffer space is kept between two consecutive documents, making it impossible for the first of the two to grow on update. If the paddingFactor field is more than 1, some buffer space is kept to accommodate small changes in document size. The paddingFactor field, however, is not set by the user; MongoDB calculates it for the collection over a period of time and then uses the calculated value to allocate space for newly inserted documents. To get a feel for how this padding factor changes, let us do a small exercise:

  1. Execute the following command in the Mongo shell:
    > for (var i = 0; i < 10; i++) {
        db.paddingFactorTest.insert({value: 'Hello World'})
      }
    
  2. Now execute the following command and take note of the paddingFactor value (it would be 1):
    > db.paddingFactorTest.stats()
    
  3. We will now make some updates to let the document grow in size as follows:
    > for (var i = 0; i < 5; i++) {
        db.paddingFactorTest.update({value: 'Hello World'}, {$push: {value1: 'Value'}}, false, true)
      }
    > db.paddingFactorTest.stats()
    

Observe in the stats output that the value of paddingFactor has gone slightly over 1. This shows that the MongoDB server adjusted the value, which it will use when allocating space for documents inserted at a later point in time.
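
The reasoning behind the padding factor can be sketched as a small calculation in plain JavaScript. The helper names allocatedSize and fitsInPlace are hypothetical, and the model is deliberately simplified (space allocated = document size * paddingFactor, as in the 2 KB example given for the paddingFactor field). It does not describe how the server lays out records in detail.

```javascript
// Simplified model: a record gets documentSize * paddingFactor of space.
// Both helpers are hypothetical names used only for this illustration.
function allocatedSize(documentSizeKB, paddingFactor) {
  return documentSizeKB * paddingFactor;
}

// An update can happen in place only if the grown document still fits
// in the space that was allocated when it was inserted.
function fitsInPlace(documentSizeKB, paddingFactor, grownSizeKB) {
  return grownSizeKB <= allocatedSize(documentSizeKB, paddingFactor);
}

console.log(allocatedSize(2, 1));      // 2 KB allocated, no padding
console.log(allocatedSize(2, 1.5));    // 3 KB allocated, 1 KB of padding
console.log(fitsInPlace(2, 1, 2.5));   // false: the document must be moved
console.log(fitsInPlace(2, 1.5, 2.5)); // true: the padding absorbs the growth
```

With a factor of 1, any growth at all forces the document to be copied elsewhere, which is why repeated growing updates push the server towards a larger factor.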

We saw how paddingFactor affects the storage allocated to a document, but we neither have control over this value nor can we tell Mongo beforehand how much additional buffer to allocate to each inserted document based on its anticipated growth. There is, however, a technique that lets us achieve this in a way, which we will see in the Manually padding a document recipe.

See also

  • The Viewing database stats recipe to view the stats at a database level