Estimating the working set

We start by defining the working set: it is the subset of the total data that the application accesses frequently. In an application that stores information over a period of time, the working set is mostly the recently accessed data. The word "recently" is subjective; for some applications it might mean a day or two, for others a couple of months. This is something that needs to be thought through while designing the application and sizing the database. The working set needs to fit in the RAM of the database server to minimize page faults and get optimum performance.

In this recipe, we will see a way to estimate your working set, a feature introduced in Mongo 2.4. The word "estimator" is slightly misleading, as the initial sizing is still a manual activity and the system designers need to be judicious about the server configuration. The working set estimator we will see here is more of a reactive approach, which kicks in once the application is up and running. It provides metrics that can be used by monitoring tools and tells us whether the RAM on the server can accommodate the working set or whether the set outgrows the available RAM. The latter demands some resizing of the hardware or scaling the database horizontally.

Getting ready

In this recipe, we will simulate some operations on a standalone Mongo instance. We need to start a standalone server listening on any port for client connections; in this case, we will stick to the default 27017. If you are not aware of how to start a standalone server, refer to the Single node installation of MongoDB recipe in Chapter 1, Installing and Starting the MongoDB Server. Connect to the server from the Mongo shell.

How to do it…

The working set estimate is now part of the server's status output. There is a field called workingSet whose value is a document containing these estimates.

The working set estimate is not included in the standard serverStatus output and needs to be demanded explicitly. Computing it is not cheap on resources, so frequent invocations can have a detrimental effect on the performance of the server and should be kept to a minimum.

We need to run the following command from the Mongo shell to get the working set estimates:

> db.runCommand({serverStatus:1, workingSet:1}).workingSet
{
        "note" : "thisIsAnEstimate",
        "pagesInMemory" : 6188,
        "computationTimeMicros" : 11524,
        "overSeconds" : 3977
}

How it works…

There are just four fields in this document for the working set estimate, with the first simply stating in text that this is an estimate. The pagesInMemory, computationTimeMicros, and overSeconds fields are the ones we are more interested in.

We will look at the overSeconds field first. This is the time in seconds between the first and the last page loaded by Mongo into memory. When the server has just started, this value will obviously be low, but eventually, as more data is accessed over time, more pages will be loaded by Mongo into memory. If the available RAM is abundant, the first loaded page will stay in memory and new pages will continue to be loaded as and when needed. Hence, this time will also increase, as the difference between the most recently loaded page and the oldest page grows.

If this time stays low, or even decreases, it means the oldest and newest pages in memory were loaded only that many seconds apart. This can be an indication that the number of pages accessed and loaded into memory by the MongoDB server is more than can be held in memory at once. As Mongo uses a least recently used (LRU) policy to evict a page from memory to make space for a new page, we are possibly evicting pages that will be needed again, causing more page faults.
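As a rough illustration, the eviction heuristic described above can be sketched in Python. The helper function and the sample documents here are hypothetical, written only to make the reasoning concrete; just the field name overSeconds comes from the actual serverStatus output.

```python
# Hypothetical helper: given two workingSet estimates sampled some time
# apart, flag possible memory pressure. If overSeconds is not growing even
# though time has passed, the span between the oldest and newest in-memory
# page is shrinking, which suggests old pages are being evicted.

def working_set_pressure(earlier, later):
    """Return True if the estimates suggest pages are being evicted."""
    return later["overSeconds"] <= earlier["overSeconds"]

# Made-up sample values for illustration only
earlier = {"pagesInMemory": 6188, "overSeconds": 3977}
later = {"pagesInMemory": 6190, "overSeconds": 3500}  # span shrank

print(working_set_pressure(earlier, later))  # True: possible eviction
```

In practice, a monitoring tool would take such samples on a slow schedule, since (as noted above) demanding the estimate too often is itself a performance cost.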

This is where the pagesInMemory field comes in. It tells us the number of pages Mongo has loaded into memory over that period of time. Multiplying the number of pages by the page size of roughly 4 KB gives the approximate size of the data loaded in memory. Thus, if all the data is accessed after the server starts, this size will be roughly your data size. This number keeps increasing with time; hence, this field, in conjunction with the overSeconds field, is an important statistic.
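To put a number on this, the arithmetic works out as follows for the sample output shown earlier (a minimal sketch; the 4 KB page size is the approximation mentioned in the text):

```python
PAGE_SIZE_BYTES = 4 * 1024  # approximate page size, as noted above

pages_in_memory = 6188  # pagesInMemory from the sample output
working_set_bytes = pages_in_memory * PAGE_SIZE_BYTES

# Convert to megabytes to compare against the server's RAM
working_set_mb = working_set_bytes / (1024 * 1024)
print(working_set_mb)  # roughly 24 MB for this sample
```

A working set of about 24 MB trivially fits in RAM; the figure only becomes interesting on real workloads, where it should be compared against the memory actually available to the mongod process.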

The final field, computationTimeMicros, gives the time in microseconds the server took to compute this working set estimate. As we can see, it is not a cheap operation to execute, and thus this statistic should be demanded with caution, especially on high-throughput systems.
