Monitoring MongoDB instances on MMS

The previous recipes, Signing up for MMS and setting up the MMS monitoring agent and Managing users and groups in the MMS console, showed us how to set up an MMS account and agent, add hosts, and manage user access to the MMS console. The core objective of MMS, however, is monitoring the host instances, and that is what we haven't discussed yet. In this recipe, we will perform some operations on the host that we added to MMS in the first recipe and monitor it from the MMS console.

Getting ready

Follow the recipe Signing up for MMS and setting up the MMS monitoring agent; that is pretty much all that is needed for this recipe. You may choose to have a standalone instance or a replica set; either way is fine. Also, open a mongo shell and connect to the instance from it (connect to the primary if it is a replica set).

How to do it…

  1. Start by logging into the MMS console and clicking on Deployment in the upper-left corner, and then again on the Deployment link in the submenu, as shown in the following screenshot:
  2. Click on one of the hostnames shown to see a large variety of graphs showing various statistics. In this recipe, we will analyze a majority of them.
  3. Open the bundle downloaded for the book. In Chapter 4, Administration, we used a JavaScript file named KeepServerBusy.js to keep the server busy with some operations. We will be using the same script this time around (a minimal sketch of such a script is shown after this list).
  4. In the operating system shell, execute the following command with the .js file in the current directory. The shell connects to the primary's port, which in my case is port 27000.
    $ mongo KeepServerBusy.js --port 27000 --quiet
    
  5. Once started, keep it running and give it 5–10 minutes before you start monitoring the graphs on the MMS console.
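The bundled KeepServerBusy.js itself is not reproduced here; the following is only a minimal sketch, under the assumption that any such load generator simply loops over inserts, updates, queries, and deletes (the monitoringTest collection name is made up for illustration), so that the opcounters, lock, and flush graphs in MMS have activity to show:

// A minimal "keep the server busy" sketch (not the bundled file).
// It loops forever; stop it with Ctrl + C when you are done monitoring.
var counter = 0;
while (true) {
  db.monitoringTest.insert({_id: counter, value: 'data ' + counter, createdOn: new Date()});
  db.monitoringTest.update({_id: counter}, {$set: {value: 'updated ' + counter}});
  db.monitoringTest.find({_id: counter}).toArray();
  if (counter % 10 === 0) {
    db.monitoringTest.remove({_id: counter});
  }
  counter++;
  sleep(10); // throttle a little so the shell doesn't hog the CPU
}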

How it works…

The Understanding the mongostat and mongotop utilities recipe in Chapter 4, Administration, demonstrated how these utilities can be used to get the current operations and resource utilization. That is a fairly basic yet helpful way to monitor a particular instance. MMS, however, gives us one place to monitor the MongoDB instance with easy-to-understand graphs. MMS also gives us historical stats, which mongostat and mongotop cannot.

Before we go ahead with the analysis of the metrics, I would like to mention that in the case of MMS monitoring, the data is neither queried nor sent out over the public network. Only the statistics are sent, over a secure channel, by the agent. The source code for the agent is open source and is available for examination if needed. The mongod servers need not be accessible from the public network, as the cloud-based MMS service never communicates with the server instances directly; it is the MMS agent that communicates with the MMS service. Typically, one agent is enough to monitor several servers, unless you plan to segregate them into different groups. Also, it is recommended to run the agent on a dedicated machine/virtual machine and not share it with any of the mongod or mongos instances, unless it is a less crucial test instance group you are monitoring.

Let us see some of these statistics on the console; we start with the memory-related ones. The following graph shows the resident, mapped, and virtual memory:


As seen in the previous graph, the resident memory for the data set is 82 MB, which is very low; it is the actual physical memory used up by the mongod process. This current value is significantly below the free memory available and, generally, it will increase over a period of time until it reaches a point where it has used up a large chunk of the total available physical memory. This is taken care of automatically by the mongod server process, and we can't force it to use up more memory, even if more is available on the machine it is running on.

The mapped memory, on the other hand, is about the total size of the database and is mapped by MongoDB. This size can be (and usually is) much higher than the available physical memory, which enables the mongod process to address the entire dataset as if it were present in memory, even when it isn't. MongoDB offloads the responsibility of mapping data and loading it to and from the disk to the underlying operating system. Whenever a memory location is accessed and it is not available in RAM (that is, in the resident memory), the operating system fetches the page into memory, evicting some other page to make space for it if necessary.

What exactly is a memory-mapped file? Let us try to see with a super-scaled-down example. Suppose we have a file of 1 KB (1,024 bytes) and the RAM is only 512 bytes; obviously, we cannot have the whole file in memory. However, you can ask the operating system to map this file to the available RAM in pages. Suppose the page size is 128 bytes; then the total file is eight pages (128 * 8 = 1,024). However, the OS can load only four pages, and assume it loaded the first four pages (up to byte 512) into memory. When we access byte number 200, it is found in memory, as it is present on page 2. But what if we access byte 800, which is logically on page 7 and is not loaded in memory? What the OS does is take one page out of memory and load page 7, which contains byte number 800. MongoDB, as an application, gets the impression that everything is loaded in memory and accessed by its byte index, but actually it isn't; the OS transparently does the work for us. Since the accessed page was not present in memory and we had to go to the disk to load it, this is called a page fault.
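To make the page arithmetic above concrete, here is a tiny, purely illustrative shell snippet that uses the toy numbers from the example (a 128-byte page size, nothing to do with real OS page sizes) to work out which page a byte offset falls on:

// Toy arithmetic from the example above: 1,024-byte file, 128-byte pages.
var PAGE_SIZE = 128;
function pageFor(byteOffset) {
  // Pages are numbered from 1, as in the text.
  return Math.floor(byteOffset / PAGE_SIZE) + 1;
}
print(pageFor(200)); // 2 -> one of the four loaded pages, so no page fault
print(pageFor(800)); // 7 -> not loaded, so accessing it causes a page fault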

Getting back to the stats shown in the graph, the virtual memory contains all the memory usage, including the mapped memory plus any additional memory used, such as the memory for the thread stack associated with each connection, and so on. If journaling is enabled, this size will definitely be more than twice that of the mapped memory, as journaling creates a separate memory mapping for the same data. Thus, we have two addresses mapping the same memory location. This doesn't mean that the page will be loaded twice; it just means that two different memory locations can be used to address the same physical memory. A very high virtual memory figure might need some investigation. There is no predetermined value for what is too high or too low; generally, these values are monitored for your system under normal circumstances, when you are happy with its performance. These benchmark values should then be compared with the figures seen when the system performance goes down, and appropriate actions can be taken.

As we saw earlier, page faults are caused when an accessed memory location is not present in the resident memory, causing the OS to load the page from the disk. This IO activity will definitely reduce performance, and too many page faults can bring down the database performance dramatically. The following graph shows quite a few page faults occurring per minute. However, if the disk used is an SSD instead of a spinning disk, the hit in terms of seek time might not be as significant.


A large number of page faults usually occur when enough physical memory isn't available to accommodate the data set and the operating system needs to get the data from the disk into memory. Note that the stat shown earlier was captured on a Microsoft Windows platform, and this graph might seem high for a fairly trivial operation. The value shown here is the sum of hard and soft page faults and doesn't really give a true figure of how well (or badly) the system is doing. These figures would be different on a Unix-based operating system. There was a JIRA ticket open at the time of writing this book that reports this problem (https://jira.mongodb.org/browse/SERVER-5799).

One thing you might need to remember is that, in production systems, MongoDB doesn't work well with NUMA architecture and you might see a lot of page faults occurring even if the available memory seems to be high enough. Refer to http://docs.mongodb.org/manual/administration/production-notes/ for more details.
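If you want to cross-check the page fault figure from the shell rather than from the MMS graph, the serverStatus command exposes a cumulative counter; treat the following as a sketch, since the fields under extra_info vary by operating system:

// Cumulative page faults as reported by the server itself.
var status = db.serverStatus();
print('Note: ' + status.extra_info.note); // reminds us the fields are platform dependent
print('Page faults so far: ' + status.extra_info.page_faults);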

There is an additional graph, seen next, which gives some details about the non-mapped memory. As we saw earlier in this section, there are three types of memory: mapped, resident, and virtual. Mapped memory is always less than virtual memory, and virtual memory will be more than twice the mapped memory if journaling is enabled. If we look at the graph given earlier in this section, we see that the mapped memory is 192 MB, whereas the virtual memory is 532 MB. As journaling is enabled, the virtual memory is more than twice the mapped memory. When journaling is enabled, the same page of data is mapped twice in memory. Note that the page is physically loaded only once; it is just that the same location can be addressed using two different addresses.

Let us find the difference between the virtual memory, which is 532 MB, and twice the mapped memory, which is 2 * 192 = 384 MB. The difference between these figures is 148 MB (532 - 384).

What we see next is the portion of virtual memory that is not mapped. This value is the same as what we just calculated.
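The figures these graphs are plotted from can also be pulled out of db.serverStatus(); the following sketch repeats the calculation we just did by hand, assuming the mem section of an MMAP-based 2.x server (values are in MB):

// Reproduce the non-mapped memory calculation from serverStatus.
var mem = db.serverStatus().mem;
print('resident: ' + mem.resident);
print('mapped  : ' + mem.mapped);
print('virtual : ' + mem.virtual);
// With journaling on, each data page is mapped twice, hence 2 * mapped
// (mem.mappedWithJournal, where available, reports this doubled figure directly).
print('non-mapped (approx): ' + (mem.virtual - 2 * mem.mapped));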


As mentioned earlier, there is no defined high or low value for non-mapped memory; however, when the value reaches GBs, we might have to investigate whether the number of open connections is high and check whether there is a leak, with client applications not closing connections after using them. There is a graph that gives us the number of open connections, and it looks as follows:
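To check the same count from the shell before drilling into who opened the connections, serverStatus has a connections section:

// Current and remaining available connections on this instance.
var conn = db.serverStatus().connections;
print('Open connections     : ' + conn.current);
print('Available connections: ' + conn.available);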


Once we know the number of connections and find it too high compared to the normally expected count, we will need to find the clients that have opened connections to that instance. We can execute the following JavaScript code from the shell to get those details. Unfortunately, at the time of writing this book, MMS didn't have a feature to list the client connection details.

testMon:PRIMARY> var currentOps = db.currentOp(true).inprog;
currentOps.forEach(function(c) {
  if(c.hasOwnProperty('client')) {
    print('Client: ' + c.client + ", connection id is: " + c.desc);
  }
  //Get other details as needed 
});

The db.currentOp method, when passed true, includes even the idle and system operations in the result. We then iterate through all the results and print out the client host and the connection details. A typical document in the result of the currentOp method looks like the following code snippet. You may choose to tweak the preceding piece of code to include more details as per your needs.

        {
                "opid" : 62052485,
                "active" : false,
                "op" : "query",
                "ns" : "",
                "query" : {
                        "replSetGetStatus" : 1,
                        "forShell" : 1
                },
                "client" : "127.0.0.1:64460",
                "desc" : "conn3651",
                "connectionId" : 3651,
                "waitingForLock" : false,
                "numYields" : 0,
                "lockStats" : {
                        "timeLockedMicros" : {

                        },
                        "timeAcquiringMicros" : {

                        }
                }
        }

The Understanding the mongostat and mongotop utilities recipe in Chapter 4, Administration, was used to get some details on the percentage of time for which a database was locked and the number of update, insert, delete, and getmore operations executed per second. You may refer to that recipe and try it out. There, we used the same JavaScript file that we are using now to keep the server busy.

In the MMS console, we have similar graphs giving these details as follows:


The first one, Opcounters, shows the number of operations executed as of a particular point in time. This should be similar to what we saw using the mongostat utility. Similarly, the graph on the right shows us the percentage of time for which a DB was locked. The dropdown lists the database names; we can select the database whose stats we want to see. Again, this statistic can be seen using the mongostat utility. The only difference is that with the command-line utility we see the stats as of the current time, whereas here we see the historical stats as well.
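The counters behind the Opcounters graph are also available from the shell; the following reads them from serverStatus. Note that these are cumulative counts since the server started, whereas mongostat and MMS show per-interval rates:

// Cumulative operation counters since the mongod process started.
var ops = db.serverStatus().opcounters;
print('insert : ' + ops.insert);
print('query  : ' + ops.query);
print('update : ' + ops.update);
print('delete : ' + ops['delete']); // bracket notation, as delete is a reserved word
print('getmore: ' + ops.getmore);
print('command: ' + ops.command);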

In MongoDB, indexes are stored in B-trees, and the following graph shows the number of times the B-tree index was accessed, hit, and missed. At a minimum, the RAM should be enough to accommodate the indexes for optimum performance, so in these metrics the misses should be zero or very low. A high number of misses results in a page fault for the index and, possibly, additional page faults for the corresponding data if the query is not covered (that is, not all of its data can be sourced from the index), which is a double blow for performance. One good practice whenever querying is to use projections and fetch only the necessary fields from the document. This is helpful whenever the selected fields are all present in an index, in which case the query is covered and all the necessary data is sourced from the index alone.

To find out more about covered indexes, refer to the Creating an index and viewing plans of queries recipe in Chapter 2, Command-line Operations and Indexes.
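As a quick illustration of a covered query (the users collection and email field here are made up for the example), create an index on the queried field and project only indexed fields while excluding _id; on 2.x servers, explain() then reports indexOnly as true, meaning no document had to be fetched:

// Index the field we will query by.
db.users.ensureIndex({email: 1});
// Project only the indexed field and exclude _id so the query is covered.
var plan = db.users.find({email: 'user@example.com'}, {email: 1, _id: 0}).explain();
print('Covered by the index: ' + plan.indexOnly);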


For busy applications, when MongoDB acquires a lock on the database, other read and write operations get queued up. If the volumes are very high, with multiple write and read operations contending for the lock, the operations queue up. As of version 2.4 of MongoDB, locks are at the database level; thus, even if writes are happening on another collection, read operations on any collection in that database will block. This queuing affects the performance of the system and is a good indicator that the data might need to be sharded to scale the system.
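The queued readers and writers that this graph is plotted from can also be seen in the globalLock section of serverStatus, which is a quick way to spot contention from the shell:

// Operations currently queued up, waiting for a lock.
var gl = db.serverStatus().globalLock;
print('Queued readers: ' + gl.currentQueue.readers);
print('Queued writers: ' + gl.currentQueue.writers);
print('Queued total  : ' + gl.currentQueue.total);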


Tip

Remember, no value is defined as high or low; what is acceptable varies from application to application.

MongoDB writes data to the journal almost immediately and flushes the data files to the disk periodically. The following metrics give us the flush time per minute at a given point in time. If the flush takes up a significant percentage of the time per minute, we can safely say that write operations are forming a bottleneck for performance.
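The flush timings this graph is based on come from the backgroundFlushing section of serverStatus (present on MMAP-based servers); the following sketch prints the average and most recent flush times:

// Background flush statistics for the data files.
var bf = db.serverStatus().backgroundFlushing;
print('Flushes so far    : ' + bf.flushes);
print('Average flush (ms): ' + bf.average_ms);
print('Last flush (ms)   : ' + bf.last_ms);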


There's more…

We have seen how to monitor MongoDB instances/clusters in this recipe. However, we still haven't seen how to set up alerts to be notified when certain threshold values are crossed. In the next recipe, we will see how to achieve this with a sample alert that is sent out over e-mail when the page faults cross a predetermined value.

See also
