Viewing and killing the currently executing operations

In this recipe, we will see how to view the currently running operations and kill those that have been running for a long time.

Getting ready

In this recipe, we will simulate some operations on a standalone Mongo instance. We need to start a standalone server listening on any port for client connections; in this case, we will stick to the default, 27017. If you are not aware of how to start a standalone server, refer to the Single node installation of MongoDB recipe in Chapter 1, Installing and Starting the MongoDB Server. We also need to start two shells connected to this server. One shell will be used for the background index creation, and the other will be used to monitor the current operations and then kill the index creation.

How to do it…

In a test environment, we cannot easily reproduce a genuinely long-running operation. We will, however, create an index on a large collection and hope it takes long enough to observe. Depending on your hardware configuration, the operation may take more or less time. Let's see the steps in detail:

  1. To start this test, let us execute the following command on the Mongo shell:
    > db.currentOpTest.drop()
    > for (var i = 1; i <= 10000000; i++) { db.currentOpTest.insert({'i': i}) }
    

    The preceding insertion might take some time to insert 10 million documents.

    Once the documents are inserted, we will execute an operation that will create the index in the background. If you would like to know more about index creation, refer to the recipe Background and foreground index creation from the shell in Chapter 2, Command-line Operations and Indexes, but it is not a prerequisite for this recipe.

  2. Create a background index on the i field in the document. This index-creation operation is what we will be viewing from the currentOp operation and is what we will attempt to kill by using the kill operation. Execute the following command in one shell to initiate the background index creation operation:
    > db.currentOpTest.ensureIndex({i:1}, {background:1})
    

    This takes a fairly long time; on my laptop, it took well over 100 seconds.

  3. In the second shell, execute the following command to get the currently executing operations:
    > db.currentOp().inprog
    
  4. Take a note of the in-progress operations and find the one for index creation. In our case, on the test machine, it was the only one in progress. It will be an operation on system.indexes and the operation will be insert. The keys to look out for in the output document are ns and op respectively. We need to note the first field, namely opid, of this operation. In this case, it is 11587458. The sample output of the command is given in the next section.
  5. Kill the operation from the shell using opid, which we got earlier:
    > db.killOp(11587458)
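Steps 3 and 4 can be sketched as a small helper that picks the index build out of the inprog array by its ns and op keys. This is only an illustration under a few assumptions: the function name findIndexBuildOps and the second sample operation are hypothetical, and the ns/op pattern is the one seen on the server version used in this recipe. The sample array stands in for db.currentOp().inprog so that the snippet runs without a server.

```javascript
// Return the in-progress operations that look like an index build:
// an insert on a namespace ending in ".system.indexes" (see step 4).
// `inprog` is the array found in db.currentOp().inprog.
function findIndexBuildOps(inprog) {
  var suffix = '.system.indexes';
  return inprog.filter(function (op) {
    var ns = op.ns || '';
    // True only when ns ends with the suffix.
    var endsWithSuffix =
      ns.indexOf(suffix, ns.length - suffix.length) !== -1;
    return op.op === 'insert' && endsWithSuffix;
  });
}

// Stand-in for db.currentOp().inprog; the second entry is made up.
var sampleInprog = [
  { opid: 11587458, op: 'insert', ns: 'test.system.indexes', secs_running: 31 },
  { opid: 11587460, op: 'query', ns: 'test.currentOpTest', secs_running: 0 }
];

var indexBuilds = findIndexBuildOps(sampleInprog);
var indexBuildOpids = indexBuilds.map(function (op) { return op.opid; });
```

In the shell, the same function could be called as findIndexBuildOps(db.currentOp().inprog), and the resulting opid passed to db.killOp().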
    

How it works…

We will split our explanation into two sections, the first about the current operation details and the second about killing the operation.

The index creation process, in our case, is the long-running operation that we intend to kill. We create a big collection with about 10 million documents, and initiate a background index creation process.

On executing the db.currentOp() operation, we get a document as the result, with an inprog field whose value is an array of other documents, each representing a currently running operation. It is common to get a big list of documents on a busy system. The following is a document taken for the index creation operation:

{
  "opid" : 11587458,
  "active" : true,
  "secs_running" : 31,
  "op" : "insert",
  "ns" : "test.system.indexes",
  "insert" : {
    "v" : 1,
    "key" : {
      "i" : 1
    },
    "ns" : "test.currentOpTest",
    "name" : "i_1",
    "background" : 1
  },
  "client" : "127.0.0.1:50895",
  "desc" : "conn10",
  "connectionId" : 10,
  "locks" : {
    "^" : "w",
    "^test" : "W"
  },
  "waitingForLock" : false,
  "msg" : "bg index build Background Index Build Progress: 2214738/10586935 20%",
  "progress" : {
    "done" : 2214740,
    "total" : 10586935
  },
  "numYields" : 3070,
  "lockStats" : {
    "timeLockedMicros" : {
      "r" : NumberLong(0),
      "w" : NumberLong(53831938)
    },
    "timeAcquiringMicros" : {
      "r" : NumberLong(0),
      "w" : NumberLong(31387832)
    }
  }
}

We will see what these fields mean in the following table:

| Field | Description |
| --- | --- |
| opid | A unique ID identifying the operation. This is the ID to use when killing an operation. |
| active | A Boolean indicating whether the operation has started. It is false only while the operation is waiting to acquire the lock it needs. Once the operation starts, the value stays true, even at moments when it has yielded the lock and is not executing. |
| secs_running | The time, in seconds, for which the operation has been running. |
| op | The type of the operation. Index creation shows up as insert, because the index definition is inserted into the system collection of indexes. The possible values are insert, query, getmore, update, remove, and command. |
| ns | The fully qualified namespace of the target, in the <database name>.<collection name> form. |
| insert | The document that will be inserted into the collection. |
| query | A field present for operations other than insert and getmore. |
| client | The IP address or hostname, and the port, of the client that initiated the operation. |
| desc | A description of the client, mostly the client's connection name. |
| connectionId | The identifier of the client connection from which the request originated. |
| locks | A document listing the locks held by this operation. The ^ key indicates the global lock and ^test indicates the lock on the test database. The value of ^ is w (lower case), which is not an exclusive write lock: it is a database-level write mode, so writes to different databases can proceed concurrently. The value of ^test is W (upper case), an exclusive write lock on the test database; no other operation on that database can proceed while it is held. The preceding output is for Version 2.4 of Mongo. |
| waitingForLock | Indicates whether the operation is waiting to acquire a lock. For instance, if the preceding index creation were not a background process, other operations on this database would queue up for the lock, and this flag would be true for those operations. |
| msg | A human-readable message for the operation. In this case, it includes the percentage of the operation completed, as this is an index creation operation. |
| progress | The state of the operation. The total field gives the total number of documents in the collection and done gives the number indexed so far. In this case, the collection held somewhat more than the 10 million documents inserted above. The percentage of the operation completed is computed from these figures. |
| numYields | The number of times the operation has yielded the lock to allow other operations to execute. As this is a background index creation, this number keeps increasing, because the server yields frequently to let other operations execute. Had it been a foreground process, the lock would not be yielded until the operation completed. |
| lockStats | Nested documents giving the total time this operation has held the read or write lock, and the time it waited to acquire the lock. The possible keys are r (time for a database-level read lock), w (time for a database-level write lock), R (time for the global read lock), and W (time for the global write lock). |
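The relationship between the progress field and the percentage shown in msg can be checked with a few lines of JavaScript. The following is a minimal sketch, not the server's own computation: the function name describeProgress is hypothetical, and truncating the ratio is an assumption that happens to reproduce the 20% shown in the sample output.

```javascript
// Summarize one document from db.currentOp().inprog.
// Returns null when the operation reports no usable progress information.
function describeProgress(op) {
  if (!op.progress || !op.progress.total) {
    return null;
  }
  // Assumption: the percentage is truncated, not rounded.
  var percent = Math.floor(op.progress.done * 100 / op.progress.total);
  return {
    opid: op.opid,
    secsRunning: op.secs_running,
    percentDone: percent
  };
}

// Values copied from the sample output of the index creation operation.
var sampleIndexBuild = {
  opid: 11587458,
  secs_running: 31,
  progress: { done: 2214740, total: 10586935 }
};

var progressSummary = describeProgress(sampleIndexBuild);
// progressSummary.percentDone is 20, matching the msg field.
```

Operations without a progress document, such as most queries, simply yield null here.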

If you have a replica set, the primary will also show many getmore operations on the oplog, issued by the secondaries.

To include system operations in the output, we need to pass a true value as the parameter to the currentOp function call as follows:

> db.currentOp(true)

Next, we will see how to kill the user-initiated operation using the killOp function. The operation is simply called as follows:

> db.killOp(<operation id>)

In our case, the index creation operation had the operation ID 11587458 and thus it will be killed as follows:

> db.killOp(11587458)
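On a busy system, the candidates for killOp are usually chosen by how long they have been running. The following sketch collects the opid values of active operations whose secs_running is at or above a threshold. The function name longRunningOpIds, the 30-second threshold, and the second and third sample operations are all hypothetical; the sample array stands in for db.currentOp().inprog.

```javascript
// Return the opids of active operations that have been running
// for at least `minSecs` seconds.
function longRunningOpIds(inprog, minSecs) {
  return inprog.filter(function (op) {
    return op.active === true && op.secs_running >= minSecs;
  }).map(function (op) {
    return op.opid;
  });
}

// Stand-in for db.currentOp().inprog; only the first entry comes
// from the recipe, the others are made up for illustration.
var sampleOps = [
  { opid: 11587458, active: true, secs_running: 31, op: 'insert' },
  { opid: 11587461, active: true, secs_running: 2, op: 'query' },
  { opid: 11587462, active: false, secs_running: 0, op: 'update' }
];

var opidsToReview = longRunningOpIds(sampleOps, 30);

// In the shell, each opid could then be killed, for example:
//   longRunningOpIds(db.currentOp().inprog, 30)
//     .forEach(function (id) { db.killOp(id); });
```

It is safer to review the returned list before killing anything, as it may contain operations that are not user initiated.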

On killing any operation, irrespective of whether the given operation ID exists or not, we see the following message on the console:

{ "info" : "attempting to kill op" }

Thus, seeing this message doesn't mean that the operation was killed. It just means that an attempt will be made to kill the operation, if it exists.

If the killOp command is issued for an operation that cannot be killed immediately, the killPending field will start appearing in the currentOp output for that operation. For example, execute the following query on the shell:

> db.currentOpTest.find({$where:'sleep(100000)'})

This will not return, and the thread executing the query will sleep for 100 seconds. This is an operation that cannot be killed immediately using killOp. Try executing currentOp from another shell (do not press Tab for autocompletion; your shell may just hang), get the operation ID, and then kill it using the killOp command. If you execute currentOp again, you should see that the operation is still running, but its document now contains a new key, killPending, stating that a kill has been requested for this operation but is pending.
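Because the message returned by killOp says nothing about the outcome, the only way to know what happened is to look at currentOp again. A minimal sketch of that check follows, assuming a hypothetical helper named killStatus and made-up sample operations; the array stands in for db.currentOp().inprog taken after the killOp call.

```javascript
// Classify an operation after db.killOp(opid) has been issued:
//   'gone'        - no longer listed in inprog
//   'killPending' - still listed, with the killPending key set
//   'running'     - still listed, with no kill pending
function killStatus(inprog, opid) {
  for (var i = 0; i < inprog.length; i++) {
    if (inprog[i].opid === opid) {
      return inprog[i].killPending ? 'killPending' : 'running';
    }
  }
  return 'gone';
}

// Stand-in for db.currentOp().inprog; both entries are illustrative.
var opsAfterKill = [
  { opid: 101, op: 'query', ns: 'test.currentOpTest', killPending: true },
  { opid: 102, op: 'query', ns: 'test.currentOpTest' }
];

var sleepingQueryStatus = killStatus(opsAfterKill, 101);
var otherQueryStatus = killStatus(opsAfterKill, 102);
var finishedStatus = killStatus(opsAfterKill, 11587458);
```

In the shell, the call would be killStatus(db.currentOp().inprog, opid), repeated until the result is no longer 'killPending'.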
