Executing update and delete operations using PyMongo

In the previous recipe, we saw how to execute find and insert operations in MongoDB using PyMongo. In this recipe, we will see how updates and deletions work from Python. We will also see what atomic find and update/delete operations are and how to execute them. We will then conclude by revisiting find operations and looking at some interesting functions of the cursor object.

Getting ready

If you have already seen and completed the previous recipe, you are all set to go. If not, it is recommended that you first complete the previous recipe before going ahead with this recipe.

Before we get started, let's define a small function that iterates through a cursor and prints its documents to the console. We will use this function whenever we want to display the results of a query on the pymongoTest collection. The following is the function's body:

>>> def showResults(cursor):
        if cursor.count() != 0:
            for e in cursor:
                print e
        else:
            print 'No documents found'

Also, refer to steps 1 and 2 in the previous recipe to learn how to create a connection to the MongoDB server and the db object used to perform CRUD operations on the database, and refer to step 11 in the previous recipe to learn how to insert the required test data into the pymongoTest collection. Once the data is present, you can confirm it by executing the following command from the Python shell:

>>> showResults(db.pymongoTest.find())
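
If you have skipped that setup, the following is a minimal recap; it is a sketch that assumes the MongoDB server is running locally on the default port 27017 and that the test database is used, as in the previous recipe:

>>> import pymongo
>>> client = pymongo.MongoClient('localhost', 27017)
>>> db = client.test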

For a part of this recipe, you are also expected to know how to start a replica set and have one running. Refer to the Starting multiple instances as part of a replica set and Connecting to the replica set from the shell to query and insert data recipes in Chapter 1, Installing and Starting the MongoDB Server.

How to do it…

  1. We will set a field named gtTen to the Boolean value True for documents whose field i has a value greater than 10. Let's execute the following update command:
    >>> db.pymongoTest.update({'i':{'$gt':10}}, {'$set':{'gtTen':True}})
    {u'updatedExisting': True, u'connectionId': 8, u'ok': 1.0, u'err': None, u'n': 1}
    
  2. Query the collection and view its data by executing the following command, and check the data that got updated:
    >>> showResults(db.pymongoTest.find())
    
  3. The results displayed confirm that only one document got updated. We will now execute the same update again; this time around, we will update all the documents that match the provided query. Execute the following update operation from the Python shell. Note that this update is identical to the one we performed in step 1, except for the additional parameter called multi, whose value is given as True. Also, note the value of n in the response; it is 10 this time:
    >>> db.pymongoTest.update({'i':{'$gt':10}},{'$set':{'gtTen':True}}, multi=True)
    {u'updatedExisting': True, u'connectionId': 8, u'ok': 1.0, u'err': None, u'n': 10}
    
  4. Execute the operation we performed in step 2 again to view the contents in the pymongoTest collection and verify the documents updated.
  5. Let's take a look at how upsert operations can be performed. Upserts are updates plus inserts; they update a document if one exists, just as an update does, and insert a new document otherwise. Let's take a look at an example. Consider the following command on a document that doesn't exist in the collection:
    >>> db.pymongoTest.update({'i':21},{'$set':{'gtTen':True}})
    
  6. The update here will not update anything and will return the number of updated documents as 0. However, suppose we want to update the document if it exists, or else insert a new document and apply the update to it atomically; in that case, we perform an upsert operation. The upsert operation is executed as follows (note the response, which contains upserted with the ObjectId of the newly inserted document, and updatedExisting, whose value is False):
    >>> db.pymongoTest.update({'i':21},{'$set':{'gtTen':True}}, upsert=True)
    {u'ok': 1.0, u'upserted': ObjectId('52a8b2f47a809beb067ecd8a'), u'err': None, u'connectionId': 8, u'n': 1, u'updatedExisting': False}
    
  7. Let's see how to delete documents from the collection using the remove method:
    >>> db.pymongoTest.remove({'i':21})
    {u'connectionId': 8, u'ok': 1.0, u'err': None, u'n': 1}
    
  8. If we look at the value of n in the preceding response, we will see that it is 1, which means that one document got removed. There is another way to remove a document: by its _id. Let's insert one document in the collection and then remove it. Insert the document as follows:
    >>> db.pymongoTest.insert({'i':23, '_id':23})
    
  9. Now, remove this document from the collection as follows:
    >>> db.pymongoTest.remove(23)
    {u'connectionId': 8, u'ok': 1.0, u'err': None, u'n': 1}
    
  10. We will look at the find and modify operations now. We can think of this operation as a way to find a document and then update or remove it, with both of these steps performed atomically. Once the operation is performed, the document returned is either the one before or the one after the update (in the case of remove, there will be no document after the operation). Without this operation, we cannot guarantee atomicity in scenarios where multiple client connections could be performing similar operations on the same document: finding a document, updating it, and returning the document as it was before or after the update. The following is an example of how to perform these find and modify operations in Python:
    >>> db.pymongoTest.find_and_modify({'i':20}, {'$set':{'inWords':'Twenty'}})
    {u'i': 20, u'gtTen': True, u'_id': ObjectId('52a8a1eb072f651578ed98b2')}
    

    The preceding result shows us that the resulting document returned is the one before the update was applied.

  11. Execute the following find operation to query and view the document that we updated in the previous step. The resulting document will contain the newly added inWords field:
    >>> db.pymongoTest.find_one({'i':20})
    {u'i': 20, u'_id': ObjectId('52aa0cfe072f651578ed98b7'), u'inWords': u'Twenty'}
    
  12. We will execute the find and modify operation again but, this time around, we will return the updated document rather than the document before the update, as we saw in step 10. Execute the following command from the Python shell:
    >>> db.pymongoTest.find_and_modify({'i':19}, {'$set':{'inWords':'Nineteen'}}, new=True)
    {u'i': 19, u'gtTen': True, u'_id': ObjectId('52a8a1eb072f651578ed98b1'), u'inWords': u'Nineteen'}
    
  13. We saw how to query using PyMongo in the previous recipe. Here, we will continue with the query operation. We saw how the sort and limit functions were chained to the find operation. The prototype of the call on the postalCodes collection is as follows:
    db.postalCodes.find(..).limit(..).sort(..)
    
  14. There is an alternative way to achieve the same result. Execute the following query in the Python shell:
    >>> cursor = db.postalCodes.find({'state':'Gujarat'}, {'_id':0, 'city':1, 'state':1, 'pincode':1}, limit=10, sort=[('city', pymongo.ASCENDING)])
    
  15. Print the preceding cursor using the showResults function that we defined earlier.
  16. To restrict a full scan of the collection by queries without indexes, there is a parameter called max_scan, which takes an integer value and ensures that the query doesn't scan more than the given number of documents while getting the results. For instance, the following query ensures that no more than 50 documents are scanned to get the results. Again, use the showResults function to display the results in the cursor:
    >>> showResults(db.postalCodes.find({'state':'Andhra Pradesh'}, max_scan=50))
    

How it works…

Let's take a look at what we did in this recipe. We started by updating the documents in a collection in step 1. The update, however, updated only the first matching document by default; the rest of the matching documents were not updated. In step 3, we added a parameter called multi with a value of True to update multiple documents as part of the same update operation. Note that these documents are not updated atomically as part of one transaction. If we look at the update done from the Python shell, we will see a striking resemblance to what we would have done from the Mongo shell. If we want to name the arguments of the update operation, the parameters are called spec and document; they are the document provided as a query to select the documents to update and the update to apply to them, respectively. For instance, the following update operation is valid:

>>> db.pymongoTest.update(spec={'i':{'$gt':10}}, document={'$set':{'gtTen':True}})

There are some more arguments that an update function takes, with most of them carrying the same meaning as the insert function we saw in the previous recipe. These parameters are w, wtimeout, j, fsync, and check_keys. Refer to the previous recipe for the explanation given for these parameters used with the insert function.
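
For example, the following sketch issues the same multi-update with an explicit write concern; the specific values used here are just assumptions for illustration:

>>> db.pymongoTest.update({'i':{'$gt':10}}, {'$set':{'gtTen':True}}, multi=True, w=1, wtimeout=5000)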

In step 6, we did an upsert (update plus insert). All we had was an additional upsert parameter with the value True. However, what exactly happens in the case of an upsert? Mongo tries to update a document that matches the provided condition; if it finds one, this is a regular update. In this case (the upsert in step 6), however, the document was not found, so the server inserted the document given as spec (the first parameter) into the collection and then applied the update operation to it, with both of these operations taking place atomically.
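
Had we queried the collection right after the upsert in step 6 (that is, before the document was removed in step 7), we would have seen something like the following; the exact field order and ObjectId will differ on your machine, but the _id matches the upserted value from the step 6 response:

>>> db.pymongoTest.find_one({'i':21})
{u'i': 21, u'gtTen': True, u'_id': ObjectId('52a8b2f47a809beb067ecd8a')}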

In steps 7 to 9, we saw the remove operation. The first variant accepted a query, and all the matching documents were removed. The second variant, in step 9, accepted one integer, which is the value of the _id field of the document to be deleted. This variant is useful whenever we plan to delete by the _id field's value. Similar to update, the remove function also accepts other parameters for the write concern. The w, wtimeout, j, and fsync parameters have meanings similar to what we discussed in the previous recipe when we inserted documents. Refer to the previous recipe for a detailed description of these parameters. A call to the remove method on the collection without any parameters will remove all the documents in the collection.
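
As a quick illustration, the following sketch passes an explicit write concern to remove; the query and the values used here are assumptions chosen purely for illustration:

>>> db.pymongoTest.remove({'i':{'$gt':20}}, w=1, wtimeout=5000)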

In steps 10 to 12, we executed the find and modify operations. Information on these operations is provided in the previous section. What we didn't see there is that this operation can also be used to find and remove documents from the collection; an additional parameter called remove needs to be passed with the value True. In the following operation, we will remove the document with _id equal to 31 and return it as it was before the deletion:

>>> db.pymongoTest.find_and_modify(query={'_id':31}, remove=True)

Note that, with the remove option provided, the parameter named new is not supported, as there is nothing to return after the document is deleted.

All the operations we saw in this recipe were for clients connected to a standalone instance. If, however, you are connected to a replica set, the client is instantiated in a different way. We are also aware of the fact that, by default, we are not allowed to query secondary nodes for data; we need to explicitly execute rs.slaveOk() from the Mongo shell connected to a secondary node to query it. This is done in a similar way from a Python client as well. If we are connected to a secondary node, we cannot query it by default, but the way in which we state that we are OK with querying a secondary node is slightly different. There is a parameter called slave_okay, whose value is False by default; if its value is True, the query will go through successfully and return results from a secondary node. If the parameter is not set to True, querying the secondary node will throw an exception stating that the node queried is not a master. For instance, if our client is connected to a secondary instance and we want to query it based on the name of the state, we will execute the following query:

>>> cursor = db.postalCodes.find({'state':'Maharashtra'}, slave_okay=True)

We will get the cursor for the results successfully if the collection does indeed have documents with the name of the state, Maharashtra.

Another parameter that is better left untouched, as it has a sensible default, is timeout; its value is True by default. Note that this value is not a number for some sort of timeout but a Boolean value. If the value is True, a cursor opened by a query on the server will be closed automatically after 10 minutes of inactivity; think of it as a sort of garbage collection of server-side resources. If it is set to False, however, it is no longer the responsibility of the server to clean the cursor up; it becomes the responsibility of the client to close it.
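
For example, the following sketch keeps the cursor alive until we close it explicitly; the query used here is just an assumption for illustration:

>>> cursor = db.postalCodes.find({'state':'Gujarat'}, timeout=False)
>>> # iterate over the cursor as needed; the server will not time it out
>>> cursor.close()   # it is now the client's responsibility to close the cursor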

Another parameter called tailable is used to denote that the cursor returned by find is a tailable cursor. Explaining what tailable cursors are and giving more details is not in the scope of this recipe; this is explained in the Creating and tailing capped collection cursors in MongoDB recipe in Chapter 5, Advanced Operations.

So far in this recipe, we connected to a single node using pymongo.MongoClient. However, we cannot use the same class to connect to a replica set, for the following reasons:

  • We will just be connected to one instance
  • To allow us to perform write operations, we will have to connect to the primary instance
  • If the primary instance goes down, there has to be an automatic failover to the new primary instance

Therefore, to connect to a replica set and address the preceding three points, we will use pymongo.MongoReplicaSetClient. The following is the way in which we can initiate the client:

>>> client = pymongo.MongoReplicaSetClient('mongodb://localhost:27000', replicaSet='replSetTest')
>>> 

As we can see, we just provided one host from the replica set and the name of the replica set we used when starting it. The client will automatically discover the remaining hosts from the replica set configuration. The host names that we provide are known as the seed list; we can provide multiple instances of the replica set in it. The name of the parameter that accepts the host names is hosts_or_uri.
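
For example, assuming the replica set members started in Chapter 1 listen on ports 27000, 27001, and 27002, more than one seed host can be passed as a comma-separated list of host:port pairs:

>>> client = pymongo.MongoReplicaSetClient('localhost:27000,localhost:27001,localhost:27002', replicaSet='replSetTest')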

However, what about read preferences and how do we specify them? There are some more parameters that we will need to look at while initiating the client.

>>> from pymongo.read_preferences import ReadPreference
>>> from pymongo import MongoReplicaSetClient
>>> client = MongoReplicaSetClient('mongodb://localhost:27000', replicaSet='replSetTest', read_preference=ReadPreference.NEAREST)
>>> client.read_preference
4

The preceding steps initialized a replica set client with the NEAREST read preference. There is an additional parameter, secondary_acceptable_latency_ms, which gives a time in milliseconds. This time is used by the client to decide which members of the replica set are contenders for selection when the NEAREST read preference is specified. The driver first computes the minimum latency across all the replica set instances, and all the instances whose latency is within the provided value of this minimum are added to the list of contender instances for selection as the nearest instance. There was a fairly long discussion on this behavior in the read preference recipe, where some code snippets from a Java client were used to explain the internals. The default value of this parameter is 15 milliseconds.
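
For instance, the following sketch, which assumes the same replica set as earlier, widens this window to 20 milliseconds while keeping the NEAREST read preference:

>>> client = MongoReplicaSetClient('mongodb://localhost:27000', replicaSet='replSetTest', read_preference=ReadPreference.NEAREST, secondary_acceptable_latency_ms=20)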

As we know, read preference can be provided at the client level, at the database level (where it is inherited from the client), and also at the cursor level. By default, the read_preference for a client initialized without an explicit read preference is PRIMARY (with the value 0). However, if we now get the database object from the client initialized earlier, its read preference will be NEAREST (with the value 4):

>>> db = client.test
>>> db.read_preference
4
>>>

Setting the read preference is as simple as executing the following command:

>>> db.read_preference = ReadPreference.PRIMARY_PREFERRED

Again, as the read preference gets inherited from the client to the database object, it gets inherited from the database object to the collection object, and it will be used as the default value for all the queries executed against that collection, unless read preference is specified explicitly in the find operation.

Thus, db.pymongoTest.find() will give a cursor that uses the PRIMARY_PREFERRED read preference (we just set it at the database-object level), whereas db.pymongoTest.find(read_preference=ReadPreference.NEAREST) will use NEAREST.
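
To make this concrete, the following minimal sketch shows both forms side by side, assuming the db object from the replica set client and the PRIMARY_PREFERRED setting we just applied:

>>> cursor1 = db.pymongoTest.find()    # inherits PRIMARY_PREFERRED from the database object
>>> cursor2 = db.pymongoTest.find(read_preference=ReadPreference.NEAREST)    # overrides it for this query only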

We will now wrap up the basic operations from a Python driver by trying to do some common operations that we do from the Mongo shell, such as getting all the database names, getting a list of collections in a database, and creating an index on a collection.

From the shell, we execute show dbs to view all the database names in the connected Mongo instance. From the Python client, we execute the following command on the client instance:

>>> client.database_names()
[u'local', u'test']

Similarly, to see the list of collections, we will type show collections in the Mongo shell. In Python, all that we will do on the database object is as follows:

>>> db.collection_names()
[u'system.indexes', u'writeConcernTest', u'pymongoTest']

Now, for index operations, we will first see what indexes are present in the pymongoTest collection. Execute the following command from the Python shell to view the indexes on a collection:

>>> db.pymongoTest.index_information()
{u'_id_': {u'key': [(u'_id', 1)], u'v': 1}}

We will now create an index on the key x, sorted in ascending order, on the pymongoTest collection as follows:

>>> db.pymongoTest.ensure_index([('x',pymongo.ASCENDING)])
u'x_1'

We can again list the indexes as follows to confirm the creation of the index:

>>> db.pymongoTest.index_information()
{u'_id_': {u'key': [(u'_id', 1)], u'v': 1}, u'x_1': {u'key': [(u'x', 1)], u'v':1}}

We can see that the index got created. Generally speaking, the format of the ensure_index method is as follows:

>>> db.<collection name>.ensure_index([(<field name 1>, <order of field 1>), ..., (<field name n>, <order of field n>)])
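
For example, a compound index on two fields (the field y here is just a hypothetical field used for illustration) would be created as follows:

>>> db.pymongoTest.ensure_index([('x', pymongo.ASCENDING), ('y', pymongo.DESCENDING)])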