In the previous recipe, we saw how to execute find and insert operations in MongoDB using PyMongo. In this recipe, we will see how updates and deletes work from Python. We will also see what atomic find and update/delete operations are and how to execute them. We will conclude by revisiting find operations and looking at some interesting functions of the cursor object.
If you have already completed the previous recipe, you are all set to go. If not, it is recommended that you complete the previous recipe before going ahead with this one.
Before we get started, let's define a small function that iterates through a cursor and prints its results on the console. We will use this function whenever we want to display the results of a query on the pymongoTest collection. The following is the function's body:
>>> def showResults(cursor):
...     if cursor.count() != 0:
...         for e in cursor:
...             print e
...     else:
...         print 'No documents found'
...
Also, refer to steps 1 and 2 of the previous recipe to learn how to create a connection to the MongoDB server and create the db object used to perform CRUD operations on the database, and refer to step 11 of the previous recipe to learn how to insert the required test data into the pymongoTest collection. Once the data is present, you can confirm it by executing the following command from the Python shell:
>>> showResults(db.pymongoTest.find())
For part of the recipe, you are also expected to know how to start a replica set instance. Refer to the Starting multiple instances as part of a replica set and Connecting to the replica set from the shell to query and insert data recipes in Chapter 1, Installing and Starting the MongoDB Server.
We will first set a field called gtTen to the Boolean value True on a document where the field i has a value greater than 10. Let's execute the following update command:
>>> db.pymongoTest.update({'i':{'$gt':10}}, {'$set':{'gtTen':True}})
{u'updatedExisting': True, u'connectionId': 8, u'ok': 1.0, u'err': None, u'n': 1}
>>> showResults(db.pymongoTest.find())
Let's now execute the following update operation from the Python shell. Note that this update is identical to the one we performed in step 1, except for the additional parameter called multi, whose value is given as True. Also, note the value of n in the response; it is 10 this time:
>>> db.pymongoTest.update({'i':{'$gt':10}},{'$set':{'gtTen':True}}, multi=True)
{u'updatedExisting': True, u'connectionId': 8, u'ok': 1.0, u'err': None, u'n': 10}
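To make the difference between the default single-document update and multi=True concrete, the following is a minimal pure-Python sketch of the two behaviors over a list of plain dicts standing in for a collection (the function name and the predicate-based matching are our own illustration, not PyMongo API):

```python
def update(docs, predicate, changes, multi=False):
    """Sketch of MongoDB update semantics over in-memory documents.

    Applies a $set-style change to matching dicts and returns n,
    the number of documents modified, as the server reports it.
    """
    n = 0
    for doc in docs:
        if predicate(doc):
            doc.update(changes)
            n += 1
            if not multi:  # default behavior: only the first match is updated
                break
    return n

docs = [{'i': v} for v in range(1, 21)]
gt_ten = lambda d: d['i'] > 10

n_first = update(docs, gt_ten, {'gtTen': True})            # n is 1
n_all = update(docs, gt_ten, {'gtTen': True}, multi=True)  # n is 10
```

Just as with the server, omitting multi leaves nine of the ten matching documents untouched on the first call.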
Query the pymongoTest collection and verify the updated documents. Next, let's see how upsert operations can be performed. Upserts are updates plus inserts: they update a document if one exists, just as an update does; otherwise, they insert a new document. Let's take a look at an example. Consider the following command on a document that doesn't exist in the collection:
>>> db.pymongoTest.update({'i':21},{'$set':{'gtTen':True}})
The value of n in the response to the preceding update is 0, as no document matched the query. However, suppose we want to update a document if it exists, or else insert a new document and apply the update to it atomically; in that case, we perform an upsert operation. The upsert is executed as follows (note the response, which mentions upserted with the ObjectId of the newly inserted document, and the updatedExisting value, which is False):
>>> db.pymongoTest.update({'i':21},{'$set':{'gtTen':True}}, upsert=True)
{u'ok': 1.0, u'upserted': ObjectId('52a8b2f47a809beb067ecd8a'), u'err': None, u'connectionId': 8, u'n': 1, u'updatedExisting': False}
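The upsert behavior can be pictured with a small pure-Python sketch (plain dicts in a list stand in for the collection; this illustrates the semantics and is not PyMongo code; on the server, the whole operation is atomic):

```python
def upsert(docs, spec, changes):
    """Sketch of upsert semantics: update the first document matching
    spec, or insert a new document built from spec plus the changes
    when nothing matches."""
    for doc in docs:
        if all(doc.get(k) == v for k, v in spec.items()):
            doc.update(changes)
            return {'updatedExisting': True, 'n': 1}
    new_doc = dict(spec)        # the query document seeds the new document
    new_doc.update(changes)     # ...and the update is then applied to it
    docs.append(new_doc)
    return {'updatedExisting': False, 'n': 1}

docs = [{'i': 20}]
upsert(docs, {'i': 20}, {'gtTen': True})  # updates the existing document
upsert(docs, {'i': 21}, {'gtTen': True})  # inserts {'i': 21, 'gtTen': True}
```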
Documents can be deleted from the collection using the remove method:
>>> db.pymongoTest.remove({'i':21})
{u'connectionId': 8, u'ok': 1.0, u'err': None, u'n': 1}
If we look at the value of n in the preceding response, we will see that it is 1, which means that one document was removed. There is another way to remove a document: by its _id. Let's insert one document into the collection and later remove it. Insert the document as follows:
>>> db.pymongoTest.insert({'i':23, '_id':23})
Now remove it by passing the value of its _id field:
>>> db.pymongoTest.remove(23)
{u'connectionId': 8, u'ok': 1.0, u'err': None, u'n': 1}
Let's look at find and modify operations now. We can look at this operation as a way to find a document and then update/remove it, with both operations performed atomically. Once the operation is performed, the document returned is either the one before or the one after the update (in the case of remove, there will be no document after the operation). In the absence of this operation, we cannot guarantee atomicity in scenarios where multiple client connections could be performing similar operations on the same document. The following is an example of how to perform find and modify in Python:
>>> db.pymongoTest.find_and_modify({'i':20}, {'$set':{'inWords':'Twenty'}})
{u'i': 20, u'gtTen': True, u'_id': ObjectId('52a8a1eb072f651578ed98b2')}
The preceding result shows us that the resulting document returned is the one before the update was applied.
Let's execute a find operation to query and view the document that we updated in the previous step. The resulting document will contain the newly added inWords field:
>>> db.pymongoTest.find_one({'i':20})
{u'i': 20, u'_id': ObjectId('52aa0cfe072f651578ed98b7'), u'inWords': u'Twenty'}
Let's execute the find and modify operation again but, this time around, return the updated document rather than the document before the update, which we saw in step 9. Execute the following command from the Python shell:
>>> db.pymongoTest.find_and_modify({'i':19}, {'$set':{'inWords':'Nineteen'}}, new=True)
{u'i': 19, u'gtTen': True, u'_id': ObjectId('52a8a1eb072f651578ed98b1'), u'inWords': u'Nineteen'}
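The contract of find and modify, including the effect of new=True, can be sketched in a few lines of plain Python (on the server the whole thing is one atomic command; this single-threaded sketch only illustrates the return-value contract, not PyMongo itself):

```python
def find_and_modify(docs, spec, changes, new=False):
    """Sketch of find-and-modify: locate one matching document, apply
    the change, and return the pre-update snapshot by default, or the
    updated document when new=True."""
    for doc in docs:
        if all(doc.get(k) == v for k, v in spec.items()):
            before = dict(doc)   # snapshot taken before the update
            doc.update(changes)
            return doc if new else before
    return None

docs = [{'i': 19}]
old = find_and_modify(docs, {'i': 19}, {'inWords': 'Nineteen'})
# old has no 'inWords' key, but the stored document now does
```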
In the Mongo shell, the sort and limit functions were chained to the find operation. The prototype of the call on the postalCodes collection is as follows:
db.postalCodes.find(..).limit(..).sort(..)
In Python, however, sort and limit are passed as parameters to the find method itself:
>>> cursor = db.postalCodes.find({'state':'Gujarat'}, {'_id':0, 'city':1, 'state':1, 'pincode':1}, limit=10, sort=[('city', pymongo.ASCENDING)])
The results can be displayed using the showResults function that is already defined. Another interesting parameter to find is max_scan, which takes an integer value and ensures that the query doesn't scan more than the provided number of documents. For instance, the following query ensures that no more than 50 documents are scanned to get the results. Again, use the showResults function to display the results in the cursor:
>>> showResults(db.postalCodes.find({'state':'Andhra Pradesh'}, max_scan=50))
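To see how sort and limit shape a result set, here is a small pure-Python sketch over a list of dicts; as on the server, the sort is applied before the limit. The direction values 1 and -1 mirror pymongo.ASCENDING and pymongo.DESCENDING, but the function itself is our own illustration, not driver code:

```python
def find(docs, predicate=lambda d: True, sort=None, limit=0):
    """Sketch of find with sort and limit: filter, then order, then
    truncate the result set."""
    results = [d for d in docs if predicate(d)]
    if sort:
        # Apply keys right to left; Python's stable sort preserves the
        # ordering of earlier keys within ties of later ones.
        for key, direction in reversed(sort):
            results.sort(key=lambda d: d[key], reverse=(direction == -1))
    if limit:
        results = results[:limit]
    return results

cities = [{'city': c} for c in ('Surat', 'Rajkot', 'Vadodara', 'Ahmedabad')]
top_two = find(cities, sort=[('city', 1)], limit=2)
# [{'city': 'Ahmedabad'}, {'city': 'Rajkot'}]
```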
Let's take a look at what we did in this recipe. We started by updating the documents in a collection in step 1. By default, however, the update modified only the first matching document; the rest of the matching documents were not updated. In step 2, we added a parameter called multi with the value True to update multiple documents as part of the same update operation. Note that all these documents are not updated atomically as part of one transaction. If we look at the update done from the Python shell, we see a striking resemblance to what we would have done from the Mongo shell. If we want to name the arguments of the update operation, the parameters are called spec and document, for the document provided as a query to select the documents and for the update to apply, respectively. For instance, the following update operation is valid:
>>> db.pymongoTest.update(spec={'i':{'$gt':10}}, document={'$set':{'gtTen':True}})
There are some more arguments that the update function takes, with most of them carrying the same meaning as for the insert function we saw in the previous recipe. These parameters are w, wtimeout, j, fsync, and check_keys. Refer to the previous recipe for the explanation of these parameters as used with the insert function.
In step 6, we did an upsert (update plus insert). All we added was an upsert parameter with the value True. However, what exactly happens in the case of an upsert? Mongo tries to update the document that matches the provided condition; had it found one, this would have been a regular update. In this case, however, the document was not found, so the server inserted the document given as spec (the first parameter) into the collection and then applied the update operation to it, with both operations taking place atomically.
In steps 7 and 8, we saw the remove operation. The first variant accepted a query, and all the matching documents were removed. The second variant, in step 8, accepted one integer, which is the value of the _id field of the document to be deleted; this variant is useful whenever we plan to delete by the _id field's value. Similar to update, the remove function also accepts other parameters for the write concern. The w, wtimeout, j, and fsync parameters have meanings similar to what we discussed in the previous recipe when we inserted documents; refer to that recipe for a detailed description. Note that calling the remove method on a collection without any parameters will remove all the documents in that collection.
In steps 10 to 12, we executed find and modify operations, which were described in the previous section. What we didn't see is that this operation can also be used to find and remove documents from the collection; an additional parameter called remove needs to be added with the value True. In the following operation, we remove the document whose _id equals 31 and return the document before deleting it:
>>> db.pymongoTest.find_and_modify(query={'_id':31}, remove=True)
Note that, with the remove option provided, the parameter named new is not supported, as there is nothing to return after the document is deleted.
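A pure-Python sketch of the remove=True variant makes it clear why new has nothing to return (again, a list of dicts stands in for the collection; this is not PyMongo code):

```python
def find_and_remove(docs, spec):
    """Sketch of find_and_modify(..., remove=True): delete one matching
    document and return it. There is no 'after' document, which is why
    new=True makes no sense alongside remove."""
    for idx, doc in enumerate(docs):
        if all(doc.get(k) == v for k, v in spec.items()):
            return docs.pop(idx)
    return None

docs = [{'_id': 31, 'i': 31}, {'_id': 32, 'i': 32}]
removed = find_and_remove(docs, {'_id': 31})  # {'_id': 31, 'i': 31}
```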
All the operations we saw in this recipe were for clients connected to a standalone instance. If, however, you are connected to a replica set, the client is instantiated in a different way. We are also aware that, by default, we are not allowed to query secondary nodes for data: we need to explicitly execute rs.slaveOk() from the Mongo shell connected to a secondary node in order to query it. The same restriction applies from a Python client, but the way in which we state that we are okay with querying a secondary node is slightly different. There is a parameter called slave_okay, whose value is False by default; if it is set to True, a query against a secondary node goes through successfully and returns results. If it is not set to True, querying a secondary node throws an exception stating that the node queried is not a master. For instance, if our client is connected to a secondary instance and we want to query it based on the name of the state, we will execute the following query:
>>> cursor = db.postalCodes.find({'state':'Maharashtra'}, slave_okay=True)
We will get the cursor for the results successfully if the collection does indeed have documents with the state name Maharashtra.
Another parameter that is better left untouched and has a sensible default is timeout, whose value is True by default. Note that this value is not a number for some sort of timeout but a Boolean. If the value is True, the cursor opened by a query on the server will be closed automatically after 10 minutes of inactivity; think of it as a sort of garbage collection of server-side resources. If it is set to False, it is no longer the server's responsibility to clean the cursor up; the client is responsible for closing it.
Another parameter, called tailable, is used to denote that the cursor returned by find is a tailable cursor. Explaining tailable cursors in detail is not in the scope of this recipe; they are explained in the Creating and tailing capped collection cursors in MongoDB recipe in Chapter 5, Advanced Operations.
So far in the recipe, we connected to a single node using pymongo.MongoClient. However, that class cannot be used to connect to a replica set; for that, we will use pymongo.MongoReplicaSetClient. The following is the way in which we can initiate the client:
>>> client = pymongo.MongoReplicaSetClient('mongodb://localhost:27000', replicaSet='replSetTest')
As we can see, we just provided one host from the replica set and the name of the replica set we used when starting it. The client automatically discovers the remaining hosts from the replica set configuration. The host name(s) we provide are known as the seed list, through which we can specify multiple instances of the replica set; the parameter that accepts the host names is called hosts_or_uri.
However, what about read preferences, and how do we specify them? There are some more parameters that we need to look at while initiating the client:
>>> from pymongo.read_preferences import ReadPreference
>>> from pymongo import MongoReplicaSetClient
>>> client = MongoReplicaSetClient('mongodb://localhost:27000', replicaSet='replSetTest', read_preference=ReadPreference.NEAREST)
>>> client.read_preference
4
The preceding steps initialize a replica set client with the read preference NEAREST. There is an additional parameter, secondary_acceptable_latency_ms, which gives a time in milliseconds. The client uses this value to decide which members of the replica set are contenders for selection when the NEAREST read preference is specified: the driver first computes the minimum latency across all replica set instances, and every instance whose latency is within the provided value of that minimum is added to the list of contenders from which the nearest instance is selected. There was a fairly long discussion on this behavior in the read preference recipe, where some code snippets from a Java client were used to explain the internals. The default value of this parameter is 15 milliseconds.
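The member-selection rule can be sketched in a few lines of plain Python (the host names and latency figures here are made up for illustration; the real driver measures the latencies itself):

```python
def nearest_contenders(latencies_ms, acceptable_latency_ms=15):
    """Sketch of NEAREST selection: take the minimum measured latency
    and keep every member within acceptable_latency_ms of it; the read
    is then routed to one of these contenders."""
    fastest = min(latencies_ms.values())
    return {host for host, ms in latencies_ms.items()
            if ms - fastest <= acceptable_latency_ms}

members = {'localhost:27000': 2, 'localhost:27001': 12, 'localhost:27002': 40}
contenders = nearest_contenders(members)
# {'localhost:27000', 'localhost:27001'}; 27002 is 38 ms above the minimum
```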
As we know, the read preference can be provided at the client level, at the database level (where it is inherited from the client), and also at the cursor level. By default, read_preference for a client initialized without an explicit read preference is PRIMARY (with the value 0). However, if we now get the database object from the client initialized earlier, the read preference will be NEAREST (with the value 4):
>>> db = client.test
>>> db.read_preference
4
Setting the read preference is as simple as executing the following command:
>>> db.read_preference = ReadPreference.PRIMARY_PREFERRED
Again, just as the read preference is inherited from the client by the database object, it is inherited from the database object by the collection object, and it is used as the default for all queries executed against that collection unless a read preference is specified explicitly in the find operation.
Thus, db.pymongoTest.find() will return a cursor that uses the read preference PRIMARY_PREFERRED (which we just set at the database-object level), whereas db.pymongoTest.find(read_preference=ReadPreference.NEAREST) will use NEAREST.
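The inheritance chain described above can be sketched with a tiny class: each level returns its own preference if one was set, and otherwise defers to its parent, which is why a database-level setting shows through at the collection level until a per-query value overrides it. The class and attribute names below are our own illustration, not PyMongo internals; only the numeric constants mirror pymongo's values:

```python
PRIMARY, PRIMARY_PREFERRED, NEAREST = 0, 1, 4  # pymongo's numeric values

class WithReadPref(object):
    """Sketch of read-preference inheritance: client -> db -> collection."""
    def __init__(self, parent=None, read_preference=None):
        self.parent = parent
        self.pref = read_preference

    @property
    def read_preference(self):
        if self.pref is not None:
            return self.pref
        return self.parent.read_preference  # defer to the level above

client = WithReadPref(read_preference=NEAREST)
db = WithReadPref(parent=client)   # inherits NEAREST (4) from the client
coll = WithReadPref(parent=db)
db.pref = PRIMARY_PREFERRED        # set at the database level
value = coll.read_preference       # now 1, inherited from db
```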
We will now wrap up the basic operations from a Python driver by trying to do some common operations that we do from the Mongo shell, such as getting all the database names, getting a list of collections in a database, and creating an index on a collection.
From the shell, we will execute show dbs to show all the database names in the connected Mongo instance. From the Python client, we will execute the following command on the client instance:
>>> client.database_names()
[u'local', u'test']
Similarly, to see the list of collections, we would type show collections in the Mongo shell. In Python, all we do on the database object is as follows:
>>> db.collection_names()
[u'system.indexes', u'writeConcernTest', u'pymongoTest']
Now, for index operations, we will first see what indexes are present on the pymongoTest collection. Execute the following command from the Python shell to view the indexes on the collection:
>>> db.pymongoTest.index_information()
{u'_id_': {u'key': [(u'_id', 1)], u'v': 1}}
We will now create an index on the key x, sorted in ascending order, on the pymongoTest collection as follows:
>>> db.pymongoTest.ensure_index([('x', pymongo.ASCENDING)])
u'x_1'
We can list the indexes again to confirm the creation of the index:
>>> db.pymongoTest.index_information()
{u'_id_': {u'key': [(u'_id', 1)], u'v': 1}, u'x_1': {u'key': [(u'x', 1)], u'v': 1}}
We can see that the index got created. Generally speaking, the format of the ensure_index method is as follows:
>>> db.<collection name>.ensure_index([(<field name 1>, <order of field 1>), ..., (<field name n>, <order of field n>)])