This recipe is all about executing basic query and insert operations using PyMongo. This is similar to what we did with the Mongo shell earlier in the book.
To execute simple queries, we need a server up and running. A simple single node is all we need. Refer to the Single node installation of MongoDB recipe in Chapter 1, Installing and Starting the MongoDB Server, to learn how to start the server. The data on which we will operate needs to be imported into the database. The steps to import the data are given in the Creating test data recipe in Chapter 2, Command-line Operations and Indexes. Python is expected to be installed on the host operating system, and PyMongo, Mongo's client for Python, needs to be installed as well. Refer to the previous recipe to learn how to install PyMongo for your host operating system. In this recipe, we will also execute insert operations and provide a write concern to use.
Let's start with some querying for Mongo from the Python shell. This will be identical to what we do from the Mongo shell, except that this is in the Python programming language as opposed to JavaScript that we have in the Mongo shell. We can use the basics that we will see here to develop large scale production systems that run on Python and use MongoDB as a data store.
Let's get started by first starting the Python shell from the operating system's command prompt. The following steps are independent of the host operating system:
$ python
Python 2.7.5 (default, May 15 2013, 22:43:36) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>>
Import the pymongo package and create the client as follows:
>>> import pymongo
>>> client = pymongo.MongoClient('localhost', 27017)
An alternative way to connect is as follows:
>>> client = pymongo.MongoClient('mongodb://localhost:27017')
Rather than calling a getDatabase() method to get an instance of the database, we get a reference to the database object on which we will perform the operations (test in this case) as an attribute of the client. We will do this in the following way:
>>> db = client.test
Another alternative way is as follows:
>>> db = client['test']
Query the postalCodes collection. We will limit our results to 10 items as follows:
>>> postCodes = db.postalCodes.find().limit(10)
Iterate over the results with a for statement. The following fragment should print the 10 documents that are returned:
>>> for postCode in postCodes:
...     print 'City: ', postCode['city'], ', State: ', postCode['state'], ', Pin Code: ', postCode['pincode']
...
>>> postCode = db.postalCodes.find_one()
>>> print 'City: ', postCode['city'], ', State: ', postCode['state'], ', Pin Code: ', postCode['pincode']
>>> cursor = db.postalCodes.find({'state':'Gujarat'}, {'_id':0, 'city':1, 'state':1, 'pincode':1}).sort('city', pymongo.ASCENDING).limit(10)
The preceding cursor's results can be printed in the same way in which we printed the results in step 5.
>>> city = db.postalCodes.find().sort([('state', pymongo.DESCENDING),('city',pymongo.ASCENDING)]).limit(5)
Next, we will perform the insert operation. We will use a test collection for these operations so that we do not disturb our postal codes test data. We will use the pymongoTest collection for this purpose and add documents to it in a loop as follows:
>>> for i in range(1, 21):
...     db.pymongoTest.insert({'i':i})
...
The insert operation can also take a list of dictionary objects and perform a bulk insert. So now, something like the following insert query is perfectly valid:
>>> db.pymongoTest.insert([{'name':'John'}, {'name':'Mark'}])
Any guesses on the return value? In the case of a single document insert, the return value is the value of _id for the newly created document. In this case, it is a list of IDs.
Let's execute an insert query again, this time with a write concern provided. Execute the following insert with the write concern w = 1 and j = True:
>>> db.pymongoTest.insert({'name': 'Jones'}, w=1, j=True)
We instantiated the client and then, in step 3, got a reference to the object that is used to access the database on which we wish to perform operations. There are a couple of ways to get this reference. The first option (db = client.test) is more convenient, unless your database name contains a special character, such as a hyphen (-). For example, if the name is db-test, we need to use the [] operator to access the database. Using either of the alternatives, we now have an object for the test database in the db variable. Once we had the client and the db instance in Python, we queried for the first 10 documents in natural order from the collection in step 4. The syntax is identical to how this query would have been executed from the shell. Step 5 simply printed out the results, 10 of them in this case.
If you need quick help on a particular class from the Python interpreter, execute dir(<class_name>) or dir(<object of a class>), which lists the attributes and functions defined on the object passed. For example, dir(pymongo.MongoClient) or dir(client), where client is the variable that holds a reference to an instance of pymongo.MongoClient, lists all the supported attributes and functions. Note that the class itself is passed, not its name as a string; dir('pymongo.MongoClient') would list the attributes of a str object instead. The help function is more informative and prints out the documentation, which is a great reference when you need instant help. Try typing help(pymongo.MongoClient) or help(client).
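The difference between passing a name as a string and passing the object itself is easy to miss with dir. The following minimal sketch uses the standard library's collections.OrderedDict as a stand-in for pymongo.MongoClient (an assumption made only so that the example runs without PyMongo installed); the same pattern should apply to any class or instance.

```python
import collections

# dir() on a string lists the attributes of str, not of the class the string names.
attrs_of_string = dir('collections.OrderedDict')

# dir() on the class object lists the attributes of that class.
attrs_of_class = dir(collections.OrderedDict)

# dir() on an instance behaves the way dir(client) would for a client object.
instance = collections.OrderedDict()
attrs_of_instance = dir(instance)

# 'popitem' belongs to OrderedDict; 'upper' belongs to str.
string_has_popitem = 'popitem' in attrs_of_string
class_has_popitem = 'popitem' in attrs_of_class
string_has_upper = 'upper' in attrs_of_string
```

Here string_has_popitem is False while class_has_popitem is True, which is why the class or the instance, not a quoted name, should be handed to dir.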
In steps 4 and 5, we queried the postalCodes collection, limited the result to the first 10 documents, and printed them. The returned object is an instance of the pymongo.cursor.Cursor class. The next step got just one document from the collection using the find_one() function. This is equivalent to the findOne() method on the collection invoked from the shell. The value returned by this function is a built-in dict object.
In step 8, we executed another find to query the data. This time, however, we passed two parameters to it. The first one is the query, which looks similar to the one we execute from the Mongo shell, except that the type of the parameter in Python is dict. The second parameter is another dict that specifies the fields to be returned in the result. A value of 1 for a field indicates that the field is to be selected and returned in the result. This is analogous to a select in a relational database with an explicit list of columns. The _id field is selected by default, unless it is explicitly set to 0 in the selector dict. The selector provided here is {'_id':0, 'city':1, 'state':1, 'pincode':1}, which selects the city, state, and pin code and suppresses the _id field. We have the sort method too. This method has two formats: sort(sort_field, sort_direction) and sort([(sort_field, sort_direction), …, (sort_field, sort_direction)]).
The first one is used when we want to sort by one field only. The second representation accepts a list of pairs of sort fields and sort directions and is used when we want to sort by multiple fields. We used the first format in the query in step 8 and the second format in our query in step 9, as we sorted first by state name and then by city.
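To make the selector and the multi-field sort concrete without needing a running server, here is a minimal pure-Python model of both. It is only an illustration of the semantics described above, not how PyMongo or the server implements them; the helper names project and multi_sort and the sample documents are made up for this sketch. The constants 1 and -1 are the values of pymongo.ASCENDING and pymongo.DESCENDING.

```python
from operator import itemgetter

ASCENDING, DESCENDING = 1, -1  # same integer values as the PyMongo constants


def project(doc, selector):
    """Model an inclusion selector: keep fields marked 1; keep _id unless it is marked 0."""
    wanted = set(k for k, v in selector.items() if v and k != '_id')
    if selector.get('_id', 1):
        wanted.add('_id')
    return dict((k, v) for k, v in doc.items() if k in wanted)


def multi_sort(docs, spec):
    """Model sort([(field, direction), ...]) with stable sorts, applying the last key first."""
    result = list(docs)
    for field, direction in reversed(spec):
        result.sort(key=itemgetter(field), reverse=(direction == DESCENDING))
    return result


# Made-up sample documents, for illustration only.
docs = [
    {'_id': 1, 'city': 'Surat', 'state': 'Gujarat', 'pincode': 1},
    {'_id': 2, 'city': 'Lucknow', 'state': 'Uttar Pradesh', 'pincode': 2},
    {'_id': 3, 'city': 'Anand', 'state': 'Gujarat', 'pincode': 3},
    {'_id': 4, 'city': 'Agra', 'state': 'Uttar Pradesh', 'pincode': 4},
]

without_id = project(docs[0], {'_id': 0, 'city': 1, 'state': 1, 'pincode': 1})
with_default_id = project(docs[0], {'city': 1})

ordered = multi_sort(docs, [('state', DESCENDING), ('city', ASCENDING)])
cities = [d['city'] for d in ordered]  # state descending, then city ascending
```

In this model, cities comes out as Agra, Lucknow, Anand, Surat: the states are in descending order, and cities are in ascending order within each state.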
If we look at the way we invoked sort, it was invoked on the Cursor instance. Similarly, the limit function is also defined on the Cursor class. The evaluation is lazy and is deferred until the cursor is iterated to retrieve the results. Until that point, the cursor object is not evaluated on the server.
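The chaining and the deferred evaluation can be pictured with a small stand-in class. The LazyCursor class below is hypothetical and written only for this sketch; it is not PyMongo's Cursor, but it shows the same idea: sort and limit merely record options and return the cursor, and nothing runs until the cursor is iterated.

```python
class LazyCursor(object):
    """Hypothetical stand-in for a cursor: records options, runs nothing until iterated."""

    def __init__(self, docs):
        self._docs = docs
        self._sort_key = None
        self._limit = None
        self.executed = False

    def sort(self, field):
        self._sort_key = field
        return self  # returning self is what makes chaining possible

    def limit(self, n):
        self._limit = n
        return self

    def __iter__(self):
        self.executed = True  # the "query" only runs at this point
        docs = self._docs
        if self._sort_key is not None:
            docs = sorted(docs, key=lambda d: d[self._sort_key])
        if self._limit is not None:
            docs = docs[:self._limit]
        return iter(docs)


cursor = LazyCursor([{'i': 3}, {'i': 1}, {'i': 2}]).sort('i').limit(2)
ran_before_iteration = cursor.executed    # nothing has been evaluated yet
results = [doc['i'] for doc in cursor]    # iteration triggers the evaluation
ran_after_iteration = cursor.executed
```

The sort is applied before the limit at evaluation time, so results holds the two smallest values, 1 and 2, regardless of the order in which the options were recorded.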
In step 12, we inserted 20 documents into a collection, one at a time. Each insert, as we see in the Python shell, returns the generated _id value. The syntax of insert is identical to the operation we perform from the shell. The parameter passed to the insert operation is again an object of type dict.
In step 13, we passed a list of documents to insert into the collection. This inserts multiple documents in one call to the server; this is a bulk insert. The return value in this case is a list of IDs, one for each document inserted and in the same order as in the input list. However, a bulk insert is not wrapped in a transaction, so all inserts are independent of each other, and the failure of one insert doesn't automatically roll back the entire operation.
The ability to insert multiple documents calls for another parameter to control the behavior. When one of the inserts in the given list fails, should the remaining inserts continue, or should the insertion stop as soon as the first error is encountered? The parameter that controls this behavior is continue_on_error, and its default value is False, that is, stop as soon as the first error is encountered. If this value is True and multiple errors occur during insertion, only the latest error will be available. Hence, False is the sensible default. Let's take a look at a couple of examples. In the Python shell, execute the following commands:
>>> db.contOnError.drop()
>>> db.contOnError.insert([{'_id':1}, {'_id':1}, {'_id':2}, {'_id':2}])
>>> db.contOnError.count()
The count we will get is 1, which is for the first document with the _id field as 1. The moment another document with the same value of the _id field is found, 1 in this case, an error is thrown, and the bulk insert stops. Now, execute the following insert operation:
>>> db.contOnError.drop()
>>> db.contOnError.insert([{'_id':1}, {'_id':1}, {'_id':2}, {'_id':2}], continue_on_error=True)
>>> db.contOnError.count()
Here, we passed an additional parameter, continue_on_error, whose value is True. As a result, the insert operation continues with the next document even if an intermediate insert fails. The second insert with _id:1 fails; yet, the next insert goes through before another insert with _id:2 fails (as one document with this _id is already present). The count is therefore 2. Also, the error reported is for the last failure, the one with _id:2 in this case.
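The two runs above can be modeled in a few lines of plain Python, which makes the resulting counts easy to check without a server. The bulk_insert function below is a hypothetical model written for this sketch: it uses a dict keyed by _id in place of a collection and returns the last error message, whereas the real driver reports the failure by raising an exception.

```python
def bulk_insert(collection, docs, continue_on_error=False):
    """Model a bulk insert into `collection`, a dict keyed by _id (a stand-in for a collection).

    Returns the message of the last duplicate-key failure, or None if all inserts succeeded.
    """
    last_error = None
    for doc in docs:
        if doc['_id'] in collection:
            last_error = 'duplicate key: _id %r' % doc['_id']
            if not continue_on_error:
                break      # default behavior: stop at the first failure
            continue       # skip the failed document and carry on
        collection[doc['_id']] = doc
    return last_error


docs = [{'_id': 1}, {'_id': 1}, {'_id': 2}, {'_id': 2}]

stop_at_first = {}
first_error = bulk_insert(stop_at_first, docs)

keep_going = {}
last_error = bulk_insert(keep_going, docs, continue_on_error=True)

counts = (len(stop_at_first), len(keep_going))
```

The model ends with one document in the first case and two in the second, and the error retained in the second case is the one for _id 2. In PyMongo 3 and later, where insert is deprecated, the comparable switch is the ordered argument of insert_many; passing ordered=False plays a role similar to continue_on_error=True.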
Another parameter is check_keys, which checks for key names that start with $ and for the existence of . in the key. If one is found, it will raise bson.errors.InvalidDocument. Thus, the following insert operation will fail:
>>> db.pymongoTest.insert({'a.b':1})
By default, the check will take place, unless you explicitly disable it by setting the value of this parameter to False. Thus, the following query will pass and return the object ID of the inserted document:
>>> db.pymongoTest.insert({'a.b':1}, check_keys=False)
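The rule that check_keys enforces can be sketched as a small validation function. This is a simplified model written for illustration, not the driver's implementation: the function names are hypothetical, and ValueError is used as a stand-in for bson.errors.InvalidDocument so that the sketch runs without PyMongo installed.

```python
def check_keys(doc):
    """Raise ValueError if a key starts with '$' or contains '.', looking into nested dicts too."""
    for key, value in doc.items():
        if key.startswith('$') or '.' in key:
            # Stand-in for bson.errors.InvalidDocument in this model.
            raise ValueError('invalid key: %r' % key)
        if isinstance(value, dict):
            check_keys(value)


def is_valid(doc):
    """Return True if the document passes the key check, False otherwise."""
    try:
        check_keys(doc)
        return True
    except ValueError:
        return False


dotted_ok = is_valid({'a.b': 1})          # dotted key is rejected
nested_ok = is_valid({'a': {'b': 1}})     # nested document is accepted
dollar_ok = is_valid({'$set': 1})         # key starting with $ is rejected
```

If the intent behind a key such as a.b is a field b inside a, a nested document such as {'a': {'b': 1}} expresses that directly and passes the check.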
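The final step executed the insert operation but provided a write concern to be used for the insert. With w = 1, the insert waits for acknowledgment from the server, and with j = True, it waits until the write has been written to the journal.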