This recipe covers basic query and insert operations using PyMongo, similar to what we did with the mongo shell earlier in the book.
To execute simple queries, we need a server up and running; a simple single node is all we need. Refer to the Installing single node MongoDB recipe from Chapter 1, Installing and Starting the Server, for instructions on how to start the server. The data that we will operate on needs to be imported into the database. The steps to import the data are given in the Creating test data recipe from Chapter 2, Command-line Operations and Indexes. Python 2.7 or higher has to be present on the host operating system along with MongoDB's Python client, PyMongo. Refer to the earlier recipe, Connecting to a single node using a Python client, in Chapter 1, Installing and Starting the Server, for how to install PyMongo on your host operating system. Additionally, in this recipe, we will execute insert operations and provide a write concern to use.
Let's start by querying MongoDB from the Python shell. This is identical to what we do in the mongo shell, except that we use the Python programming language rather than the JavaScript of the mongo shell. The basics that we see here can be used to write large production systems that run on Python and use MongoDB as a data store.
Let's begin by starting the Python shell from the operating system's command prompt. All these steps are independent of the host operating system. Perform the following steps:
$ python
Python 2.7.6 (default, Mar 22 2014, 22:59:56)
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>
Import the pymongo package and create the client as follows:
>>> import pymongo
>>> client = pymongo.MongoClient('localhost', 27017)
The following is an alternative way to connect:
>>> client = pymongo.MongoClient('mongodb://localhost:27017')
getDatabase()
method to get an instance of the database. We will get a reference to the database object that we will be performing the operations on, test
in this case. We will do this in the following way:
>>> db = client.test
Another alternative is as follows:
>>> db = client['test']
Query the postalCodes collection. We will limit our results to 10 items:
>>> postCodes = db.postalCodes.find().limit(10)
Iterate over the results using a for statement. The following fragment should print the 10 documents returned (the format string works on both Python 2 and Python 3):
>>> for postCode in postCodes:
...     print('City: %s, State: %s, Pin Code: %s' % (postCode['city'], postCode['state'], postCode['pincode']))
>>> postCode = db.postalCodes.find_one()
Print the state and city of the returned result as follows:
>>> print('City: %s, State: %s, Pin Code: %s' % (postCode['city'], postCode['state'], postCode['pincode']))
Next, query for documents in the state of Gujarat, sorted by city, selecting only the fields city, state, and pincode. Execute the following query in the Python shell:
>>> cursor = db.postalCodes.find({'state':'Gujarat'}, {'_id':0, 'city':1, 'state':1, 'pincode':1}).sort('city', pymongo.ASCENDING).limit(10)
The preceding cursor's results can be printed in the same way that we printed the results in step 5.
>>> city = db.postalCodes.find().sort([('state', pymongo.DESCENDING),('city',pymongo.ASCENDING)]).limit(5)
Now let's perform an insert operation. We will use a test collection to perform these operations and not disturb our postal codes test data. We will use a pymongoTest collection for this purpose and add documents to it in a loop as follows:
>>> for i in range(1, 21):
...     db.pymongoTest.insert_one({'i':i})
An insert can also take a list of dictionary objects and perform a bulk insert. With the insert_many method, something similar to the following is perfectly valid:
>>> db.pymongoTest.insert_many([{'name':'John'}, {'name':'Mark'}])
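Any guesses on the return value? With the legacy insert method, a single document insert returns the value of _id for the newly created document, and a bulk insert returns a list of IDs. The insert_one and insert_many methods used above return result objects instead, which expose the same values through their inserted_id and inserted_ids attributes, respectively.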
In step 2, we instantiate the client and get a reference to the MongoClient object that will be used to access the database. In step 3, we get a reference to the database, and there are a couple of ways to do this. The first option, attribute access as in client.test, is more convenient unless your database name contains a special character, such as a hyphen (-). For example, if the name is db-test, we would have no option other than to use the [] operator to access the database. Using either of the alternatives, we now have an object for the test database in the db variable. After we get the client and db instances in Python, we query to find the top 10 documents in the natural order from the collection in step 4. The syntax is identical to how this query would have been executed in the shell. Step 5 simply prints out the results, 10 of them in this case. Generally, if you need instant help on a particular class from the Python interpreter, simply perform dir(<class_name>) or dir(<object of a class>), which gives you a list of the attributes and functions defined on the object passed. For example, dir(pymongo.MongoClient) or dir(client), where client is the variable holding a reference to an instance of pymongo.MongoClient, can be used to get the list of all the supported attributes and functions. Note that dir takes the class or object itself; given a string such as 'pymongo.MongoClient', it lists the attributes of str instead. The help function is more informative: it prints out the documentation and is a great source of reference in case you need instant help. Try typing help('pymongo.MongoClient') or help(client).
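The difference between passing a class and passing a string to dir can be checked without a MongoDB installation. The following is a minimal sketch that uses collections.OrderedDict from the standard library as a stand-in for pymongo.MongoClient (the stand-in is an assumption made purely so the snippet runs anywhere; the same reasoning applies to any class):

```python
import collections

# dir() on a string lists the attributes of str, not of the class the string names.
string_attrs = dir('collections.OrderedDict')

# dir() on the class itself lists the attributes of that class.
class_attrs = dir(collections.OrderedDict)

print('upper' in string_attrs)         # upper is a str method
print('move_to_end' in string_attrs)   # an OrderedDict method, absent from str
print('move_to_end' in class_attrs)    # present when the class itself is passed
```

The help function, by contrast, accepts either the object or a dotted name given as a string.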
In steps 4 and 5, we query the postalCodes collection, limit the result to the top 10 documents, and print them. The returned object is of type pymongo.cursor.Cursor. The next step gets just one document from the collection using the find_one() function. This is equivalent to the findOne() method invoked on the collection in the shell. The value returned by this function is a built-in dict object.
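Because find_one returns a plain dict, and a cursor yields dicts as well, the printing logic used in the steps can be factored into a small helper. The following is a sketch only: format_postal_code is a hypothetical name, and the sample document is made up for illustration rather than taken from the test data.

```python
def format_postal_code(doc):
    """Return a one-line summary of a postal code document (a plain dict)."""
    # dict.get returns None for a missing field instead of raising KeyError.
    return 'City: %s, State: %s, Pin Code: %s' % (
        doc.get('city'), doc.get('state'), doc.get('pincode'))


# Made-up sample; real documents would come from find() or find_one().
sample = {'city': 'Ahmedabad', 'state': 'Gujarat', 'pincode': 380001}
summary = format_postal_code(sample)
print(summary)
```

With a live connection, the same helper could be applied to each document yielded by the cursor or to the result of find_one.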
In step 8, we execute another find to query the data, this time passing two Python dicts. The first dict is the query, similar to the query parameter we use in the mongo shell. The second dict specifies the fields to be returned in the result. A value of 1 for a field indicates that the field is to be selected and returned in the result. This is analogous to a select statement in a relational database that names an explicit set of columns. The _id field is selected by default unless it is explicitly set to 0 in the selector dict. The selector provided here is {'_id':0, 'city':1, 'state':1, 'pincode':1}, which selects the city, state, and pincode and suppresses the _id field. We have a sort method as well. This method has two formats as follows:
sort(sort_field, sort_direction)
sort([(sort_field, sort_direction)…(sort_field, sort_direction)])
The first one is used when we want to sort by one field only. The second form accepts a list of pairs of sort field and sort direction, and is used when we want to sort by multiple fields. We used the first form in the query in step 8 and the second form in the query in step 9, where we sort first by state (descending) and then by city (ascending).
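The selector and the two sort forms can be emulated on plain dicts to see what they do, with no server involved. This is an in-memory illustration, not how PyMongo or the server implements them: normalize_sort, sort_documents, and project are hypothetical helpers, the sample documents are made up, and the constants simply reuse the integer values of pymongo.ASCENDING and pymongo.DESCENDING.

```python
ASCENDING, DESCENDING = 1, -1  # same integer values as the pymongo constants


def normalize_sort(key_or_list, direction=ASCENDING):
    """Turn either sort form into a list of (field, direction) pairs."""
    if isinstance(key_or_list, str):
        return [(key_or_list, direction)]
    return list(key_or_list)


def sort_documents(docs, key_or_list, direction=ASCENDING):
    """Sort dicts in memory according to the sort specification."""
    result = list(docs)
    # Apply keys from last to first; list.sort is stable, so earlier keys dominate.
    for field, order in reversed(normalize_sort(key_or_list, direction)):
        result.sort(key=lambda d: d[field], reverse=(order == DESCENDING))
    return result


def project(doc, selector):
    """Apply an inclusion selector; _id is kept unless explicitly set to 0."""
    out = {k: v for k, v in doc.items() if k != '_id' and selector.get(k)}
    if selector.get('_id', 1) and '_id' in doc:
        out['_id'] = doc['_id']
    return out


docs = [
    {'_id': 1, 'city': 'Surat', 'state': 'Gujarat'},
    {'_id': 2, 'city': 'Pune', 'state': 'Maharashtra'},
    {'_id': 3, 'city': 'Anand', 'state': 'Gujarat'},
]

by_city = sort_documents(docs, 'city', ASCENDING)
by_state_then_city = sort_documents(
    docs, [('state', DESCENDING), ('city', ASCENDING)])
without_id = project(docs[0], {'_id': 0, 'city': 1, 'state': 1})
with_default_id = project(docs[0], {'city': 1})
```

Here by_city orders the documents Anand, Pune, Surat, while by_state_then_city puts Maharashtra first and then orders the two Gujarat cities alphabetically.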
If we look at the way we invoke sort, it is invoked on the Cursor instance. Similarly, the limit function is also defined on the Cursor class. The evaluation is lazy and deferred until the cursor is iterated to retrieve the results. Until that point, the Cursor object is not evaluated on the server.
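The idea of deferring work until iteration can be sketched in a few lines of plain Python. LazyQuery below is a hypothetical stand-in written for illustration; it is not PyMongo's Cursor class, and the fetch callable merely plays the role of the round trip to the server.

```python
class LazyQuery(object):
    """Hypothetical stand-in for a cursor: records options, runs only when iterated."""

    def __init__(self, fetch):
        self._fetch = fetch      # callable standing in for the server round trip
        self._sort = None
        self._limit = 0
        self.executed = False

    def sort(self, field, direction=1):
        self._sort = (field, direction)
        return self              # returning self allows chaining

    def limit(self, n):
        self._limit = n
        return self

    def __iter__(self):
        # Only here is the query actually run.
        self.executed = True
        docs = list(self._fetch())
        if self._sort:
            field, direction = self._sort
            docs = sorted(docs, key=lambda d: d[field], reverse=(direction == -1))
        if self._limit:
            docs = docs[:self._limit]
        return iter(docs)


query = LazyQuery(lambda: [{'i': 3}, {'i': 1}, {'i': 2}]).sort('i').limit(2)
ran_before_iteration = query.executed   # nothing has been fetched yet
results = list(query)                   # iteration triggers the fetch
```

Chaining sort and limit only records the options; the fetch happens when list(query) iterates the object.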
In step 10, we insert 20 documents into a collection, one at a time. Each insert_one call, as we can see in the Python shell, returns a result object whose inserted_id attribute holds the generated _id value. In terms of syntax, the insert is very close to the operation that we perform in the shell. The parameter passed for the insert is an object of type dict.
In step 11, we pass a list of documents to insert into the collection. This is referred to as a bulk insert operation, which inserts multiple documents in a single call to the server. The IDs returned in this case form a list, one for each document inserted, in the same order as the documents in the input list. However, as a bulk insert is not executed as a transaction, each insert is independent of the others, and a failure of one insert doesn't roll back the entire operation automatically.
Adding the ability to insert multiple documents demanded another parameter to control the behavior. When one of the inserts in the given list fails, should the remaining inserts continue, or should the insertion stop as soon as the first error is encountered? In the legacy insert method, the parameter that controls this behavior is continue_on_error, and its default value is False, that is, stop as soon as the first error is encountered. (The insert_many method exposes the same choice through its ordered parameter, which defaults to True.) If continue_on_error is True and multiple errors occur during insertion, only the latest error will be available, and hence the default value of False is sensible. Let's look at a couple of examples. In the Python shell, execute the following:
>>> db.contOnError.drop()
>>> db.contOnError.insert([{'_id':1}, {'_id':1}, {'_id':2}, {'_id':2}])
>>> db.contOnError.count()
The count that we get is 1, which is for the first document with the _id field as 1. The moment another document with the same value of the _id field is found, 1 in this case, an error is thrown and the bulk insert stops. Now execute the following insert operation:
>>> db.contOnError.drop()
>>> db.contOnError.insert([{'_id':1}, {'_id':1}, {'_id':2}, {'_id':2}], continue_on_error=True)
>>> db.contOnError.count()
Here, we have passed an additional parameter, continue_on_error, whose value is True. This ensures that the insert operation continues with the next document even if an intermediate insert fails. The second insert with _id:1 fails, yet the next insert goes through before another insert with _id:2 fails (as one document with this _id is already present). The count is therefore 2. Additionally, the error reported is for the last failure, the one with _id:2.
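The two behaviors can be reproduced in plain Python to confirm the counts described above. This is a simulation under stated assumptions, not PyMongo code: bulk_insert is a hypothetical function, and a dict keyed by _id stands in for the collection and its unique _id index.

```python
def bulk_insert(store, docs, continue_on_error=False):
    """Insert docs into a dict keyed by _id, mimicking the two behaviors.

    Returns the list of _id values rejected as duplicates.
    """
    failed = []
    for doc in docs:
        if doc['_id'] in store:
            failed.append(doc['_id'])
            if not continue_on_error:
                break        # default: stop at the first error
            continue         # otherwise skip the duplicate and carry on
        store[doc['_id']] = doc
    return failed


docs = [{'_id': 1}, {'_id': 1}, {'_id': 2}, {'_id': 2}]

stop_store = {}
stop_failed = bulk_insert(stop_store, docs)

cont_store = {}
cont_failed = bulk_insert(cont_store, docs, continue_on_error=True)

print(len(stop_store), stop_failed)
print(len(cont_store), cont_failed)
```

With the default, one document is stored and the insert stops at the first duplicate; with continue_on_error=True, two documents are stored and the last rejected _id is 2, which matches the error reported in the shell session above.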