Executing query and insert operations using PyMongo

This recipe covers basic query and insert operations using PyMongo. They are similar to what we did with the Mongo shell earlier in the book.

Getting ready

To execute simple queries, we need a server up and running; a single node is all we need. Refer to the Single node installation of MongoDB recipe in Chapter 1, Installing and Starting the MongoDB Server, to learn how to start the server. The data on which we will operate needs to be imported into the database. The steps to import the data are given in the Creating test data recipe in Chapter 2, Command-line Operations and Indexes. Python must be installed on the host operating system, along with PyMongo, MongoDB's client for Python. See the previous recipe for how to install PyMongo on your host operating system. In this recipe, we will also execute insert operations and provide a write concern to use.

How to do it…

Let's start with some querying for Mongo from the Python shell. This will be identical to what we do from the Mongo shell, except that this is in the Python programming language as opposed to JavaScript that we have in the Mongo shell. We can use the basics that we will see here to develop large scale production systems that run on Python and use MongoDB as a data store.

Let's get started by first starting the Python shell from the operating system's command prompt. The following steps are independent of the host operating system:

  1. Type in the following command in the shell, and the Python shell will start:
    $ python
    Python 2.7.5 (default, May 15 2013, 22:43:36) [MSC v.1500 32 bit (Intel)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>>
  2. Then, import the pymongo package and create the client as follows:
    >>> import pymongo
    >>> client = pymongo.MongoClient('localhost', 27017)
    

    An alternative way to connect is as follows:

    >>> client = pymongo.MongoClient('mongodb://localhost:27017')
    
  3. This works well too and achieves the same result. Now that we have the client, our next step is to get the database on which we will perform the operations (test in this case). Rather than calling a getDatabase() method, as we would in some other programming languages, we get a reference to the database object by accessing it as an attribute of the client:
    >>> db = client.test
    

    Another alternative way is as follows:

    >>> db = client['test']
    
  4. We will query the postalCodes collection. We will limit our results to 10 items as follows:
    >>> postCodes = db.postalCodes.find().limit(10)
    
  5. Iterate over the results as follows. Watch out for the indentation of the print after the for statement. The following fragment should print 10 documents that are returned:
    >>> for postCode in postCodes: print 'City: ', postCode['city'], ', State: ', postCode['state'], ', Pin Code: ', postCode['pincode']
    
  6. To find one document, execute the following command:
    >>> postCode = db.postalCodes.find_one()
    
  7. Print the city, state, and pin code of the returned document as follows:
    >>> print 'City: ', postCode['city'], ', State: ', postCode['state'], ', Pin Code: ', postCode['pincode']
    
  8. Let's query the top 10 cities in the state of Gujarat sorted by the name of the city, and we will just select the city, state, and the pin code. Execute the following query from the Python shell:
    >>> cursor = db.postalCodes.find({'state':'Gujarat'}, {'_id':0, 'city':1, 'state':1, 'pincode':1}).sort('city', pymongo.ASCENDING).limit(10)
    

    The preceding cursor's results can be printed in the same way in which we printed the results in step 5.

  9. Let's sort the data we query. We want to sort by the descending order of the state and then by the ascending order of the city. We will write the query as follows:
    >>> city = db.postalCodes.find().sort([('state', pymongo.DESCENDING),('city',pymongo.ASCENDING)]).limit(5)
    
  10. Iterate through this cursor; this should print out five results on the console. Refer to step 5 for how we iterate over a cursor returned to print the results.
  11. So, we played a bit to find documents and covered basic operations from Python as far as querying MongoDB is concerned. Now, let's see a bit about the insert operation. We will use a test collection to perform these operations and not disturb our postal codes test data. We will use a pymongoTest collection for this purpose and add documents in a loop to it as follows:
    >>> for i in range(1, 21): db.pymongoTest.insert({'i':i})
    
  12. The insert operation can take a list of dictionary objects and perform a bulk insert. So now, something like the following insert query is perfectly valid:
    >>> db.pymongoTest.insert([{'name':'John'}, {'name':'Mark'}])
    

    Any guesses on the return value? In the case of a single document insert, the return value is the value of _id for the newly created document. In this case, it is a list of IDs.

  13. Let's execute an insert again, this time with a write concern provided. Execute the following insert with a write concern of w = 1 and j = True:
    >>> db.pymongoTest.insert({'name': 'Jones'}, w = 1, j = True)
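
The print statements in steps 5 and 7 use Python 2 syntax. For readers on Python 3, a minimal sketch of a similar output line could look like the following. The names format_postal_code and print_postal_codes are hypothetical helpers, and the sample document is made up for illustration; it stands in for a document returned by find() or find_one().

```python
def format_postal_code(doc):
    """Build one output line from a postal code document (a dict)."""
    return 'City: {0}, State: {1}, Pin Code: {2}'.format(
        doc['city'], doc['state'], doc['pincode'])


def print_postal_codes(docs):
    """Print every document in docs.

    docs can be any iterable of dict-like documents, such as a cursor
    returned by find(), or a plain list as used below.
    """
    for doc in docs:
        print(format_postal_code(doc))


# Made-up sample data, used here instead of a live query.
sample = [{'city': 'Exampleville', 'state': 'Gujarat', 'pincode': 100001}]
print_postal_codes(sample)
```

Because the helper only iterates over its argument, the same call should work with the cursors created in steps 4, 8, and 9.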
    

How it works…

We instantiated the client in step 2 and, in step 3, got a reference to the object that is used to access the database on which we wish to perform operations. There are a couple of ways to get this reference. The first option (db = client.test) is more convenient, unless your database name has a special character, such as a hyphen (-). For example, if the name is db-test, we would have no option other than to use the [] operator to access the database. Using either of the alternatives, we now have an object for the test database in the db variable.

After we got the client and the db instance in Python, we queried for the first 10 documents of the collection in natural order in step 4. The syntax is the same as the one used to execute this query from the shell. Step 5 simply printed out the results, 10 of them in this case.

Generally, if you need instant help on a particular class or on an instance of it from the Python interpreter, simply execute dir(<class_name>) or dir(<object of a class>), which lists the attributes and functions defined on the class or object passed. For example, dir(pymongo.MongoClient) or dir(client), where client is the variable that holds the reference to an instance of pymongo.MongoClient, lists all the supported attributes and functions. Note that the class itself is passed, not its name as a string; dir('pymongo.MongoClient') would list the attributes of a str object instead. The help function is more informative and prints out the documentation, which is a great source of reference when you need instant help. Try typing in help(pymongo.MongoClient) or help(client).

In steps 4 and 5, we queried the postalCodes collection, limited the result to the first 10 documents, and printed them. The object returned by find() is an instance of the pymongo.cursor.Cursor class. Step 6 got just one document from the collection using the find_one() function. This is equivalent to the findOne() method invoked on the collection from the shell. The value returned by this function is a built-in dict object.

In step 8, we executed another find to query the data. However, this time around, we passed two parameters to it. The first one was the query, which looks similar to the one we execute from the Mongo shell; in Python, its type is dict. The second parameter was another object of type dict. This dictionary is used to provide the fields to be returned in the result. A value of 1 for a field indicates that the field is to be selected and returned in the result. This is analogous to a select in a relational database with an explicit list of columns. The _id field is selected by default, unless it is explicitly set to 0 in the selector dict object. The selector provided here is {'_id':0, 'city':1, 'state':1, 'pincode':1}, which selects the city, state, and pin code and suppresses the _id field. We have the sort method too. This method has two formats: sort(sort_field, sort_direction) and sort([(sort_field, sort_direction)…(sort_field, sort_direction)]).

The first one is used when we want to sort by one field only. The second representation accepts a list of pairs of sort fields and sort directions and is used when we want to sort by multiple fields. We used the first format in the query in step 8 and the second format in our query in step 9, as we sorted first by state name and then by city.
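
To make the selector and sort semantics concrete, here is a pure-Python sketch that mimics them on a list of dicts, without a server. The helpers project and sort_docs are hypothetical names and the sample documents are made up; this only illustrates the behavior described above and is not how PyMongo or the server implements it. The two constants mirror the values of pymongo.ASCENDING (1) and pymongo.DESCENDING (-1).

```python
ASCENDING, DESCENDING = 1, -1  # same values as pymongo.ASCENDING / DESCENDING


def project(doc, selector):
    """Keep fields marked 1; keep _id too unless it is explicitly set to 0."""
    wanted = [k for k, v in selector.items() if v and k != '_id']
    result = {k: doc[k] for k in wanted if k in doc}
    if selector.get('_id', 1) and '_id' in doc:
        result['_id'] = doc['_id']
    return result


def sort_docs(docs, spec):
    """Sort by a list of (field, direction) pairs, first pair most significant."""
    result = list(docs)
    # Python's sort is stable, so apply the least significant key first.
    for field, direction in reversed(spec):
        result.sort(key=lambda d: d[field], reverse=(direction == DESCENDING))
    return result


# Made-up documents standing in for the postalCodes collection.
docs = [
    {'_id': 1, 'city': 'B-town', 'state': 'Gujarat', 'pincode': 2, 'extra': 'x'},
    {'_id': 2, 'city': 'A-town', 'state': 'Gujarat', 'pincode': 1, 'extra': 'y'},
    {'_id': 3, 'city': 'C-town', 'state': 'Kerala', 'pincode': 3, 'extra': 'z'},
]
selector = {'_id': 0, 'city': 1, 'state': 1, 'pincode': 1}

# State descending, then city ascending, as in step 9.
ordered = sort_docs(docs, [('state', DESCENDING), ('city', ASCENDING)])
projected = [project(d, selector) for d in ordered]
print(projected)
```

With this sample data, the Kerala document comes first, followed by the two Gujarat documents in alphabetical order of city, and neither _id nor the extra field appears in the output.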

If we look at the way we invoked sort, it was invoked on the cursor instance. Similarly, the limit function is also defined on the Cursor class. The evaluation is lazy and is deferred until we iterate over the cursor to retrieve the results. Until that point, the query is not executed on the server.
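
The deferred evaluation can be pictured with a toy class. LazyCursor below is a hypothetical stand-in, not PyMongo's Cursor: its sort and limit methods only record what was asked and return the same object so that calls can be chained, and the underlying data is fetched only when iteration starts. The fake_fetch function is likewise made up and plays the role of the round trip to the server.

```python
class LazyCursor(object):
    """Toy cursor: records options and evaluates only on iteration."""

    def __init__(self, fetch):
        self._fetch = fetch      # callable standing in for the server round trip
        self._sort_key = None
        self._limit = None

    def sort(self, field):
        self._sort_key = field
        return self              # returning self allows chaining

    def limit(self, n):
        self._limit = n
        return self

    def __iter__(self):
        docs = list(self._fetch())   # the "query" runs only here
        if self._sort_key is not None:
            docs.sort(key=lambda d: d[self._sort_key])
        if self._limit is not None:
            docs = docs[:self._limit]
        return iter(docs)


calls = []


def fake_fetch():
    calls.append('fetched')          # lets us observe when the fetch happens
    return [{'city': 'B'}, {'city': 'A'}, {'city': 'C'}]


cursor = LazyCursor(fake_fetch).sort('city').limit(2)
fetched_before_iteration = len(calls)     # still 0: nothing fetched yet
cities = [d['city'] for d in cursor]      # iteration triggers the fetch
print(fetched_before_iteration, cities)
```

Building the chained cursor does not call fake_fetch at all; the single fetch happens when the list comprehension iterates over it.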

In step 11, we inserted 20 documents in a collection, one per call. Each insert, as we see in the Python shell, returns the generated _id value. The syntax of insert is the same as that of the operation we perform from the shell. The parameter passed for the insert operation is again an object of type dict.

In step 12, we passed a list of documents to insert in the collection. This inserts multiple documents in one call to the server; this is a bulk insert. The return value in this case is a list of IDs, one for each document inserted and in the same order as passed in the input list. However, a bulk insert is not executed as a transaction: all inserts are independent of each other, and a failure of one insert doesn't roll back the ones that already succeeded.

Supporting the insertion of multiple documents raises a question about error handling. When one of the inserts in the given list fails, should the remaining inserts continue or should the insertion stop as soon as the first error is encountered? The parameter that controls this behavior is continue_on_error, and its default value is False, that is, stop as soon as the first error is encountered. If this value is True and multiple errors occur during insertion, only the latest error will be available, which is why False is a sensible default. Let's take a look at a couple of examples. In the Python shell, execute the following commands:

>>> db.contOnError.drop()
>>> db.contOnError.insert([{'_id':1}, {'_id':1}, {'_id':2}, {'_id':2}])
>>> db.contOnError.count()

The count we will get is 1, which is for the first document with the _id field as 1. The moment another document with the same value of the _id field is found, 1 in this case, an error is thrown, and the bulk insert stops. Now, execute the following insert operation:

>>> db.contOnError.drop()
>>> db.contOnError.insert([{'_id':1}, {'_id':1}, {'_id':2}, {'_id':2}], continue_on_error=True)
>>> db.contOnError.count()

Here, we passed an additional parameter, continue_on_error, whose value is True. As a result, the insert operation continues with the next document even if an intermediate insert fails. The second insert with _id:1 fails; yet, the next insert, with _id:2, goes through before the second insert with _id:2 fails (as a document with this _id is already present). The count this time is therefore 2. Also, the error reported is for the last failure, the one with _id:2 in this case.
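
The two runs above can be mimicked without a server. The sketch below is an in-memory emulation of the behavior just described, not PyMongo code: bulk_insert, DuplicateKey, and run are hypothetical names, and a plain dict keyed by _id stands in for the collection.

```python
class DuplicateKey(Exception):
    """Hypothetical stand-in for the server's duplicate key error."""


def bulk_insert(store, docs, continue_on_error=False):
    """Insert docs into store (a dict keyed by _id), mimicking both modes."""
    last_error = None
    for doc in docs:
        if doc['_id'] in store:
            last_error = DuplicateKey('duplicate _id: %r' % doc['_id'])
            if not continue_on_error:
                break            # default: stop at the first failure
            continue             # otherwise skip this document and carry on
        store[doc['_id']] = doc
    if last_error is not None:
        raise last_error         # only the most recent error is reported


docs = [{'_id': 1}, {'_id': 1}, {'_id': 2}, {'_id': 2}]


def run(continue_on_error):
    """Return (number of stored documents, reported error message or None)."""
    store = {}
    error = None
    try:
        bulk_insert(store, docs, continue_on_error)
    except DuplicateKey as exc:
        error = str(exc)
    return len(store), error


stop_count, stop_error = run(False)
cont_count, cont_error = run(True)
print(stop_count, stop_error)
print(cont_count, cont_error)
```

In the first run, one document is stored and the error refers to _id 1; in the second, two documents are stored and the only error that survives is the one for _id 2.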

Another parameter is check_keys, which checks whether any key name starts with $ or contains a dot (.). If such a key is found, bson.errors.InvalidDocument is raised. Thus, the following insert operation will fail:

>>> db.pymongoTest.insert({'a.b':1})

By default, the check takes place, unless you explicitly disable it by setting the value of this parameter to False. Thus, the following insert will succeed and return the object ID of the inserted document:

>>> db.pymongoTest.insert({'a.b':1}, check_keys=False)
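
The kind of validation that check_keys performs can be sketched as follows. This is an illustrative approximation, not the library's code: validate_keys and is_valid are hypothetical names, the sketch assumes that all keys are strings and that embedded documents are checked as well, and it raises ValueError rather than bson.errors.InvalidDocument so that the snippet runs without PyMongo installed.

```python
def validate_keys(doc):
    """Raise ValueError if any key starts with '$' or contains '.'."""
    for key, value in doc.items():
        if key.startswith('$'):
            raise ValueError("key %r must not start with '$'" % key)
        if '.' in key:
            raise ValueError("key %r must not contain '.'" % key)
        if isinstance(value, dict):
            validate_keys(value)          # check embedded documents too
        elif isinstance(value, list):
            for item in value:
                if isinstance(item, dict):
                    validate_keys(item)


def is_valid(doc):
    """Return True if doc passes validate_keys, False otherwise."""
    try:
        validate_keys(doc)
        return True
    except ValueError:
        return False


print(is_valid({'a': 1}))
print(is_valid({'a.b': 1}))
```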

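Step 13 executed the insert operation but provided a write concern to be used for it. With w = 1, the operation waits for the server to acknowledge the write, and with j = True, it additionally waits until the write has been written to the journal.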
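
A write concern can also be expressed as connection string options, so that it becomes the default for operations performed through that client. The sketch below only assembles such a URI and does not connect to anything. The build_uri function is a hypothetical helper written for Python 3; it relies on the standard MongoDB connection string options w and journal. The resulting string would be passed to pymongo.MongoClient in the same way as the URI in step 2.

```python
from urllib.parse import urlencode


def build_uri(host='localhost', port=27017, **options):
    """Assemble a MongoDB connection string with optional query options."""
    uri = 'mongodb://{0}:{1}/'.format(host, port)
    if options:
        # Sort the options so that the generated URI is deterministic.
        uri += '?' + urlencode(sorted(options.items()))
    return uri


# w=1 and journal=true correspond to w = 1 and j = True in step 13.
uri = build_uri(w=1, journal='true')
print(uri)
```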

See also

  • The Executing update and delete operations using PyMongo recipe