Executing query and insert operations with PyMongo

This recipe is all about executing basic query and insert operations using PyMongo. This is similar to what we did with the Mongo shell earlier in the book.

Getting ready

To execute simple queries, we need a server up and running; a simple single node is all we need. Refer to the Installing single node MongoDB recipe from Chapter 1, Installing and Starting the Server, for instructions on how to start the server. The data that we will be operating on needs to be imported into the database. The steps to import the data are given in the Creating test data recipe from Chapter 2, Command-line Operations and Indexes. Python 2.7, or higher, has to be present on the host operating system along with MongoDB's Python client, PyMongo. Refer to the earlier recipe, Connecting to a single node using a Python client, in Chapter 1, Installing and Starting the Server, for instructions on installing PyMongo for your host operating system. Note that this recipe uses both insert_one()/insert_many(), which were introduced in PyMongo 3.0, and the legacy insert() method, which was removed in PyMongo 4.0, so a PyMongo 3.x release is needed to run every step as written. Additionally, in this recipe, we will execute insert operations and provide a write concern to use.

How to do it…

Let's start by querying MongoDB from the Python shell. This is much like what we do in the mongo shell, except that the language is Python rather than JavaScript. The basics that we will see here are the same ones used to write large production systems that run on Python and use MongoDB as a data store.

Let's begin by starting the Python shell from the operating system's command prompt. All these steps are independent of the host operating system. Perform the following steps:

  1. Type the following in the shell and the Python shell should start:
    $ python
    Python 2.7.6 (default, Mar 22 2014, 22:59:56)
    [GCC 4.8.2] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>>
    
  2. Then, import the pymongo package and create the client as follows:
    >>> import pymongo
    >>> client = pymongo.MongoClient('localhost', 27017)
    The following is an alternative way to connect, using a connection string URI:
    >>> client = pymongo.MongoClient('mongodb://localhost:27017')
    
  3. Both forms achieve the same result. Now that we have the client, our next step is to get the database that we will be performing the operations on, test in this case. Rather than calling a getDatabase()-style method, as in some other languages' drivers, we can simply access the database as an attribute of the client:
    >>> db = client.test
    Another alternative is:
    >>> db = client['test']
    
  4. We will query the postalCodes collection. We will limit our results to 10 items.
    >>> postCodes = db.postalCodes.find().limit(10)
    
  5. Iterate over the results. Watch out for the indentation of the print after the for statement, and press Enter on an empty line to close the loop. The following fragment should print the 10 documents returned:
    >>> for postCode in postCodes:
      print('City: %s, State: %s, Pin Code: %s' % (postCode['city'], postCode['state'], postCode['pincode']))
    
  6. To find one document, execute the following:
    >>> postCode = db.postalCodes.find_one()
    
  7. Print the city, state, and pin code of the returned document as follows:
    >>> print('City: %s, State: %s, Pin Code: %s' % (postCode['city'], postCode['state'], postCode['pincode']))
    
  8. Let's query the first 10 cities in the state of Gujarat, sorted by city name, selecting only the city, state, and pincode fields. Execute the following query in the Python shell:
    >>> cursor = db.postalCodes.find({'state':'Gujarat'}, {'_id':0, 'city':1, 'state':1, 'pincode':1}).sort('city', pymongo.ASCENDING).limit(10)
    

    The preceding cursor's results can be printed in the same way that we printed the results in step 5.

  9. Let's sort the data that we query, in descending order of state and then in ascending order of city. We will write the query as follows:
    >>> city = db.postalCodes.find().sort([('state', pymongo.DESCENDING),('city',pymongo.ASCENDING)]).limit(5)
    
  10. Iterating through this cursor should print five results to the console. Refer to step 5 for how to iterate over a returned cursor and print the results.
  11. So far, we have covered the basic query operations on MongoDB from Python. Now, let's look at the insert operation. To avoid disturbing our postal codes test data, we will use a separate test collection, pymongoTest, and add documents to it in a loop as follows:
    >>> for i in range(1, 21):
      db.pymongoTest.insert_one({'i':i})
    
  12. insert_many() takes a list of dictionary objects and performs a bulk insert, so something like the following is perfectly valid:
    >>> db.pymongoTest.insert_many([{'name':'John'}, {'name':'Mark'}])
    

    Any guesses on the return value? insert_one() returns an InsertOneResult object whose inserted_id attribute holds the _id of the newly created document. insert_many() returns an InsertManyResult object whose inserted_ids attribute is a list of the IDs, one per document.

How it works…

In step 2, we instantiate the client and get a reference to the MongoClient object that will be used to access the database; the host and port pair and the connection string URI are equivalent ways of doing this. In step 3, there are again a couple of ways to get a reference to the database. The attribute form, client.test, is more convenient, unless your database name has a special character, such as a hyphen (-). For example, if the name is db-test, we have no option other than to use the [] operator, because Python would parse client.db-test as a subtraction. Using either of the alternatives, we now have an object for the test database in the db variable. After we get the client and db instances in Python, we query in step 4 to find the top 10 documents in the natural order from the collection. The syntax is identical to how this query would have been executed in the shell. Step 5 simply prints out the results, 10 of them in this case.

Generally, if you need instant help on a particular class from the Python interpreter, call dir() on the class or on an instance of it, that is, dir(<class>) or dir(<object of a class>), which gives you a list of the attributes and functions defined on it. For example, dir(pymongo.MongoClient) or dir(client), where client is the variable holding a reference to an instance of pymongo.MongoClient, lists all the supported attributes and functions. Pass the class itself, not a string: dir('pymongo.MongoClient') lists the methods of the str type instead. The help function is more informative: it prints out the documentation and is a great source of reference just in case you need instant help. Try typing help('pymongo.MongoClient') or help(client); unlike dir(), help() does accept the dotted name as a string.

In steps 4 and 5, we query the postalCodes collection, limit the result to the top 10 results, and print them. The returned object is an instance of the pymongo.cursor.Cursor class. Step 6 gets just one document from the collection using the find_one() method. This is equivalent to the findOne() method invoked on a collection in the shell. The value returned by this method is a built-in dict, or None if no document matches.
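Steps 5 and 7 repeat the same print expression, and step 7 would fail with a TypeError if find_one() found nothing and returned None. A small helper can cover both cases. This is a sketch: format_postcode is a hypothetical name, and the sample document is made up to match the field names used in this recipe, not taken from the test data.

```python
def format_postcode(doc):
    """Return the line that steps 5 and 7 print for one postalCodes document.

    find_one() returns None when nothing matches, so handle that case
    instead of failing with a TypeError on doc['city'].
    """
    if doc is None:
        return 'No matching document'
    # .get() returns None for a missing field rather than raising KeyError.
    return 'City: %s, State: %s, Pin Code: %s' % (
        doc.get('city'), doc.get('state'), doc.get('pincode'))


# A made-up document shaped like the postalCodes test data.
sample = {'city': 'CityName', 'state': 'StateName', 'pincode': 123456}
print(format_postcode(sample))   # City: CityName, State: StateName, Pin Code: 123456
print(format_postcode(None))     # No matching document
```

Against the live collection, it would be used as print(format_postcode(db.postalCodes.find_one())), or as the body of the loop from step 5.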

In step 8, we execute another find to query the data, this time passing two Python dicts. The first dict is the query, similar to the query parameter we use in the mongo shell. The second dict, the projection, specifies the fields to be returned in the result. A value of 1 for a field indicates that the field is to be selected and returned in the result. This is analogous to a SELECT statement in a relational database that explicitly lists the columns to return. The _id field is returned by default unless it is explicitly set to 0 in the projection. The projection provided here is {'_id':0, 'city':1, 'state':1, 'pincode':1}, which selects the city, state, and pincode and suppresses the _id field. We have a sort method as well. This method accepts two formats, as follows:

sort(sort_field, sort_direction)
sort([(sort_field, sort_direction)…(sort_field, sort_direction)])

The first one is used when we want to sort by one field only. The second representation accepts a list of pairs of the sort field and sort directions and is used when we want to sort by multiple fields. We used the first form in the query in step 8 and the second format in our query in step 9 as we sort first by the state name and then, by city.
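To make the projection and compound-sort rules concrete, here is a pure-Python model of the results the server returns for them. It is only a sketch: project and sort_docs are hypothetical helpers working on in-memory dicts, the documents are placeholders rather than the recipe's postal code data, and the server itself evaluates queries quite differently (for example, using indexes).

```python
from operator import itemgetter

ASCENDING, DESCENDING = 1, -1  # same values as pymongo.ASCENDING and pymongo.DESCENDING


def project(doc, projection):
    """Inclusion projection: keep the fields set to 1, plus _id unless it is set to 0.
    (Exclusion-only projections such as {'pincode': 0} are not modelled here.)"""
    keep = set(field for field, flag in projection.items() if flag)
    if projection.get('_id', 1):
        keep.add('_id')
    return dict((k, v) for k, v in doc.items() if k in keep)


def sort_docs(docs, spec):
    """Order docs by a list of (field, direction) pairs, as in cursor.sort([...])."""
    result = list(docs)
    # list.sort() is stable, so sorting by the last key first and the first key
    # last yields the compound order: by the first field, then by the next one.
    for field, direction in reversed(spec):
        result.sort(key=itemgetter(field), reverse=(direction == DESCENDING))
    return result


# Made-up documents with placeholder values (not the recipe's test data).
docs = [
    {'_id': 1, 'city': 'B', 'state': 'X', 'pincode': 100},
    {'_id': 2, 'city': 'A', 'state': 'Y', 'pincode': 200},
    {'_id': 3, 'city': 'A', 'state': 'X', 'pincode': 300},
    {'_id': 4, 'city': 'C', 'state': 'Y', 'pincode': 400},
]

by_state_then_city = sort_docs(docs, [('state', DESCENDING), ('city', ASCENDING)])
print([d['_id'] for d in by_state_then_city])   # [2, 4, 3, 1]
print(project(docs[0], {'_id': 0, 'city': 1, 'state': 1, 'pincode': 1}))
```

Passing [('city', ASCENDING)] to sort_docs models the single-field form, sort('city', pymongo.ASCENDING), used in step 8.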

If we look at the way we invoke sort, it is invoked on the Cursor instance, and so is limit. Evaluation is lazy: it is deferred until we iterate over the cursor to retrieve the results. Until that point, nothing has been sent to the server.

In step 11, we insert 20 documents into a collection, one at a time. Each insert_one() call, as we can see in the Python shell, returns an InsertOneResult whose inserted_id attribute holds the generated _id. In terms of syntax, insert_one() is similar to the insertOne() operation that we perform in the shell, and the document passed to it is a dict.

In step 12, we pass a list of documents to insert_many(). This is referred to as a bulk insert operation, which inserts multiple documents in as few calls to the server as possible (a single call for a small list like this one). The inserted_ids attribute of the returned InsertManyResult is a list of IDs, one for each document inserted, in the same order as the documents in the input list. However, MongoDB supports multi-document transactions only from version 4.0, and only on replica sets (sharded clusters from 4.2), so on the single standalone node used here each insert is independent of the others, and a failure of one insert doesn't roll back the ones that already succeeded.

Inserting multiple documents in one call raises a question: when one of the inserts in the list fails, should the remaining inserts continue, or should insertion stop as soon as the first error is encountered? With the legacy insert() method (deprecated in PyMongo 3.0 and removed in 4.0), this behavior is controlled by the continue_on_error parameter, whose default value is False, that is, stop as soon as the first error is encountered. If this value is True and multiple errors occur during insertion, only the latest error is reported, which is why the default of False is sensible. With insert_many(), the equivalent control is the ordered parameter: the default, ordered=True, stops at the first error, while ordered=False continues past failures. Let's look at a couple of examples using the legacy method. In the Python shell, execute the following:

>>> db.contOnError.drop()
>>> db.contOnError.insert([{'_id':1}, {'_id':1}, {'_id':2}, {'_id':2}])
>>> db.contOnError.count()

The count that we get is 1, for the first document with the _id field as 1. As soon as another document with the same _id value, 1 in this case, is found, a duplicate key error is raised and the bulk insert stops. Now execute the following insert operation:

>>> db.contOnError.drop()
>>> db.contOnError.insert([{'_id':1}, {'_id':1}, {'_id':2}, {'_id':2}], continue_on_error=True)
>>> db.contOnError.count()

Here, we pass an additional parameter, continue_on_error, with the value True. This ensures that the insert operation continues with the next document even if an intermediate insert fails. The second insert with _id:1 fails, yet the next insert goes through before another insert with _id:2 fails (as a document with this _id is already present), so the count this time is 2. Additionally, the error reported is for the last failure, the one with _id:2.
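The two runs above can be modelled in a few lines of plain Python to make the stop-or-continue logic explicit. This is a sketch: simulate_bulk_insert is a hypothetical helper, and the dict it uses merely stands in for a collection with a unique index on _id; it is not how the server processes a batch. With PyMongo 3.0 or later, the real-driver counterpart is insert_many() with its ordered parameter: db.contOnError.insert_many(docs, ordered=False) attempts every document and then raises pymongo.errors.BulkWriteError, whose details['writeErrors'] lists each failure rather than only the last one.

```python
def simulate_bulk_insert(docs, ordered=True):
    """Insert docs into an in-memory stand-in for a collection, keyed by _id.

    ordered=True stops at the first duplicate _id; ordered=False skips the
    failing document and carries on. Returns (stored documents, errors).
    """
    store = {}
    errors = []
    for index, doc in enumerate(docs):
        if doc['_id'] in store:
            # Keep every failure, not just the last one.
            errors.append({'index': index, '_id': doc['_id']})
            if ordered:
                break
            continue
        store[doc['_id']] = doc
    return store, errors


docs = [{'_id': 1}, {'_id': 1}, {'_id': 2}, {'_id': 2}]

stored_ordered, errors_ordered = simulate_bulk_insert(docs)
print(len(stored_ordered))      # 1, like the first count() above
stored_unordered, errors_unordered = simulate_bulk_insert(docs, ordered=False)
print(len(stored_unordered))    # 2, like the second count() above
print(errors_unordered)
```

The errors list in the unordered run records both failures, at positions 1 and 3, which is the information that the legacy continue_on_error=True path reduces to just the last error.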

See also

The next recipe, Executing update and delete operations using PyMongo, picks up where this leaves off and introduces the update, remove, and atomic find operations.
