Aggregation in Mongo using PyMongo

We have already seen PyMongo, the Python client interface for MongoDB, in the Executing query and insert operations using PyMongo and Executing update and delete operations using PyMongo recipes. In this recipe, we will use the postal code collection and run an aggregation example using PyMongo. The intention of this recipe is not to explain aggregation but to show how aggregation can be implemented using PyMongo. We will aggregate the data by state name and get the top five states by the number of documents in which they appear. We will make use of the $project, $group, $sort, and $limit operators for the process.

Getting ready

To execute the aggregation operations, we need to have a server up and running. A simple single node is all we need. Refer to the Single node installation of MongoDB recipe in Chapter 1, Installing and Starting the MongoDB Server, to learn how to start the server. The data on which we will operate needs to be imported into the database. The steps to import the data are given in the Creating test data recipe in Chapter 2, Command-line Operations and Indexes. Python and PyMongo are expected to be installed. Look at the Installing PyMongo recipe to learn how to install PyMongo for your host operating system. Since this recipe shows how to implement aggregation in Python, the reader is expected to be familiar with the aggregation framework in MongoDB.

How to do it…

Let's take a look at the steps in detail:

  1. Open the Python terminal by typing the following command:
    $ python
    
  2. Once the Python shell opens, import PyMongo as follows:
    >>> import pymongo
    
  3. Create an instance of MongoClient as follows:
    >>> client = pymongo.MongoClient('mongodb://localhost:27017')
    
  4. Get the test database's object as follows:
    >>> db = client.test
    
  5. Now, we will execute the aggregation operation on the postalCodes collection as follows:
    >>> result = db.postalCodes.aggregate(
    ...   [
    ...     {'$project':{'state':1, '_id':0}},
    ...     {'$group':{'_id':'$state', 'count':{'$sum':1}}},
    ...     {'$sort':{'count':-1}},
    ...     {'$limit':5}
    ...   ]
    ... )
    
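  6. Type the following command to view the results. This form applies to PyMongo 2.x, where aggregate returns a dict; in PyMongo 3.0 and later, aggregate returns a cursor, so use list(result) instead: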
    >>> result['result']
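
To make clear what the pipeline in step 5 computes, the following is a minimal pure-Python sketch that mimics the four stages on a handful of in-memory documents. It does not use PyMongo or a server. The sample documents, the city field, and the top_states function name are hypothetical and exist only for illustration.

```python
from collections import Counter

# Hypothetical sample documents; the real postalCodes collection is far larger.
sample_docs = [
    {'_id': 1, 'state': 'Maharashtra', 'city': 'Mumbai'},
    {'_id': 2, 'state': 'Maharashtra', 'city': 'Pune'},
    {'_id': 3, 'state': 'Kerala', 'city': 'Kochi'},
    {'_id': 4, 'state': 'Maharashtra', 'city': 'Nagpur'},
    {'_id': 5, 'state': 'Kerala', 'city': 'Kollam'},
    {'_id': 6, 'state': 'Goa', 'city': 'Panaji'},
]


def top_states(docs, limit=5):
    """Mimic the $project, $group, $sort, and $limit stages in plain Python."""
    # $project: keep only the state field of each document.
    projected = [{'state': doc.get('state')} for doc in docs]
    # $group with {'$sum': 1}: count the documents per state.
    counts = Counter(doc['state'] for doc in projected)
    # $sort on count in descending order.
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    # $limit: keep only the first `limit` groups.
    return [{'_id': state, 'count': count} for state, count in ordered[:limit]]


print(top_states(sample_docs, limit=2))
```

Each returned document has the same shape as the pipeline output: the state name under _id and the number of matching documents under count.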
    

How it works…

The steps are pretty straightforward. We connected to the database that runs on the localhost and created a database object. The aggregation operation we invoked on the collection using the aggregate function is very similar to how we would invoke aggregation from the shell. In PyMongo 2.x, the return value, result, is an object of type dict with two keys of interest. One of the keys is called ok, whose value will be 1 if the aggregation operation executed successfully. The other key is called result and its type is a list. In our case, it will contain five documents, each holding the name of a state and the count of its occurrences. In PyMongo 3.0 and later, aggregate returns a cursor instead of a dict, and iterating over the cursor yields the same five documents.
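
Because the shape of the return value depends on the PyMongo version, a small helper can hide the difference. The following is a sketch under stated assumptions, not part of PyMongo: extract_documents is a hypothetical name, and the two values passed to it below are stand-ins for what aggregate would return, so the snippet runs without a server.

```python
def extract_documents(result):
    """Return the aggregation output as a list for either return style."""
    if isinstance(result, dict):
        # PyMongo 2.x style: a dict such as {'ok': 1.0, 'result': [...]}.
        return result['result']
    # PyMongo 3.0 and later: a cursor, which is a plain iterable of documents.
    return list(result)


# Stand-in for a PyMongo 2.x return value (hypothetical data).
legacy_style = {'ok': 1.0, 'result': [{'_id': 'Maharashtra', 'count': 3}]}
# Stand-in for a cursor: any iterable of documents behaves the same way here.
cursor_style = iter([{'_id': 'Maharashtra', 'count': 3}])

print(extract_documents(legacy_style))
print(extract_documents(cursor_style))
```

With a live connection, the same helper could be applied directly to the value returned by db.postalCodes.aggregate. Note that a cursor can be consumed only once, so keep the resulting list if the documents are needed again.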
