We have already seen PyMongo, Python's client library for MongoDB, in action in the Executing query and insert operations using PyMongo and Executing update and delete operations using PyMongo recipes. In this recipe, we will use the postal code collection and run an aggregation example using PyMongo. The intention of this recipe is not to explain aggregation but to show how aggregation can be implemented using PyMongo. We will aggregate the data by state name and get the five states that appear in the largest number of documents. We will make use of the $project, $group, $sort, and $limit pipeline stages for the process.
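To make the semantics of the four stages concrete, here is a minimal sketch that imitates them in plain Python over a handful of made-up documents. It is only an illustration: `run_pipeline` and `sample` are hypothetical names, and the sample documents are invented rather than taken from the postal code data. One difference to keep in mind is that MongoDB's $sort does not guarantee an order among equal counts, whereas Python's stable sort keeps ties in first-seen order.

```python
from collections import Counter

# Hypothetical documents standing in for the postalCodes collection.
sample = [
    {'_id': 1, 'city': 'City1', 'state': 'StateA'},
    {'_id': 2, 'city': 'City2', 'state': 'StateB'},
    {'_id': 3, 'city': 'City3', 'state': 'StateA'},
    {'_id': 4, 'city': 'City4', 'state': 'StateC'},
    {'_id': 5, 'city': 'City5', 'state': 'StateA'},
    {'_id': 6, 'city': 'City6', 'state': 'StateB'},
]


def run_pipeline(documents, limit=5):
    """Plain-Python imitation of the recipe's $project/$group/$sort/$limit stages."""
    # $project {'state': 1, '_id': 0}: keep only the state field.
    # A document without a state yields None, mirroring MongoDB's null group key.
    projected = [{'state': doc.get('state')} for doc in documents]
    # $group {'_id': '$state', 'count': {'$sum': 1}}: add 1 per document, per state.
    counts = Counter(p['state'] for p in projected)
    grouped = [{'_id': state, 'count': n} for state, n in counts.items()]
    # $sort {'count': -1}: highest count first.
    grouped.sort(key=lambda g: g['count'], reverse=True)
    # $limit: keep only the first `limit` groups.
    return grouped[:limit]


print(run_pipeline(sample, limit=2))
```

With `limit=2`, StateA (three documents) comes first and StateB (two documents) second; StateC is cut off by the $limit stage.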
To execute the aggregation operations, we need a MongoDB server up and running; a simple single node is all we need. Refer to the Single node installation of MongoDB recipe in Chapter 1, Installing and Starting the MongoDB Server, to learn how to start the server. The data we will operate on needs to be imported into the database; the steps to import it are given in the Creating test data recipe in Chapter 2, Command-line Operations and Indexes. Python and PyMongo are expected to be installed; see the Installing PyMongo recipe to learn how to install PyMongo for your host operating system. Since this recipe only shows how to implement aggregation in Python, the reader is expected to be familiar with MongoDB's aggregation framework.
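Before running the steps, it can help to confirm the server and data prerequisites in one go. The following sketch assumes the data was imported into the postalCodes collection of the test database (the names used in the steps below) and that PyMongo 3.7 or later is installed, since it relies on `count_documents`; `postal_codes_count` is a hypothetical helper name.

```python
def postal_codes_count(uri='mongodb://localhost:27017', timeout_ms=2000):
    """Return the number of documents in test.postalCodes, or None if not ready."""
    try:
        # Imported inside the function so a missing driver is reported, not raised.
        from pymongo import MongoClient
        from pymongo.errors import PyMongoError
    except ImportError:
        print('PyMongo is not installed')
        return None

    # serverSelectionTimeoutMS keeps the check from waiting the default 30 seconds.
    client = MongoClient(uri, serverSelectionTimeoutMS=timeout_ms)
    try:
        client.admin.command('ping')  # raises if no server is reachable
        return client.test.postalCodes.count_documents({})
    except PyMongoError as exc:
        print(f'MongoDB is not ready: {exc}')
        return None
    finally:
        client.close()
```

A return value of None means the driver or server is missing, and 0 means the server is up but the test data has not been imported yet.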
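Let's take a look at the steps in detail:
1. Open a Python terminal by typing python in the shell:
$ python
2. Import the pymongo package:
>>> import pymongo
3. Create an instance of MongoClient as follows:
>>> client = pymongo.MongoClient('mongodb://localhost:27017')
4. Get the test database's object as follows:
>>> db = client.test
5. Execute the aggregation on the postalCodes collection as follows:
>>> result = db.postalCodes.aggregate([
...     {'$project': {'state': 1, '_id': 0}},
...     {'$group': {'_id': '$state', 'count': {'$sum': 1}}},
...     {'$sort': {'count': -1}},
...     {'$limit': 5}
... ])
6. Look at the result. Since PyMongo 3.0, aggregate returns a cursor rather than a dict, so convert it to a list (with PyMongo 2.x, the documents were read with result['result'] instead):
>>> list(result)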
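The interactive steps can also be collected into a small reusable module. This is a sketch rather than the book's code: `build_pipeline` and `top_states` are hypothetical names, and `top_states` assumes PyMongo 3.0 or later, where `aggregate` returns an iterable cursor. Taking the collection as a parameter keeps the function independent of how the connection is made.

```python
def build_pipeline(limit=5):
    """The recipe's pipeline: the top `limit` states by number of documents."""
    return [
        {'$project': {'state': 1, '_id': 0}},              # keep only the state
        {'$group': {'_id': '$state', 'count': {'$sum': 1}}},  # count per state
        {'$sort': {'count': -1}},                            # largest count first
        {'$limit': limit},                                   # keep the top entries
    ]


def top_states(collection, limit=5):
    """Run the pipeline on a PyMongo collection and return the documents as a list."""
    # aggregate() returns a CommandCursor in PyMongo 3.0+; list() drains it.
    return list(collection.aggregate(build_pipeline(limit)))
```

With a server running, `top_states(pymongo.MongoClient('mongodb://localhost:27017').test.postalCodes)` should return the same five documents as the interactive session.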
The steps are pretty straightforward. We connected to the database server running on localhost and created a database object. The aggregation operation we invoked on the collection using the aggregate function is very similar to how we would invoke aggregation from the shell. What aggregate returns depends on the PyMongo version. In PyMongo 2.x, the return value, result, is a dict with two keys of interest: ok, whose value is 1 if the aggregation operation executed successfully, and result, whose value is a list of the output documents. From PyMongo 3.0 onward, result is a CommandCursor that yields the output documents directly, and a failed aggregation raises an OperationFailure exception instead of reporting failure through ok. Either way, in our case the output consists of five documents, each containing the name of a state (in the _id field) and the number of documents in which that state occurs (in the count field).
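PyMongo 2.x returns the raw aggregate command response as a dict such as {'ok': 1.0, 'result': [...]}, while PyMongo 3.0 and later return a CommandCursor. If the same code has to run against both, the two shapes can be normalized in one place. This is a minimal sketch; `aggregation_documents` is a hypothetical helper, and its dict branch applies only to the PyMongo 2.x response format.

```python
def aggregation_documents(ret):
    """Return the output documents from an aggregate() return value as a list."""
    if isinstance(ret, dict):
        # PyMongo 2.x: check the 'ok' flag before trusting 'result'.
        # The server reports ok as 1.0, which compares equal to 1.
        if ret.get('ok') != 1:
            raise RuntimeError(f'aggregation failed: {ret}')
        return ret['result']
    # PyMongo 3.0+: errors were already raised as OperationFailure when the
    # command ran, so only the cursor needs to be consumed.
    return list(ret)
```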