Aggregation in Mongo using a Java client

The intention of this recipe is not to explain aggregation but to show how aggregation can be implemented using a Java client from a Java program. In this recipe, we will aggregate the data based on the state names and get the top five state names by the number of documents they appear in. We will make use of the $project, $group, $sort, and $limit operators for the process.

Getting ready

The test class used for this recipe is com.packtpub.mongo.cookbook.MongoAggregationTest. To execute the aggregation operations, we need a server up and running; a simple single node is all we need. Refer to the Single node installation of MongoDB recipe in Chapter 1, Installing and Starting the MongoDB Server, to learn how to start the server. The data on which we will operate needs to be imported into the database. The steps to import the data are given in the Creating test data recipe in Chapter 2, Command-line Operations and Indexes. The next step is to download the mongo-cookbook-javadriver Java project from the book's website. Though Maven can be used to execute the test case, it is convenient to import the project into an IDE and execute the test case class from there. It is assumed that you are familiar with the Java programming language and comfortable using the IDE into which the project will be imported.

How to do it…

To execute the test case, one can either import the project in an IDE such as Eclipse and execute the test case or execute the test case from the command prompt using Maven.

If you are using an IDE, open the test class and execute it as a JUnit test case. If you plan to use Maven to execute this test case, go to the command prompt, change the directory to the root of the project, and execute the following command to execute this single test case:

$ mvn -Dtest=com.packtpub.mongo.cookbook.MongoAggregationTest test

Everything should execute fine if the Java SDK and Maven are properly set up and the MongoDB server is up and running and listening on port 27017 for incoming connections.

How it works…

The method used to look at aggregation functionality is aggregationTest() in our test class. The aggregation operation is performed on MongoDB from a Java client using the aggregate() method defined in the DBCollection class. The method has the following signature:

AggregationOutput aggregate(DBObject firstOp, DBObject... additionalOps)

Only the first argument is mandatory; it forms the first operation in the pipeline. The second argument is a varargs argument (a variable number of arguments with zero or more values) that allows more pipeline operators to be supplied. All these arguments are of type com.mongodb.DBObject. If any exception occurs during the execution of the aggregation command, the aggregation operation will throw com.mongodb.MongoException with the cause of the exception.

The return type com.mongodb.AggregationOutput is used to get the result of the aggregation operation. From a developer's perspective, we are more interested in the results field of this instance, which can be accessed using the results() method of the returned object. The results() method returns an object of type Iterable<DBObject>, which one can iterate to get the results of the aggregation.

Let's look at how we implemented the aggregation pipeline in our test class:

AggregationOutput output = collection.aggregate(
  //{'$project':{'state':1, '_id':0}},
  new BasicDBObject("$project", new BasicDBObject("state", 1).append("_id", 0)),
  //{'$group':{'_id':'$state', 'count':{'$sum':1}}}
  new BasicDBObject("$group", new BasicDBObject("_id", "$state")
    .append("count", new BasicDBObject("$sum", 1))),
  //{'$sort':{'count':-1}}
  new BasicDBObject("$sort", new BasicDBObject("count", -1)),
  //{'$limit':5}
  new BasicDBObject("$limit", 5)
);

There are four operations in the pipeline, in the following order: a $project operation, followed by $group, $sort, and then $limit.
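To make the effect of the four stages concrete, the following is a minimal in-memory sketch of what the pipeline computes. It does not use the MongoDB driver at all; the class name PipelineSketch, the method topStates(), and the sample state values are all hypothetical and exist only for illustration. Documents are modeled as plain maps with a state field.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of the recipe's
// test class or of the MongoDB Java driver.
class PipelineSketch {

    // Mirrors $project -> $group -> $sort -> $limit on an in-memory list.
    static List<Map.Entry<String, Integer>> topStates(
            List<Map<String, Object>> docs, int limit) {
        Map<String, Integer> counts = new HashMap<>();
        for (Map<String, Object> doc : docs) {
            // $project: only the "state" field is carried forward
            String state = (String) doc.get("state");
            // $group: _id is the state, count is incremented by 1 per document
            counts.merge(state, 1, Integer::sum);
        }
        List<Map.Entry<String, Integer>> result = new ArrayList<>(counts.entrySet());
        // $sort: by count, descending
        result.sort(Map.Entry.<String, Integer>comparingByValue().reversed());
        // $limit: keep at most 'limit' entries
        return new ArrayList<>(result.subList(0, Math.min(limit, result.size())));
    }

    public static void main(String[] args) {
        // Made-up sample values, not the recipe's actual test data
        String[] states = {"StateA", "StateB", "StateA", "StateC", "StateB", "StateA"};
        List<Map<String, Object>> docs = new ArrayList<>();
        for (String s : states) {
            Map<String, Object> doc = new HashMap<>();
            doc.put("state", s);
            docs.add(doc);
        }
        for (Map.Entry<String, Integer> e : topStates(docs, 2)) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

With the sample values above, StateA (three documents) is listed first and StateB (two documents) second. On the server, the same grouping and ordering is done by the pipeline stages, so only the final five documents travel back to the Java client.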

At first glance, the last two operations look inefficient: we appear to sort everything and then keep just the top five elements. In such scenarios, however, the MongoDB server is intelligent enough to take the limit operation into account while sorting. As a result, only the top five results need to be maintained as the sort progresses, rather than a fully sorted copy of all the results.
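The idea of keeping only the top few results while sorting can be sketched with a bounded min-heap. This is an illustration of the general technique only, not MongoDB's actual server implementation; the class name TopKSketch and the method topK() are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration of a limit-aware sort; not MongoDB server code.
class TopKSketch {

    // Returns the k largest values in descending order while holding
    // at most k values in memory at any time.
    static List<Integer> topK(Iterable<Integer> counts, int k) {
        if (k <= 0) {
            return new ArrayList<>();
        }
        // Min-heap: the smallest of the values kept so far sits on top
        PriorityQueue<Integer> heap = new PriorityQueue<>();
        for (int c : counts) {
            if (heap.size() < k) {
                heap.add(c);
            } else if (c > heap.peek()) {
                heap.poll(); // evict the smallest of the current top k
                heap.add(c);
            }
        }
        List<Integer> result = new ArrayList<>(heap);
        result.sort(Collections.reverseOrder());
        return result;
    }

    public static void main(String[] args) {
        // Made-up counts; prints [41, 30, 25, 18, 12]
        System.out.println(topK(Arrays.asList(12, 7, 30, 3, 25, 18, 9, 41), 5));
    }
}
```

The heap never grows beyond k entries, so memory use depends on the limit rather than on the number of groups being sorted.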

From version 2.6 of MongoDB, the aggregation result can be returned as a cursor. Though the preceding code snippet is still valid, the AggregationOutput object is no longer the only way to get the results of the operation; we can also use com.mongodb.Cursor to iterate over the results. Also, the preceding varargs format is now deprecated in favor of the format that accepts a list of pipeline operators. Refer to the Javadoc of the com.mongodb.DBCollection class and look for the various overloaded aggregate() methods.
