Implementing aggregation in Mongo using a Java client

The intention of this recipe is not to explain aggregation but to show you how aggregation can be implemented using the Java client from a Java program. In this recipe, we will aggregate the data based on the state names and get the top five state names by the number of documents that they appear in. We will use the $project, $group, $sort, and $limit operators for the process.

Getting ready

The test class used for this recipe is com.packtpub.mongo.cookbook.MongoAggregationTest. To execute the aggregation operations, we need a server up and running; a simple single node is sufficient. Refer to the Installing single node MongoDB recipe from Chapter 1, Installing and Starting the Server, for instructions on how to start the server. The data that we will operate on needs to be imported into the database. The steps to import the data are given in the Creating test data recipe in Chapter 2, Command-line Operations and Indexes. The next step is to download the Java project, mongo-cookbook-javadriver, from the Packt website. Though Maven can be used to execute the test case, it is convenient to import the project into an IDE and execute the test case class from there. It is assumed that you are familiar with the Java programming language and comfortable using the IDE that the project will be imported into.

How to do it…

To execute the test case, you can either import the project into an IDE such as Eclipse and run it from there, or run it from the command prompt using Maven.

  1. If you are using an IDE, open the test class and execute it as a JUnit test case.
  2. If you are planning to use Maven to execute this test case, go to the command prompt, change to the root directory of the project, and run the following command to execute this single test case:
    $ mvn -Dtest=com.packtpub.mongo.cookbook.MongoAggregationTest test
    

The test should run successfully if the Java SDK and Maven are properly set up and the MongoDB server is up and running, listening on port 27017 for incoming connections.

How it works…

The method used for the aggregation functionality is aggregationTest() in our test class. The aggregation operation is performed on MongoDB from a Java client using the aggregate() method defined in the DBCollection class. The method has the following signature:

AggregationOutput aggregate(DBObject firstOp, DBObject... additionalOps)

Only the first argument is mandatory; it forms the first operation in the pipeline. The second argument is a varargs argument (a variable number of arguments with zero or more values), which allows more pipeline operators to be passed. All these arguments are of the com.mongodb.DBObject type. If any exception occurs during the execution of the aggregation command, the aggregation operation will throw com.mongodb.MongoException with the cause of the exception.

The return type, com.mongodb.AggregationOutput, is used to get the result of the aggregation operation. From a developer's perspective, we are mostly interested in the results field of this instance, which can be accessed using the results() method of the returned object. The results() method returns an object of type Iterable<DBObject>, which we can iterate over to get the results of the aggregation.

Let's look at how we implemented the aggregation pipeline in our test class:

AggregationOutput output = collection.aggregate(
    //{'$project':{'state':1, '_id':0}},
    new BasicDBObject("$project", new BasicDBObject("state", 1).append("_id", 0)),
    //{'$group':{'_id':'$state', 'count':{'$sum':1}}}
    new BasicDBObject("$group", new BasicDBObject("_id", "$state")
      .append("count", new BasicDBObject("$sum", 1))),
    //{'$sort':{'count':-1}}
    new BasicDBObject("$sort", new BasicDBObject("count", -1)),
    //{'$limit':5}
    new BasicDBObject("$limit", 5)
);

There are four steps in the pipeline in the following order: a $project operation, followed by $group, $sort, and then $limit.
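
To make the role of each stage concrete, the following is a minimal plain-Java sketch that mimics the same four steps on an in-memory list of documents. It does not use MongoDB or the Java driver at all; the class name PipelineSketch, the helper doc(), and the sample values are assumptions made up purely for illustration, and the sketch assumes that every document has a non-null state field.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical illustration class; not part of the mongo-cookbook-javadriver project.
class PipelineSketch {

    // Builds a small in-memory "document" with a state and a city (sample data only).
    static Map<String, Object> doc(String state, String city) {
        Map<String, Object> d = new HashMap<>();
        d.put("state", state);
        d.put("city", city);
        return d;
    }

    // Mimics $project -> $group -> $sort -> $limit on in-memory documents.
    // Assumes every document carries a non-null "state" value.
    static List<Map.Entry<String, Long>> topStates(List<Map<String, Object>> docs, int limit) {
        return docs.stream()
            // $project: keep only the state field
            .map(d -> (String) d.get("state"))
            // $group: one bucket per state, counting the documents in it
            .collect(Collectors.groupingBy(s -> s, Collectors.counting()))
            .entrySet().stream()
            // $sort: by count, descending
            .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
            // $limit: keep only the first 'limit' entries
            .limit(limit)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Map<String, Object>> docs = new ArrayList<>();
        docs.add(doc("NY", "a"));
        docs.add(doc("NY", "b"));
        docs.add(doc("NY", "c"));
        docs.add(doc("CA", "d"));
        docs.add(doc("CA", "e"));
        docs.add(doc("TX", "f"));
        // Print each of the top two states with its document count
        for (Map.Entry<String, Long> e : topStates(docs, 2)) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

The difference in the real recipe is that all of this work happens on the server; the Java client only describes the stages as DBObject instances and receives the final five documents.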

The last two operations may look inefficient, as we appear to sort all the results and then take just the top five. In such scenarios, the MongoDB server is intelligent enough to take the limit into account while sorting, so that only the top five results need to be maintained rather than a fully sorted list of all the results.
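
The general idea of keeping only the top few results while scanning can be sketched in plain Java with a bounded min-heap. This is only an illustration of the technique, not MongoDB's actual implementation; the class name TopKSketch and the method topK() are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration class; shows the bounded top-k idea only.
class TopKSketch {

    // Returns the k largest values in descending order while holding
    // at most k + 1 values in memory at any point of the scan.
    static List<Integer> topK(Iterable<Integer> counts, int k) {
        PriorityQueue<Integer> heap = new PriorityQueue<>(); // min-heap: smallest value on top
        for (int c : counts) {
            heap.offer(c);
            if (heap.size() > k) {
                heap.poll(); // discard the smallest value seen so far
            }
        }
        List<Integer> result = new ArrayList<>(heap);
        result.sort(Comparator.reverseOrder()); // order the survivors, largest first
        return result;
    }

    public static void main(String[] args) {
        // Sample counts, made up for the example
        System.out.println(topK(Arrays.asList(5, 1, 9, 3, 7, 2, 8), 3));
    }
}
```

With a limit of five, such an approach never holds more than a handful of candidates, however many groups the preceding $group stage produces.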

From version 2.6 of MongoDB, an aggregation can return a cursor. Though the preceding code is still valid, the AggregationOutput object is no longer the only way to get the results of the operation; we can also use com.mongodb.Cursor to iterate over the results. Additionally, the preceding varargs format is now deprecated in favor of the format that accepts a list of pipeline operators. Refer to the Javadocs of the com.mongodb.DBCollection class and look at the various overloaded aggregate() methods.
