Executing MapReduce in Mongo using a Java client

In the previous recipe, Implementing aggregation in Mongo using a Java client, we saw how to execute aggregation operations in Mongo using the Java client. In this recipe, we will work on the same use case as we did for the aggregation operation, but this time we will use MapReduce. The intent is to aggregate the data by state name and get the top five states by the number of documents in which they appear.

If you are seeing MapReduce invoked from a programming language client for the first time, you might be surprised by how it is actually done. You might have expected to write the map and reduce functions in the language of your application, Java in this case, and then use them to run the job. However, MapReduce jobs run on the Mongo servers, and the servers execute JavaScript functions. Hence, irrespective of the programming language driver, the map and reduce functions are written in JavaScript. The driver merely acts as a means of sending these functions to the server and invoking them there.
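To make this concrete, the following minimal sketch holds a map function and a reduce function as plain Java strings, which is the form in which a driver would send them to the server. The recipe's own functions are not reproduced in this text, so these two are an assumption based on the job described here (counting documents per state). The class name StateCountFunctions is hypothetical, and the state field name is taken from the shell query used later in this recipe.

```java
// Hypothetical holder class; the recipe's test class may define these strings differently.
final class StateCountFunctions {

    // Map function (JavaScript): the server calls it once per input document,
    // with `this` bound to the document. It emits the pair (state name, 1).
    static final String MAPPER =
        "function() { emit(this.state, 1); }";

    // Reduce function (JavaScript): the server calls it with one key and the
    // values emitted for that key. It returns their sum.
    static final String REDUCER =
        "function(key, values) { return Array.sum(values); }";

    public static void main(String[] args) {
        // From Java's point of view these are just strings; nothing is evaluated here.
        System.out.println(MAPPER);
        System.out.println(REDUCER);
    }
}
```

Java never parses or runs this JavaScript. A mistake in either function therefore shows up only when the server executes the job, not when the Java code is compiled.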

Getting ready

The test class used for this recipe is com.packtpub.mongo.cookbook.MongoMapReduceTest. To execute the map reduce operations, we need a server up and running; a simple single node is all we need. Refer to the Installing single node MongoDB recipe from Chapter 1, Installing and Starting the Server, for instructions on how to start the server. The data that we will operate on needs to be imported into the database. The steps to import the data are given in the Creating test data recipe in Chapter 2, Command-line Operations and Indexes. The next step is to download the Java project, mongo-cookbook-javadriver, from the Packt website. Though Maven can be used to execute the test case, it is convenient to import the project into an IDE and execute the test case class from there. It is assumed that you are familiar with the Java programming language and comfortable using the IDE into which the project will be imported.

How to do it…

To execute the test case, you can either import the project into an IDE such as Eclipse and execute it there, or execute it from the command prompt using Maven.

  1. If you are using an IDE, open the test class and execute it as a JUnit test case.
  2. If you are planning to use Maven to execute this test case, go to the command prompt, change to the root directory of the project, and run the following command to execute this single test case:
    $ mvn -Dtest=com.packtpub.mongo.cookbook.MongoMapReduceTest test
    

The test should execute successfully if the Java SDK and Maven are properly set up and the MongoDB server is up and running and listening on port 27017 for incoming connections.

How it works…

The test case method for our map reduce test is mapReduceTest().

Map reduce operations can be executed in Mongo from a Java client using the mapReduce() method defined in the DBCollection class. There are many overloaded versions, and you can refer to the Javadocs of the com.mongodb.DBCollection class for more details on the various flavors of this method. The one that we used is collection.mapReduce(mapper, reducer, output collection, query).

The method accepts the following four parameters:

  • The mapper function, which is of type String and contains JavaScript code that is executed on the mongo database server
  • The reducer function, which is of type String and contains JavaScript code that is executed on the mongo database server
  • The name of the collection that the output of the map reduce execution will be written to
  • The query that is executed by the server, the result of which is the input to the map reduce job

As the reader is assumed to be well-versed in map reduce operations from the shell, we won't explain the map reduce JavaScript functions that we used in the test case method. All they do is emit the state name as the key and, once reduced, produce the number of times that particular state name occurs as the value. This result is written to the output collection, javaMROutput in this case. For example, in the entire collection, the state Maharashtra appears 6446 times; thus, the document for the state of Maharashtra is {'_id': 'Maharashtra', 'value': 6446}. To confirm this value, you can execute the following query in the mongo shell and see that the result is indeed 6446:

> db.postalCodes.count({state:'Maharashtra'})
6446
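The way the emitted pairs turn into one document per state can be illustrated with a small in-memory simulation in plain Java. This is only a sketch of the data flow: the real job runs as JavaScript on the server, and the class name StateCountSimulation, its method names, and the five sample values are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory illustration only; it does not talk to MongoDB.
final class StateCountSimulation {

    // Map phase: emit a (state, 1) pair for every input document,
    // grouping the emitted values by key.
    static Map<String, List<Integer>> mapPhase(List<String> states) {
        Map<String, List<Integer>> emitted = new TreeMap<>();
        for (String state : states) {
            emitted.computeIfAbsent(state, k -> new ArrayList<>()).add(1);
        }
        return emitted;
    }

    // Reduce phase: sum the values emitted for each key.
    static Map<String, Integer> reducePhase(Map<String, List<Integer>> emitted) {
        Map<String, Integer> output = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : emitted.entrySet()) {
            int sum = 0;
            for (int value : entry.getValue()) {
                sum += value;
            }
            output.put(entry.getKey(), sum);
        }
        return output;
    }

    public static void main(String[] args) {
        // Made-up sample input: the state field of five hypothetical documents.
        List<String> states = Arrays.asList(
            "Maharashtra", "Gujarat", "Maharashtra", "Kerala", "Maharashtra");

        // Print each result in the same shape as the output collection's documents.
        for (Map.Entry<String, Integer> entry : reducePhase(mapPhase(states)).entrySet()) {
            System.out.println(
                "{'_id': '" + entry.getKey() + "', 'value': " + entry.getValue() + "}");
        }
    }
}
```

Each entry of the final map corresponds to one document in the output collection, with the key as _id and the sum as value.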

We are still not done, as the requirement is to find the top five states by their occurrence in the collection, and so far we have only the states and their occurrence counts. The final step is to sort the documents in descending order by the value field, which holds the number of times the state's name occurs, and limit the result to five documents.
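In the mongo shell, this final step corresponds to db.javaMROutput.find().sort({value: -1}).limit(5). The sort-and-limit logic itself can be sketched in plain Java over an in-memory map of counts. The class name TopStates, the method top(), and the sample counts are hypothetical and are not taken from the recipe's test class.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory illustration of "sort by value descending, then limit".
final class TopStates {

    // Returns at most `limit` entries, ordered by count from highest to lowest.
    static List<Map.Entry<String, Integer>> top(Map<String, Integer> counts, int limit) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(Map.Entry.<String, Integer>comparingByValue().reversed());
        return new ArrayList<>(entries.subList(0, Math.min(limit, entries.size())));
    }

    public static void main(String[] args) {
        // Made-up counts for six placeholder states.
        Map<String, Integer> counts = new HashMap<>();
        counts.put("State A", 60);
        counts.put("State B", 10);
        counts.put("State C", 50);
        counts.put("State D", 20);
        counts.put("State E", 40);
        counts.put("State F", 30);

        // Keep only the five states with the highest counts.
        for (Map.Entry<String, Integer> entry : top(counts, 5)) {
            System.out.println(entry.getKey() + ": " + entry.getValue());
        }
    }
}
```

Against a real server, it is usually preferable to let the database sort and limit, so that only five documents travel to the client.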

See also

Refer to Chapter 8, Integration with Hadoop, for different recipes on executing MapReduce jobs in MongoDB using the Hadoop connector. This allows us to write the map and reduce functions in languages such as Java, Python, and so on.
