Chapter 8. Integration with Hadoop

In this chapter, we will cover the following recipes:

  • Executing our first sample MapReduce job using the mongo-hadoop connector
  • Writing our first Hadoop MapReduce job
  • Running MapReduce jobs on Hadoop using streaming
  • Running a MapReduce job on Amazon EMR

Introduction

Hadoop is a well-known open source framework for processing large datasets, and it provides an API for the widely used MapReduce programming model. Nearly all big data solutions offer some form of Hadoop integration so that they can take advantage of its MapReduce framework. MongoDB is no exception: the mongo-hadoop connector lets us write MapReduce jobs using the Hadoop MapReduce API, process data residing in MongoDB collections or MongoDB dump files, and write the results back to MongoDB collections or dump files. In this chapter, we will look at recipes covering this basic MongoDB and Hadoop integration. A minimal sketch of such a job is shown below.
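
To make the connector's role concrete, here is a minimal sketch of a Hadoop job driver that reads documents from a MongoDB collection and writes aggregated counts back to another collection. It assumes the mongo-hadoop connector and its BSON dependency are on the classpath; the mongodb:// URIs, the test.items and test.itemCounts collection names, and the category field are hypothetical placeholders, and the recipes in this chapter build a complete working example step by step.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.bson.BSONObject;

import com.mongodb.hadoop.MongoInputFormat;
import com.mongodb.hadoop.MongoOutputFormat;
import com.mongodb.hadoop.util.MongoConfigUtil;

public class MongoCategoryCount {

  // Mapper: MongoInputFormat hands each document to the mapper as a BSONObject,
  // keyed by the document's _id. Here we emit (value of a "category" field, 1);
  // the field name is a placeholder for illustration only.
  public static class CategoryMapper
      extends Mapper<Object, BSONObject, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);

    @Override
    protected void map(Object id, BSONObject doc, Context context)
        throws IOException, InterruptedException {
      Object category = doc.get("category");
      if (category != null) {
        context.write(new Text(category.toString()), ONE);
      }
    }
  }

  // Reducer: sums the counts per category; MongoOutputFormat persists each
  // (key, value) pair as a document in the configured output collection.
  public static class SumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable v : values) {
        sum += v.get();
      }
      context.write(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Input and output collection URIs are placeholders for illustration only.
    MongoConfigUtil.setInputURI(conf, "mongodb://localhost:27017/test.items");
    MongoConfigUtil.setOutputURI(conf, "mongodb://localhost:27017/test.itemCounts");

    Job job = Job.getInstance(conf, "category count");
    job.setJarByClass(MongoCategoryCount.class);
    job.setMapperClass(CategoryMapper.class);
    job.setReducerClass(SumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    job.setInputFormatClass(MongoInputFormat.class);
    job.setOutputFormatClass(MongoOutputFormat.class);

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

The only MongoDB-specific pieces are the input/output URIs set through MongoConfigUtil and the two format classes; the mapper and reducer themselves are ordinary Hadoop code, which is exactly the point of the connector.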
