Chapter 4. Developing Complex Hadoop MapReduce Applications

In this chapter, we will cover:

  • Choosing appropriate Hadoop data types
  • Implementing a custom Hadoop Writable data type
  • Implementing a custom Hadoop key type
  • Emitting data of different value types from a mapper
  • Choosing a suitable Hadoop InputFormat for your input data format
  • Adding support for new input data formats – implementing a custom InputFormat
  • Formatting the results of MapReduce computations – using Hadoop OutputFormats
  • Hadoop intermediate (map to reduce) data partitioning
  • Broadcasting and distributing shared resources to tasks in a MapReduce job – Hadoop DistributedCache
  • Using Hadoop with legacy applications – Hadoop Streaming
  • Adding dependencies between MapReduce jobs
  • Hadoop counters for reporting custom metrics

Introduction

This chapter introduces you to several advanced Hadoop MapReduce features that will help you to develop highly customized, efficient MapReduce applications.

In this chapter, we will explore the different data types provided by Hadoop and the steps to implement custom data types for Hadoop MapReduce computations. We will also explore the different data input and output formats provided by Hadoop, giving you a basic understanding of how to add support for new data formats. In addition, we will discuss other advanced Hadoop features: using DistributedCache to distribute data, using Hadoop Streaming for quick prototyping of Hadoop computations, using Hadoop counters to report custom metrics for your computation, and adding job dependencies to manage simple DAG-based workflows of Hadoop MapReduce computations.
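As a preview of the custom data type recipes, the following is a minimal sketch of how a custom Hadoop value type serializes itself. Hadoop's `org.apache.hadoop.io.Writable` interface declares exactly two methods, `write(DataOutput)` and `readFields(DataInput)`; here a local stand-in interface with the same signatures is used so the sketch compiles without Hadoop on the classpath, and `LogEntryWritable` is a hypothetical type invented for illustration:

```java
import java.io.*;

// Stand-in for org.apache.hadoop.io.Writable (same method signatures),
// so this sketch runs without the Hadoop libraries.
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// A hypothetical value type holding a URL and a hit count.
class LogEntryWritable implements Writable {
    private String url = "";
    private long hits;

    public void set(String url, long hits) { this.url = url; this.hits = hits; }
    public String getUrl() { return url; }
    public long getHits() { return hits; }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(url);   // writeUTF keeps the sketch simple; Hadoop's Text uses its own encoding
        out.writeLong(hits);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        url = in.readUTF();  // fields must be read in exactly the order they were written
        hits = in.readLong();
    }
}

public class Main {
    public static void main(String[] args) throws IOException {
        LogEntryWritable original = new LogEntryWritable();
        original.set("/index.html", 42);

        // Round-trip through a byte stream, much as the framework does
        // when it serializes records between the map and reduce phases.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        original.write(new DataOutputStream(bytes));

        LogEntryWritable copy = new LogEntryWritable();
        copy.readFields(new DataInputStream(
                new ByteArrayInputStream(bytes.toByteArray())));

        System.out.println(copy.getUrl() + " " + copy.getHits()); // prints "/index.html 42"
    }
}
```

The key design point, which the recipes below elaborate, is that the type reads its fields back in the same order it wrote them; key types additionally need to be comparable, which is covered in its own recipe.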
