In this chapter, we will cover:
Writable data type
key type
value types from a mapper
InputFormat for your input data format
InputFormat
OutputFormats
DistributedCache
This chapter introduces you to several advanced Hadoop MapReduce features that will help you to develop highly customized, efficient MapReduce applications.
In this chapter, we will explore the different data types provided by Hadoop and the steps to implement custom data types for Hadoop MapReduce computations. We will also explore the different data input and output formats provided by Hadoop, which will give you a basic understanding of how to add support for new data formats in Hadoop. We will also discuss other advanced Hadoop features, such as using DistributedCache to distribute data, using Hadoop Streaming for quick prototyping of Hadoop computations, using Hadoop counters to report custom metrics for your computation, and adding job dependencies to manage simple DAG-based workflows of Hadoop MapReduce computations.
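To give a rough idea of what implementing a custom data type involves, the following sketch mirrors the two methods of Hadoop's Writable interface, write(DataOutput) and readFields(DataInput), using only the Java standard library so that it can run without a Hadoop installation. The LogEntry class, its fields, and the roundTrip helper are hypothetical names chosen for illustration; in a real job the class would declare that it implements org.apache.hadoop.io.Writable.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical record type. It follows the write/readFields shape of
// Hadoop's Writable interface but deliberately avoids the Hadoop dependency.
class LogEntry {
    private String request = "";
    private int status;

    // No-argument constructor: lets a framework create an empty instance
    // first and populate it afterwards through readFields().
    LogEntry() {}

    LogEntry(String request, int status) {
        this.request = request;
        this.status = status;
    }

    // Serialize the fields in a fixed order.
    void write(DataOutput out) throws IOException {
        out.writeUTF(request);
        out.writeInt(status);
    }

    // Deserialize the fields in exactly the order they were written.
    void readFields(DataInput in) throws IOException {
        request = in.readUTF();
        status = in.readInt();
    }

    String getRequest() { return request; }

    int getStatus() { return status; }

    // Illustrative helper: serialize to an in-memory buffer and read it back.
    static LogEntry roundTrip(LogEntry original) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            original.write(new DataOutputStream(buffer));
            LogEntry copy = new LogEntry();
            copy.readFields(new DataInputStream(
                    new ByteArrayInputStream(buffer.toByteArray())));
            return copy;
        } catch (IOException e) {
            // Not expected for in-memory streams; rethrown unchecked for brevity.
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling LogEntry.roundTrip(new LogEntry("GET /index.html", 200)) should yield a copy with the same request string and status code. The point of the sketch is that the fields must be read back in the same order in which they were written.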