In this chapter, we will cover:
There is an adage among those working with Hadoop that everything breaks at scale. Malformed or unexpected input is common; it is an unfortunate downside of working with large amounts of unstructured data. Within Hadoop, individual tasks are isolated and each is given a different slice of the input. This isolation makes it easy for Hadoop to distribute work, but it makes it difficult to track global events and to inspect the state of any individual task. Fortunately, there are several tools and techniques available to aid in debugging Hadoop jobs. This chapter will focus on applying these tools and techniques to debug MapReduce jobs.
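One widely used technique for tracking events across isolated tasks is to increment job counters, which the framework aggregates globally across all tasks. As a minimal sketch, the Hadoop Streaming mapper below counts malformed records by writing counter updates to standard error in the `reporter:counter:<group>,<name>,<amount>` format that Streaming recognizes; the group and counter names (`Debug`, `MalformedRecords`) and the tab-separated record format are illustrative assumptions, not anything fixed by the chapter.

```python
import sys


def map_line(line):
    """Parse one tab-separated key/value record.

    Returns a (key, value) tuple, or None if the record is malformed
    (i.e., it does not split into exactly two fields).
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 2:
        return None
    return fields[0], fields[1]


def main(stdin=sys.stdin, stdout=sys.stdout, stderr=sys.stderr):
    for line in stdin:
        parsed = map_line(line)
        if parsed is None:
            # Hadoop Streaming scans stderr for lines of this form and
            # increments the named job counter; the counter totals are
            # aggregated across every map task and shown in the job UI.
            # "Debug"/"MalformedRecords" are illustrative names.
            stderr.write("reporter:counter:Debug,MalformedRecords,1\n")
            continue  # skip bad input instead of failing the task
        key, value = parsed
        stdout.write(key + "\t" + value + "\n")


if __name__ == "__main__":
    main()
```

Because counters are aggregated by the framework, a single number in the job's web UI summarizes how much bad input every task encountered, without anyone having to read per-task logs.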