Along with maintaining counters, the Reporter interface in Hadoop also captures task status information. This information is periodically sent to the Job Tracker, and the Job Tracker UI is updated to reflect the current status. By default, the task status displays the task's state, which can be one of the following:
RUNNING
SUCCEEDED
FAILED
UNASSIGNED
KILLED
COMMIT_PENDING
FAILED_UNCLEAN
KILLED_UNCLEAN
When debugging a MapReduce job, it can be useful to display a custom message that gives more detailed information on how the task is running. This recipe shows how to update the task status.
A task's status message can be updated using the setStatus() method of the Context object passed to the task:
context.setStatus("user custom message");
The source code for this chapter provides an example of using a custom task status message to display the number of rows being processed per second by the task.
public static class StatusMap extends Mapper<LongWritable, Text, LongWritable, Text> {

    private int rowCount = 0;
    private long startTime = 0;

    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        //Display rows per second every 100,000 rows
        rowCount++;
        if(startTime == 0 || rowCount % 100000 == 0) {
            if(startTime > 0) {
                long estimatedTime = System.nanoTime() - startTime;
                context.setStatus("Processing: " + (double)rowCount / ((double)estimatedTime/1000000000.0) + " rows/second");
                rowCount = 0;
            }
            startTime = System.nanoTime();
        }
        context.write(key, value);
    }
}
Two private instance variables are declared: rowCount, which tracks the number of rows processed, and startTime, which records when the current measurement interval began. Each time the map function has processed 100,000 rows, the task status is updated with the number of rows being processed per second:
context.setStatus("Processing: " + (double)rowCount / ((double)estimatedTime/1000000000.0) + " rows/second");
After the message has been updated, the rowCount and startTime variables are reset and the process starts over again. The status is stored locally in the memory of the current process and is then sent to the Task Tracker. The next time the Task Tracker pings the Job Tracker, the updated status message is sent along as well. Once the Job Tracker receives the status message, this information is made available to the UI.
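The rate calculation used for the status message can also be tried outside of Hadoop. The following is a minimal sketch, not part of the chapter's source code: the RowRateTracker class, its reportEvery parameter, and the record() method are hypothetical names introduced for illustration. It mirrors the mapper's logic but takes the timestamp as an argument so the behavior can be checked with fixed values, and it uses a boolean flag instead of the startTime == 0 test, because System.nanoTime() has an arbitrary origin and may return zero or negative values.

```java
// Illustrative helper only; the class and method names are assumptions,
// not part of Hadoop or of the chapter's source code.
final class RowRateTracker {
    private final int reportEvery;   // rows between status updates
    private int rowCount = 0;        // rows seen in the current interval
    private long startTime = 0;      // nanosecond timestamp at interval start
    private boolean started = false; // replaces the "startTime == 0" sentinel

    RowRateTracker(int reportEvery) {
        this.reportEvery = reportEvery;
    }

    // Converts a row count and an elapsed time in nanoseconds to rows/second.
    static double rowsPerSecond(long rows, long elapsedNanos) {
        return rows / (elapsedNanos / 1_000_000_000.0);
    }

    // Records one processed row at the given timestamp. Returns a status
    // message when an interval completes, or null when no update is due.
    String record(long nowNanos) {
        rowCount++;
        String message = null;
        if (!started || rowCount % reportEvery == 0) {
            if (started) {
                long elapsed = nowNanos - startTime;
                message = "Processing: " + rowsPerSecond(rowCount, elapsed)
                        + " rows/second";
                rowCount = 0; // begin counting the next interval
            }
            startTime = nowNanos;
            started = true;
        }
        return message;
    }
}
```

Inside a mapper, such a tracker could be called once per record with System.nanoTime(), passing any non-null result to context.setStatus(). For instance, 100,000 rows processed in 2 seconds would be reported as 50000.0 rows/second.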