When a user submits a job to Hadoop, this job needs to be assigned a resource (a computer/host) before execution. This process is called scheduling, and a scheduler decides when resources are assigned to a given job.
Hadoop is by default configured with a first-in first-out (FIFO) scheduler, which executes jobs in the same order as they arrive. However, for a deployment that runs many MapReduce jobs and is shared by many users, more complex scheduling policies are needed.
The good news is that the Hadoop scheduler is pluggable, and Hadoop ships with two other schedulers: the fair scheduler and the capacity scheduler. If required, it is also possible to write your own scheduler.
This recipe describes how to change the scheduler in Hadoop.
For this recipe, you need a working Hadoop deployment. Set up Hadoop using the Setting Hadoop in a distributed cluster environment recipe from Chapter 1, Getting Hadoop Up and Running in a Cluster.
To use the fair scheduler, the hadoop-fairscheduler-1.0.0.jar file must be in the HADOOP_HOME/lib directory. From Hadoop 1.0.0 and higher releases, this JAR file is already in the right place in the Hadoop distribution. Then, add the following property to HADOOP_HOME/conf/mapred-site.xml:

<property>
  <name>mapred.jobtracker.taskScheduler</name>
  <value>org.apache.hadoop.mapred.FairScheduler</value>
</property>
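For context, a minimal mapred-site.xml with this setting might look like the following sketch (any other properties already present in your file should of course be kept alongside it):

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Replace the default FIFO scheduler with the fair scheduler -->
  <property>
    <name>mapred.jobtracker.taskScheduler</name>
    <value>org.apache.hadoop.mapred.FairScheduler</value>
  </property>
</configuration>
```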
After restarting Hadoop, verify the change by visiting http://<job-tracker-host>:50030/scheduler in your installation. If the scheduler has been properly applied, the page will have the heading "Fair Scheduler Administration".

When you follow the preceding steps, Hadoop loads the new scheduler settings when it is started. The fair scheduler shares an equal amount of resources between users unless it has been configured otherwise.
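The equal-sharing behavior can be illustrated with a toy sketch of max-min fair sharing, the idea underlying the fair scheduler: every pool (user) gets an equal share of task slots, and slots that a pool cannot use are redistributed among the remaining pools. This is only an illustration of the concept, not Hadoop's actual implementation; the function name and structure are invented for the example.

```python
def fair_shares(total_slots, demands):
    """Toy max-min fair sharing: split total_slots among pools.

    demands maps a pool name to the number of slots that pool can
    actually use. Pools that want less than the equal share keep only
    what they need; the leftover is shared among the remaining pools.
    """
    shares = {pool: 0 for pool in demands}
    remaining = dict(demands)
    slots_left = total_slots
    while remaining and slots_left > 0:
        equal = slots_left / len(remaining)
        # Pools demanding no more than the equal share are fully satisfied.
        capped = {p: d for p, d in remaining.items() if d <= equal}
        if not capped:
            # Every remaining pool wants more than the equal share,
            # so each simply receives the equal share.
            for p in remaining:
                shares[p] += equal
            slots_left = 0
            break
        for p, d in capped.items():
            shares[p] += d
            slots_left -= d
            del remaining[p]
    return shares

# e.g. fair_shares(100, {"a": 10, "b": 50, "c": 100})
# -> {"a": 10, "b": 45.0, "c": 45.0}
```

Here pool "a" only needs 10 of its equal 33-slot share, so its unused slots raise the share of the busier pools "b" and "c" to 45 each.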
The fair scheduler can be configured in two ways. Several parameters of the mapred.fairscheduler.* form can be set in HADOOP_HOME/conf/mapred-site.xml, and additional parameters can be configured in HADOOP_HOME/conf/fair-scheduler.xml. More details about the fair scheduler can be found in HADOOP_HOME/docs/fair_scheduler.html.
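As a sketch of the two configuration points, the fragments below show one mapred.fairscheduler.* parameter in mapred-site.xml and a pool definition in fair-scheduler.xml. The pool name "research" and all numeric values here are illustrative assumptions, not values from this recipe; consult fair_scheduler.html for the full list of supported parameters and allocation-file elements.

```xml
<!-- In HADOOP_HOME/conf/mapred-site.xml: point the fair scheduler
     at its allocation file (an example mapred.fairscheduler.* parameter) -->
<property>
  <name>mapred.fairscheduler.allocation.file</name>
  <value>/opt/hadoop/conf/fair-scheduler.xml</value>
</property>
```

```xml
<!-- In HADOOP_HOME/conf/fair-scheduler.xml: a hypothetical pool with
     guaranteed minimum slots and a higher weight than the default -->
<?xml version="1.0"?>
<allocations>
  <pool name="research">
    <minMaps>10</minMaps>
    <minReduces>5</minReduces>
    <weight>2.0</weight>
  </pool>
  <userMaxJobsDefault>3</userMaxJobsDefault>
</allocations>
```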