By default, Hadoop runs without security. However, it also supports a Kerberos-based setup, which provides full security. This recipe describes how to configure Hadoop with Kerberos.
The Kerberos setup includes a Hadoop cluster (NameNode, DataNodes, JobTracker, and TaskTrackers) and a Kerberos server. We define users as principals in the Kerberos server. A user can obtain a ticket from the Kerberos server and use that ticket to log in to any server in the Hadoop cluster. We map each Kerberos principal to a Unix user. Once a user has logged in, authorization is performed based on the Unix user and group permissions associated with that user.
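The end-to-end flow, as a minimal sketch (the principals, keytabs, and exact commands shown here are created and explained step by step below):
>kinit bob/hadoop.kbrelam.com -k -t bob.keytab    # obtain a ticket for the principal using its keytab
>klist    # inspect the cached ticket
>bin/hadoop fs -ls /    # Hadoop authenticates with the ticket; authorization uses the Unix user bob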
Set up Hadoop by following Chapter 1, Getting Hadoop Up and Running in a Cluster, using either the pseudo-distributed or the clustered setup.
We need a machine to use as the Kerberos node, on which you have root access. Furthermore, the machine should have its domain name already configured (we will assume the DNS name is hadoop.kbrelam.com, but you can replace it with another domain). If you want to try this out on a single machine, you can set up the DNS name by adding an entry that maps your IP address to hadoop.kbrelam.com in your /etc/hosts file.
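For example, on a single-machine setup the /etc/hosts entry might look like the following (the IP address is illustrative). If Kerberos is not installed yet, the MIT Kerberos packages can be installed as shown; the package names assume a Debian/Ubuntu system, which matches the /etc/init.d/krb5-admin-server path used later:
# /etc/hosts: map the machine's IP address to the assumed DNS name
192.168.1.10 hadoop.kbrelam.com
# install the KDC and the admin server
>sudo apt-get install krb5-kdc krb5-admin-server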
When the installation asks for them, provide hadoop.kbrelam.com as the realm and as the administrative server. Then run the following command to create a new realm:
>sudo krb5_newrealm
Start the Kerberos admin tool and add an admin principal:
>kadmin.local
>kadmin.local: addprinc srinath/admin
Edit /etc/krb5kdc/kadm5.acl to include the line srinath/[email protected] * to grant all the permissions, and then restart the admin server by running the following command:
>sudo /etc/init.d/krb5-admin-server restart
Test the new admin principal by obtaining a ticket and listing it:
>kinit srinath/admin
>klist
We will have three users: hdfs to run the HDFS server, mapred to run the MapReduce server, and bob to submit jobs. Create the users and the hadoop group by running the following commands:
>groupadd hadoop
>useradd hdfs
>useradd mapred
>usermod -g hadoop hdfs
>usermod -g hadoop mapred
>useradd -G mapred bob
>usermod -a -G hadoop bob
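You can verify the users and group memberships with a quick check (our addition, not part of the original recipe):
>id hdfs    # should show the group hadoop
>id bob    # should list mapred and hadoop among its groups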
Create Kerberos principals for these users by running the following commands:
>kadmin.local
>kadmin.local: addprinc -randkey hdfs/hadoop.kbrelam.com
>kadmin.local: addprinc -randkey mapred/hadoop.kbrelam.com
>kadmin.local: addprinc -randkey host/hadoop.kbrelam.com
>kadmin.local: addprinc -randkey bob/hadoop.kbrelam.com
Create keytab files for the principals by running the following commands:
>kadmin.local: xst -norandkey -k hdfs.keytab hdfs/hadoop.kbrelam.com host/hadoop.kbrelam.com
>kadmin.local: xst -norandkey -k mapred.keytab mapred/hadoop.kbrelam.com host/hadoop.kbrelam.com
>kadmin.local: xst -norandkey -k bob.keytab bob/hadoop.kbrelam.com
>kadmin.local: exit
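Optionally, verify what each keytab contains (our addition, not part of the original recipe):
>klist -k -t hdfs.keytab    # lists the principals and key timestamps stored in the keytab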
Copy the keytab files to the HADOOP_HOME/conf directory. Change the directory to HADOOP_HOME and run the following commands to set the ownership of the keytab files:
>chown hdfs:hadoop conf/hdfs.keytab
>chown mapred:hadoop conf/mapred.keytab
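It is also good practice to make each keytab readable only by its owner (our suggestion, not part of the original recipe):
>chmod 400 conf/hdfs.keytab
>chmod 400 conf/mapred.keytab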
Change the directory to HADOOP_HOME and run the following commands to set the ownership and permissions of the local directories and HDFS paths:
>chown hdfs:hadoop /opt/hadoop-work/name/
>chown hdfs:hadoop /opt/hadoop-work/data
>chown mapred:hadoop /opt/hadoop-work/local/
>bin/hadoop fs -chown hdfs:hadoop /
>bin/hadoop fs -chmod 755 /
>bin/hadoop fs -mkdir /mapred
>bin/hadoop fs -mkdir /mapred/system/
>bin/hadoop fs -chown mapred:hadoop /mapred/system
>bin/hadoop fs -chmod -R 700 /mapred/system
>bin/hadoop fs -chmod 777 /tmp
Install the Java Cryptography Extension (JCE) Unlimited Strength Jurisdiction Policy Files into JAVA_HOME/jre/lib/security. In the configuration described below, replace the HADOOP_HOME value with the corresponding location. Here, Hadoop will replace _HOST with the localhost name. The following code snippet adds properties to core-site.xml:
<property>
  <name>hadoop.security.authentication</name>
  <value>kerberos</value>
</property>
<property>
  <name>hadoop.security.authorization</name>
  <value>true</value>
</property>
Copy the configuration from resources/chapter3/kerberos-hdfs-site.xml in the source code for this chapter to HADOOP_HOME/conf/hdfs-site.xml. Replace the HADOOP_HOME value with the corresponding location. Here, Hadoop will replace _HOST with the localhost name.
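For reference, the following is a minimal sketch of the kind of properties that file sets, assuming the standard Hadoop 1.x security settings; the actual resources/chapter3/kerberos-hdfs-site.xml is authoritative:
<!-- illustrative only; _HOST expands to the local hostname -->
<property>
  <name>dfs.namenode.kerberos.principal</name>
  <value>hdfs/[email protected]</value>
</property>
<property>
  <name>dfs.namenode.keytab.file</name>
  <value>HADOOP_HOME/conf/hdfs.keytab</value>
</property>
<property>
  <name>dfs.datanode.kerberos.principal</name>
  <value>hdfs/[email protected]</value>
</property>
<property>
  <name>dfs.datanode.keytab.file</name>
  <value>HADOOP_HOME/conf/hdfs.keytab</value>
</property>
<property>
  <name>dfs.block.access.token.enable</name>
  <value>true</value>
</property>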
Start the NameNode by running the following command from HADOOP_HOME:
>sudo -u hdfs bin/hadoop namenode &
Obtain a ticket for the hdfs principal by running the following commands:
>kinit hdfs/hadoop.kbrelam.com -k -t conf/hdfs.keytab
>klist
>kinit -R
In the first command, we specify the name of the principal (for example, hdfs/hadoop.kbrelam.com) to apply the operations to that principal. The first two commands are theoretically sufficient. However, a bug stops Hadoop from reading the credentials; we can work around it with the last command, which rewrites the credentials in a more readable format. Now let's run HDFS commands:
>bin/hadoop fs -ls /
Start the DataNode. A secure DataNode must be started as root so that it can bind to privileged ports; run the following commands:
>su - root
>cd /opt/hadoop-1.0.3/
>export HADOOP_SECURE_DN_USER=hdfs
>export HADOOP_DATANODE_USER=hdfs
>bin/hadoop datanode &
>exit
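You can verify that the DataNode has registered with the NameNode (our addition, not part of the original recipe; this uses the hdfs ticket obtained earlier):
>bin/hadoop dfsadmin -report    # the report should show one live DataNode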
Next, configure the JobTracker and TaskTrackers. Add the following properties to conf/mapred-site.xml, replacing HADOOP_HOME with the corresponding location:
<property>
  <name>mapreduce.jobtracker.kerberos.principal</name>
  <value>mapred/[email protected]</value>
</property>
<property>
  <name>mapreduce.jobtracker.kerberos.https.principal</name>
  <value>host/[email protected]</value>
</property>
<property>
  <name>mapreduce.jobtracker.keytab.file</name>
  <value>HADOOP_HOME/conf/mapred.keytab</value> <!-- path to the MapReduce keytab -->
</property>
<!-- TaskTracker security configs -->
<property>
  <name>mapreduce.tasktracker.kerberos.principal</name>
  <value>mapred/[email protected]</value>
</property>
<property>
  <name>mapreduce.tasktracker.kerberos.https.principal</name>
  <value>host/[email protected]</value>
</property>
<property>
  <name>mapreduce.tasktracker.keytab.file</name>
  <value>HADOOP_HOME/conf/mapred.keytab</value> <!-- path to the MapReduce keytab -->
</property>
<!-- TaskController settings -->
<property>
  <name>mapred.task.tracker.task-controller</name>
  <value>org.apache.hadoop.mapred.LinuxTaskController</value>
</property>
<property>
  <name>mapreduce.tasktracker.group</name>
  <value>mapred</value>
</property>
Copy the task controller configuration into place by running the following commands:
>mkdir /etc/hadoop
>cp conf/taskcontroller.cfg /etc/hadoop/taskcontroller.cfg
>chmod 755 /etc/hadoop/taskcontroller.cfg
Add the following to /etc/hadoop/taskcontroller.cfg. The banned.users and min.user.id settings prevent jobs from running as system users:
mapred.local.dir=/opt/hadoop-work/local/
hadoop.log.dir=HADOOP_HOME/logs
mapreduce.tasktracker.group=mapred
banned.users=mapred,hdfs,bin
min.user.id=1000
Set up the permissions by running the following commands from HADOOP_HOME, and verify that the final permissions of bin/task-controller are rwsr-x---; otherwise, jobs will fail to execute. The s in the permissions is the setuid bit (the leading 4 in 4750), which lets the task-controller run tasks as the submitting user.
>chmod 4750 bin/task-controller
>ls -l bin/task-controller
-rwsr-x--- 1 root mapred 63374 May 9 02:05 bin/task-controller
Start the JobTracker:
>sudo -u mapred bin/hadoop jobtracker
Wait for the JobTracker to start up and then run the following command:
>sudo -u mapred bin/hadoop tasktracker
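You can confirm that all four daemons are running with a quick check (our addition, not part of the original recipe):
>ps -ef | grep java    # should show the NameNode, DataNode, JobTracker, and TaskTracker processes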
Run the job as the user bob by executing the following commands from HADOOP_HOME. If all commands run successfully, you will see the WordCount output as described in Chapter 1, Getting Hadoop Up and Running in a Cluster.
>su bob
>kinit bob/hadoop.kbrelam.com -k -t conf/bob.keytab
>kinit -R
>bin/hadoop fs -mkdir /data
>bin/hadoop fs -mkdir /data/job1
>bin/hadoop fs -mkdir /data/job1/input
>bin/hadoop fs -put README.txt /data/job1/input
>bin/hadoop jar hadoop-examples-1.0.3.jar wordcount /data/job1/input /data/output
By running the kinit command, the client obtains a Kerberos ticket and stores it in the filesystem. When we run a command, the client uses the Kerberos ticket to get access to the Hadoop nodes and submit jobs. Hadoop resolves permissions based on the user and group permissions of the Linux user that matches the Kerberos principal.
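For example, klist shows where the ticket is cached (the default MIT Kerberos location, typically /tmp/krb5cc_<uid>, is our assumption):
>klist    # the 'Ticket cache:' line names the file that holds the ticket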
Hadoop Kerberos security settings have many pitfalls. The following two tools might be useful when troubleshooting:
You can enable Kerberos debug logging by setting the following environment variable before starting the Hadoop processes:
>export HADOOP_OPTS="$HADOOP_CLIENT_OPTS -Dsun.security.krb5.debug=true"
The task-controller error codes are explained at https://ccp.cloudera.com/display/CDHDOC/Appendix+E+-+Task-controller+Error+Codes.
Also, when you change the configuration, make sure you restart all the Hadoop processes, first killing any running processes.