Hadoop security – integrating with Kerberos

By default, Hadoop runs without any security. However, it also supports a Kerberos-based setup that provides strong authentication. This recipe describes how to configure Hadoop with Kerberos for security.

A Kerberos setup includes a Hadoop cluster—NameNode, DataNodes, JobTracker, and TaskTrackers—and a Kerberos server. We define users as principals in the Kerberos server. A user obtains a ticket from the Kerberos server and uses that ticket to log in to any server in the Hadoop cluster. We map each Kerberos principal to a Unix user. Once the user has logged in, authorization is performed based on the Unix user and group permissions associated with that user.
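
The following is a minimal sketch of that flow, assuming the realm (hadoop.kbrelam.com), the user bob, and the keytab file that are set up later in this recipe:

    >kinit bob/hadoop.kbrelam.com -k -t conf/bob.keytab   # obtain a ticket from the Kerberos server using bob's keytab
    >klist                                                # inspect the cached ticket
    >bin/hadoop fs -ls /                                  # Hadoop authenticates with the cached ticket;
                                                          # authorization then uses the Unix user bob and its groups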

Getting ready

Set up Hadoop by following Chapter 1, Getting Hadoop Up and Running in a Cluster, using either the pseudo-distributed or the clustered setup.

We need a machine to use as the Kerberos node, and you must have root access to it. Furthermore, the machine should already have a domain name configured (we will assume the DNS name is hadoop.kbrelam.com, but you can replace it with another domain). If you want to try this out on a single machine, you can set up the DNS name by adding an entry that maps your IP address to hadoop.kbrelam.com in your /etc/hosts file, as shown in the following example.
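
For example, on a single machine, a hypothetical /etc/hosts entry (replace 192.168.1.10 with your machine's actual IP address) would look like this:

    192.168.1.10    hadoop.kbrelam.com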

How to do it...

  1. Install Kerberos on your machine. Refer to http://web.mit.edu/Kerberos/krb5-1.8/krb5-1.8.6/doc/krb5-install.html for further instructions on setting up Kerberos.

    Provide hadoop.kbrelam.com as the realm and as the administrative server when the installation asks for them. Then run the following command to create the realm:

    >sudo krb5_newrealm
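
    After the installation, /etc/krb5.conf should point at this realm. The following is a minimal, hedged sketch, assuming the KDC and the admin server both run on hadoop.kbrelam.com:

    [libdefaults]
        default_realm = hadoop.kbrelam.com

    [realms]
        hadoop.kbrelam.com = {
            kdc = hadoop.kbrelam.com
            admin_server = hadoop.kbrelam.com
        }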
    
  2. In Kerberos, we call users "principals". Create a new principal by running the following commands:
    >kadmin.local
    >kadmin.local: addprinc srinath/admin
    
  3. Edit /etc/krb5kdc/kadm5.acl to include the line srinath/admin@hadoop.kbrelam.com * to grant all the permissions.
  4. Restart the Kerberos server by running the following command:
    >sudo /etc/init.d/krb5-admin-server restart
    
  5. You can test the new principal by running the following commands:
    >kinit srinath/admin
    >klist
    
  6. Hadoop maps Kerberos principals to Unix users on the Hadoop machines and uses the local Unix-level user permissions for authorization. Create the following users and groups, with the permissions shown, on all the machines on which you plan to run MapReduce.

    We will have three users—hdfs to run the HDFS server, mapred to run the MapReduce server, and bob to submit jobs.

    >groupadd hadoop
    >useradd hdfs
    >useradd mapred
    >usermod -g hadoop hdfs
    >usermod -g hadoop mapred
    >useradd -G mapred bob
    >usermod -a -G hadoop bob
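
    You can confirm the resulting group memberships with the id command; hdfs and mapred should report hadoop as their group, and bob should list both mapred and hadoop:

    >id hdfs
    >id mapred
    >id bob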
    
  7. Now let us create Kerberos principals for these users.
    >kadmin.local
    >kadmin.local:  addprinc -randkey hdfs/hadoop.kbrelam.com
    >kadmin.local:  addprinc -randkey mapred/hadoop.kbrelam.com
    >kadmin.local:  addprinc -randkey host/hadoop.kbrelam.com
    >kadmin.local:  addprinc -randkey bob/hadoop.kbrelam.com
    
  8. Now, we will create keytab files that contain the credentials for these Kerberos principals. We will use these credentials to avoid entering the passwords at Hadoop startup.
    >kadmin.local:  xst -norandkey -k hdfs.keytab hdfs/hadoop.kbrelam.com host/hadoop.kbrelam.com
    >kadmin.local:  xst -norandkey -k mapred.keytab mapred/hadoop.kbrelam.com host/hadoop.kbrelam.com
    >kadmin.local:  xst -norandkey -k bob.keytab bob/hadoop.kbrelam.com
    >kadmin.local:  exit
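
    You can verify the principals and key versions stored in the generated keytab files with klist; for example:

    >klist -k -t hdfs.keytab
    >klist -k -t mapred.keytab
    >klist -k -t bob.keytab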
    
  9. Deploy the keytab files by moving them into the HADOOP_HOME/conf directory. Change the directory to HADOOP_HOME and run the following commands to set the permissions for the keytab files:
    >chown hdfs:hadoop conf/hdfs.keytab
    >chown mapred:hadoop conf/mapred.keytab
    
  10. Now, set the permissions in the local filesystem and in HDFS. Change the directory to HADOOP_HOME and run the following commands:
    >chown hdfs:hadoop /opt/hadoop-work/name/
    >chown hdfs:hadoop /opt/hadoop-work/data
    >chown mapred:hadoop /opt/hadoop-work/local/

    >bin/hadoop fs -chown hdfs:hadoop /
    >bin/hadoop fs -chmod 755 /
    >bin/hadoop fs -mkdir /mapred
    >bin/hadoop fs -mkdir /mapred/system/
    >bin/hadoop fs -chown mapred:hadoop /mapred/system
    >bin/hadoop fs -chmod -R 700 /mapred/system
    >bin/hadoop fs -chmod 777 /tmp
    
  11. Install Unlimited Strength Java Cryptography Extension (JCE) Policy Files by downloading the policy files from http://www.oracle.com/technetwork/java/javase/downloads/index.html and copying the JAR files in the distribution to JAVA_HOME/jre/lib/security.
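
    For example, assuming the two policy JARs have been extracted from the downloaded archive into the current directory (the archive and directory names vary by JDK version), the copy step would look like this:

    >cp local_policy.jar US_export_policy.jar $JAVA_HOME/jre/lib/security/
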
  12. Configure the Hadoop properties by adding the following properties to the associated configuration files. Replace the HADOOP_HOME value with the corresponding location. Here, Hadoop will replace _HOST with the local hostname. The following code snippet adds the properties to core-site.xml:
    <property>
      <name>hadoop.security.authentication</name>
      <value>kerberos</value>
    </property>
    <property>
      <name>hadoop.security.authorization</name>
      <value>true</value>
    </property>
    
  13. Copy the configuration parameters defined in resources/chapter3/kerberos-hdfs-site.xml of the source code for this chapter to HADOOP_HOME/conf/hdfs-site.xml. Replace the HADOOP_HOME value with the corresponding location. Here, Hadoop will replace _HOST with the local hostname.
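    The following is only a hedged sketch of the kind of Kerberos-related properties that file contains for a Hadoop 1.x secure setup; treat resources/chapter3/kerberos-hdfs-site.xml as the authoritative source:

    <!-- Illustrative only; verify against the chapter's kerberos-hdfs-site.xml -->
    <property>
      <name>dfs.block.access.token.enable</name>
      <value>true</value>
    </property>
    <property>
      <name>dfs.namenode.kerberos.principal</name>
      <value>hdfs/_HOST@hadoop.kbrelam.com</value>
    </property>
    <property>
      <name>dfs.namenode.keytab.file</name>
      <value>HADOOP_HOME/conf/hdfs.keytab</value>
    </property>
    <property>
      <name>dfs.datanode.kerberos.principal</name>
      <value>hdfs/_HOST@hadoop.kbrelam.com</value>
    </property>
    <property>
      <name>dfs.datanode.keytab.file</name>
      <value>HADOOP_HOME/conf/hdfs.keytab</value>
    </property>
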
  14. Start the NameNode by running the following command from HADOOP_HOME:
    >sudo -u hdfs bin/hadoop namenode &
    
  15. Test the HDFS setup by doing some metadata operations.
    >kinit hdfs/hadoop.kbrelam.com -k -t conf/hdfs.keytab
    >klist
    >kinit -R
    

    In the first command, we specify the name of the principal (for example, hdfs/hadoop.kbrelam.com) and its keytab file, so that the operations are performed as that principal. The first two commands are theoretically sufficient. However, there is a bug that stops Hadoop from reading the credentials. We can work around this with the last command, which rewrites the credentials in a more readable format. Now let's run HDFS commands.

    >bin/hadoop fs -ls /
    
  16. Start the DataNode (this must be done as root) by running the following commands:
    >su - root
    >cd /opt/hadoop-1.0.3/
    >export HADOOP_SECURE_DN_USER=hdfs
    >export HADOOP_DATANODE_USER=hdfs
    >bin/hadoop datanode &
    >exit
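
    To confirm that the DataNode has registered with the NameNode, you can run a cluster report; this assumes the hdfs ticket obtained in step 15 is still valid:

    >bin/hadoop dfsadmin -report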
    
  17. Configure MapReduce by adding the following code to conf/mapred-site.xml. Replace HADOOP_HOME with the corresponding location.
    <property>
      <name>mapreduce.jobtracker.kerberos.principal</name>
      <value>mapred/_HOST@hadoop.kbrelam.com</value>
    </property>
    <property>
      <name>mapreduce.jobtracker.kerberos.https.principal</name>
      <value>host/_HOST@hadoop.kbrelam.com</value>
    </property>
    <property>
      <name>mapreduce.jobtracker.keytab.file</name>
      <!-- path to the MapReduce keytab -->
      <value>HADOOP_HOME/conf/mapred.keytab</value>
    </property>

    <!-- TaskTracker security configs -->
    <property>
      <name>mapreduce.tasktracker.kerberos.principal</name>
      <value>mapred/_HOST@hadoop.kbrelam.com</value>
    </property>
    <property>
      <name>mapreduce.tasktracker.kerberos.https.principal</name>
      <value>host/_HOST@hadoop.kbrelam.com</value>
    </property>
    <property>
      <name>mapreduce.tasktracker.keytab.file</name>
      <!-- path to the MapReduce keytab -->
      <value>HADOOP_HOME/conf/mapred.keytab</value>
    </property>

    <!-- TaskController settings -->
    <property>
      <name>mapred.task.tracker.task-controller</name>
      <value>org.apache.hadoop.mapred.LinuxTaskController</value>
    </property>
    <property>
      <name>mapreduce.tasktracker.group</name>
      <value>mapred</value>
    </property>
  18. Configure the Linux task controller, which must be used for Kerberos setup.
    >mkdir /etc/hadoop
    >cp conf/taskcontroller.cfg /etc/hadoop/taskcontroller.cfg
    >chmod 755 /etc/hadoop/taskcontroller.cfg
    
  19. Add the following code to /etc/hadoop/taskcontroller.cfg:
    mapred.local.dir=/opt/hadoop-work/local/
    hadoop.log.dir=HADOOP_HOME/logs
    mapreduce.tasktracker.group=mapred
    banned.users=mapred,hdfs,bin
    min.user.id=1000

    Set up the permissions by running the following commands from HADOOP_HOME, and verify that the final permissions of bin/task-controller are rwsr-x---. Otherwise, the jobs will fail to execute.

    >chmod 4750 bin/task-controller
    >ls -l bin/task-controller
    -rwsr-x--- 1 root mapred 63374 May  9 02:05 bin/task-controller
    
  20. Start the JobTracker and TaskTracker:
    >sudo -u mapred bin/hadoop jobtracker
    

    Wait for the JobTracker to start up and then run the following command:

    >sudo -u mapred bin/hadoop tasktracker
    
  21. Run the job by running the following commands from HADOOP_HOME. If all the commands run successfully, you will see the WordCount output, as described in Chapter 1, Getting Hadoop Up and Running in a Cluster.
    >su bob
    >kinit bob/hadoop.kbrelam.com -k -t conf/bob.keytab
    >kinit -R
    >bin/hadoop fs -mkdir /data
    >bin/hadoop fs -mkdir /data/job1
    >bin/hadoop fs -mkdir /data/job1/input
    >bin/hadoop fs -put README.txt /data/job1/input

    >bin/hadoop jar hadoop-examples-1.0.3.jar wordcount /data/job1/input /data/output
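
    Once the job completes, you can inspect the output with commands such as the following (the wildcard avoids depending on the exact part file name):

    >bin/hadoop fs -ls /data/output
    >bin/hadoop fs -cat /data/output/part-*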
    

How it works...

When we run the kinit command, the client obtains a Kerberos ticket and stores it in the filesystem. When we then run Hadoop commands, the client uses that Kerberos ticket to authenticate to the Hadoop nodes and to submit jobs. Hadoop resolves the permissions based on the user and group permissions of the Linux user that matches the Kerberos principal.
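
The mapping from a Kerberos principal to a local user can be controlled through the hadoop.security.auth_to_local property in core-site.xml. The following is only an illustrative, hedged sketch of a rule that would map principals such as bob/hadoop.kbrelam.com@hadoop.kbrelam.com to the local user bob; the built-in default rules are usually sufficient for this recipe's principal names:

    <!-- Illustrative mapping rule only; the defaults normally suffice -->
    <property>
      <name>hadoop.security.auth_to_local</name>
      <value>
        RULE:[2:$1@$0](.*@hadoop.kbrelam.com)s/@.*//
        DEFAULT
      </value>
    </property>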

Hadoop Kerberos security settings have many pitfalls; a couple of tools might be useful when troubleshooting them.
