How to do it...

The following are the steps to build the Spark source code with Maven:

  1. Increase MaxPermSize of the heap:
       $ echo "export _JAVA_OPTIONS="-XX:MaxPermSize=1G""  >> 
/home/hduser/.bashrc
  1. Open a new terminal window and download the Spark source code from GitHub:
        $ wget https://github.com/apache/spark/archive/branch-2.1.zip
  1. Unpack the archive:
        $ unzip branch-2.1.zip
  1. Rename unzipped folder to spark:
        $ mv spark-branch-2.1 spark
  1. Move to the spark directory:
        $ cd spark
  1. Compile the sources with the YARN-enabled, Hadoop version 2.7, and Hive-enabled flags and skip the tests for faster compilation:
      $ mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.0 -Phive -
DskipTests clean package
  1. Move the conf folder to the etc folder so that it can be turned into a symbolic link:
        $ sudo mv spark/conf /etc/
  1. Move the spark directory to /opt as it's an add-on software package:
        $ sudo mv spark /opt/infoobjects/spark
  1. Change the ownership of the spark home directory to root:
        $ sudo chown -R root:root /opt/infoobjects/spark
  1. Change the permissions of the spark home directory, namely 0755 = user:rwx group:r-x world:r-x:
        $ sudo chmod -R 755 /opt/infoobjects/spark
  1. Move to the spark home directory:
        $ cd /opt/infoobjects/spark
  1. Create a symbolic link:
        $ sudo ln -s /etc/spark conf
  1. Put the Spark executable in the path by editing .bashrc:
        $ echo "export PATH=$PATH:/opt/infoobjects/spark/bin" >> 
/home/hduser/.bashrc
  1. Create the log directory in /var:
        $ sudo mkdir -p /var/log/spark
  1. Make hduser the owner of Spark's log directory:
        $ sudo chown -R hduser:hduser /var/log/spark
  1. Create Spark's tmp directory:
        $ mkdir /tmp/spark
  1. Configure Spark with the help of the following command lines:
     $ cd /etc/spark
$ echo "export HADOOP_CONF_DIR=/opt/infoobjects/hadoop/etc/hadoop"
>> spark-env.sh

$ echo "export YARN_CONF_DIR=/opt/infoobjects/hadoop/etc/Hadoop"
>> spark-env.sh

$ echo "export SPARK_LOG_DIR=/var/log/spark" >> spark-env.sh
$ echo "export SPARK_WORKER_DIR=/tmp/spark" >> spark-env.sh
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset