How to do it...

The following are the steps to build the Spark source code with Maven:

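Increase the JVM's maximum permanent generation size (MaxPermSize). Java 8 and later ignore this option, so it matters only when building with Java 7:

       $ echo 'export _JAVA_OPTIONS="-XX:MaxPermSize=1G"' >> \
         /home/hduser/.bashrc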
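
Running the preceding echo command more than once leaves duplicate lines in .bashrc. The following is a minimal sketch of an idempotent append; append_once is a hypothetical helper (it is not part of Spark), and the demonstration writes to a scratch file rather than the real .bashrc:

```shell
# Hypothetical helper: append a line to a file only if that exact line is absent.
append_once() {
    line="$1"
    file="$2"
    # -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Demonstrate on a scratch file instead of /home/hduser/.bashrc.
rc_file="$(mktemp)"
append_once 'export _JAVA_OPTIONS="-XX:MaxPermSize=1G"' "$rc_file"
append_once 'export _JAVA_OPTIONS="-XX:MaxPermSize=1G"' "$rc_file"   # second call is a no-op
cat "$rc_file"
```

To use it for real, pass /home/hduser/.bashrc as the second argument.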

Open a new terminal window and download the Spark source code from GitHub:

        $ wget https://github.com/apache/spark/archive/branch-2.1.zip

Unpack the archive:

        $ unzip branch-2.1.zip

Rename the unzipped folder to spark:

        $ mv spark-branch-2.1 spark

Move to the spark directory:

        $ cd spark

Compile the sources with the YARN, Hadoop 2.7, and Hive profiles enabled, and skip the tests for faster compilation:

      $ mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.0 -Phive \
        -DskipTests clean package
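
Maven itself runs in a JVM, and a build of this size can exhaust the default memory limits. The following sketch sets MAVEN_OPTS before the build; the values follow the suggestion in Spark's build documentation for the 2.x line, but treat them as a starting point to tune for your machine rather than a requirement:

```shell
# Give the Maven JVM more heap and code cache before running mvn.
# These values are a starting point; adjust them to the memory you have available.
export MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=512m"
echo "MAVEN_OPTS=$MAVEN_OPTS"
```

The variable only needs to be set in the shell that runs the mvn command. The source tree also ships a build/mvn wrapper script that can be used in place of a system-wide mvn.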

Move up to the parent directory, and then move the conf folder to /etc/spark so that it can be replaced with a symbolic link:

        $ cd ..
        $ sudo mv spark/conf /etc/spark

Move the spark directory to /opt as it's an add-on software package:

        $ sudo mv spark /opt/infoobjects/spark

Change the ownership of the spark home directory to root:

        $ sudo chown -R root:root /opt/infoobjects/spark

Change the permissions of the spark home directory to 0755 (user: rwx, group: r-x, world: r-x):

        $ sudo chmod -R 755 /opt/infoobjects/spark
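
Note that a recursive 755 also marks every regular file as executable. If that is not wanted, a symbolic mode with a capital X is one alternative. The following is a minimal sketch on a scratch directory; the file names are made up for illustration, and stat -c assumes GNU coreutils:

```shell
# Scratch tree: one directory, one script, one plain file.
tree="$(mktemp -d)"
mkdir "$tree/bin"
touch "$tree/bin/run.sh" "$tree/README"
chmod 700 "$tree/bin/run.sh"     # already executable for its owner
chmod 600 "$tree/README"         # plain data file

# Capital X adds execute only to directories and to files that are already executable.
chmod -R u=rwX,go=rX "$tree"

# GNU stat: print the octal mode and the name of each entry.
stat -c '%a %n' "$tree/bin" "$tree/bin/run.sh" "$tree/README"
```

Directories and the script end up as 755, while the plain file stays non-executable at 644.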

Move to the spark home directory:

        $ cd /opt/infoobjects/spark

Create a symbolic link:

        $ sudo ln -s /etc/spark conf
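
The move-then-link pattern can be rehearsed without sudo before touching the real system. The following sketch recreates the layout inside a scratch directory; the sandbox paths are illustrative stand-ins for /opt/infoobjects/spark and /etc/spark:

```shell
# Rehearse the conf-to-/etc layout in a scratch directory (no sudo needed).
sandbox="$(mktemp -d)"
mkdir -p "$sandbox/opt/spark/conf" "$sandbox/etc"
echo "# placeholder" > "$sandbox/opt/spark/conf/spark-env.sh"

# Move the real directory out, then leave a symbolic link in its place.
mv "$sandbox/opt/spark/conf" "$sandbox/etc/spark"
ln -s "$sandbox/etc/spark" "$sandbox/opt/spark/conf"

# The link should resolve to the relocated directory, and its files stay reachable.
link_target="$(readlink "$sandbox/opt/spark/conf")"
echo "conf -> $link_target"
ls "$sandbox/opt/spark/conf"
```

On the real installation, running readlink conf inside the spark home directory should print /etc/spark.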

Put the Spark executables in the path by editing .bashrc. The single quotes keep $PATH from being expanded until .bashrc is read:

        $ echo 'export PATH=$PATH:/opt/infoobjects/spark/bin' >> \
          /home/hduser/.bashrc

Create the log directory in /var:

        $ sudo mkdir -p /var/log/spark

Make hduser the owner of Spark's log directory:

        $ sudo chown -R hduser:hduser /var/log/spark

Create Spark's tmp directory:

        $ mkdir /tmp/spark

Configure Spark with the help of the following command lines:

        $ cd /etc/spark
        $ echo "export HADOOP_CONF_DIR=/opt/infoobjects/hadoop/etc/hadoop" \
            >> spark-env.sh
        $ echo "export YARN_CONF_DIR=/opt/infoobjects/hadoop/etc/hadoop" \
            >> spark-env.sh
        $ echo "export SPARK_LOG_DIR=/var/log/spark" >> spark-env.sh
        $ echo "export SPARK_WORKER_DIR=/tmp/spark" >> spark-env.sh
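
The same settings can be written in one pass with a here-document, which is easier to review than a series of echo commands. The following is a minimal sketch that writes to a scratch file; env_file is an illustrative stand-in for /etc/spark/spark-env.sh, and the Hadoop paths assume the installation layout used in this recipe:

```shell
# Write the four settings in one pass with a quoted here-document.
# env_file is a scratch path; on the real system it would be /etc/spark/spark-env.sh.
env_file="$(mktemp)"
cat >> "$env_file" <<'EOF'
export HADOOP_CONF_DIR=/opt/infoobjects/hadoop/etc/hadoop
export YARN_CONF_DIR=/opt/infoobjects/hadoop/etc/hadoop
export SPARK_LOG_DIR=/var/log/spark
export SPARK_WORKER_DIR=/tmp/spark
EOF

# Source the file in a subshell to confirm that it parses and sets the variables.
spark_log_dir="$(. "$env_file" && echo "$SPARK_LOG_DIR")"
echo "SPARK_LOG_DIR=$spark_log_dir"
```

Sourcing the file in a subshell is a quick way to catch quoting mistakes without changing the current shell's environment.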
