How to do it...

Here are the installation steps:

Open the terminal and download the binaries using the following command:

        $ wget http://d3kbcqa49mib13.cloudfront.net/spark-2.1.0-bin-hadoop2.7.tgz

Unpack the binaries:

        $ tar -zxf spark-2.1.0-bin-hadoop2.7.tgz
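
The rename step that follows depends on knowing which top-level folder the archive unpacks into. The sketch below shows one way to check that with tar's list mode before extracting. It builds a tiny stand-in archive in a scratch directory; the name spark-demo-bin and the scratch paths are illustrative assumptions, not part of the real Spark download.

```shell
# Build a small stand-in archive in a scratch directory (hypothetical names).
work=$(mktemp -d)
mkdir -p "$work/spark-demo-bin/conf"
touch "$work/spark-demo-bin/conf/spark-env.sh.template"
tar -czf "$work/spark-demo-bin.tgz" -C "$work" spark-demo-bin
rm -r "$work/spark-demo-bin"

# -t lists the contents without extracting; the first path component
# is the top-level folder the archive will create.
top_dir=$(tar -tzf "$work/spark-demo-bin.tgz" | head -n 1 | cut -d/ -f1)
echo "archive unpacks into: $top_dir"

# -z (gunzip), -x (extract), -f (archive file): the same flags as the recipe.
tar -zxf "$work/spark-demo-bin.tgz" -C "$work"
ls "$work/$top_dir/conf"
```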

Rename the folder containing the binaries by stripping the version information:

        $ sudo mv spark-2.1.0-bin-hadoop2.7 spark

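Move the contents of the configuration folder to /etc/spark so that the folder can be replaced with a symbolic link later. Create /etc/spark first, and remove the emptied conf folder afterward:

        $ sudo mkdir -p /etc/spark
        $ sudo mv spark/conf/* /etc/spark && sudo rmdir spark/conf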

Create your company-specific installation directory under /opt. As the recipes in this book are tested on the infoobjects sandbox, use infoobjects as the directory name. Create the /opt/infoobjects directory:

        $ sudo mkdir -p /opt/infoobjects

Move the spark directory to /opt/infoobjects, as it's an add-on software package:

        $ sudo mv spark /opt/infoobjects/

Change the permissions of the Spark home directory to 0755 (user: read-write-execute; group: read-execute; world: read-execute):

        $ sudo chmod -R 755 /opt/infoobjects/spark
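
To see what mode 755 looks like on disk, the following sketch applies the same chmod to a throwaway directory and reads the result back. It assumes GNU stat (the -c option), as found on most Linux distributions; the scratch directory and file names are placeholders.

```shell
# Scratch tree standing in for the Spark home directory (hypothetical names).
demo=$(mktemp -d)
mkdir -p "$demo/spark/bin"
touch "$demo/spark/bin/spark-shell"

# Same recursive mode change as the recipe, without sudo because we own the tree.
chmod -R 755 "$demo/spark"

# %a prints the octal mode, %A the symbolic form (GNU stat assumed).
mode=$(stat -c '%a' "$demo/spark/bin/spark-shell")
perms=$(stat -c '%A' "$demo/spark/bin/spark-shell")
echo "$mode $perms"
```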

Move to the spark home directory:

        $ cd /opt/infoobjects/spark

Create the symbolic link:

        $ sudo ln -s /etc/spark conf
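
The move-then-link pattern can be rehearsed safely in a scratch directory, as in the sketch below. It assumes the original conf directory is emptied and removed before linking; if a real directory named conf still exists, ln -s places the new link inside it rather than replacing it. All paths under the scratch root are stand-ins for /opt/infoobjects/spark and /etc/spark.

```shell
# Scratch root standing in for the real filesystem (hypothetical layout).
root=$(mktemp -d)
mkdir -p "$root/opt/spark/conf" "$root/etc/spark"
touch "$root/opt/spark/conf/spark-env.sh.template"

# Move the configuration files out, then drop the now-empty directory.
mv "$root/opt/spark/conf/"* "$root/etc/spark"
rmdir "$root/opt/spark/conf"

# Recreate conf as a symbolic link pointing at the relocated files.
ln -s "$root/etc/spark" "$root/opt/spark/conf"

# readlink prints the path stored in the link.
link_target=$(readlink "$root/opt/spark/conf")
echo "conf -> $link_target"
ls "$root/opt/spark/conf/"
```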

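Append the Spark binaries path to PATH in .bashrc. The single quotes keep $PATH literal, so it is expanded each time .bashrc is read rather than once when the line is written:

        $ echo 'export PATH=$PATH:/opt/infoobjects/spark/bin' >> /home/hduser/.bashrc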
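
The quoting used in an echo line like this decides when $PATH is expanded. The short sketch below captures both variants in variables so they can be compared; nothing is written to .bashrc.

```shell
# Double quotes: the shell expands $PATH right away, so the current
# value of PATH is what ends up in the output.
expanded=$(echo "export PATH=$PATH:/opt/infoobjects/spark/bin")

# Single quotes: the text $PATH is kept as-is and would only be
# expanded later, each time .bashrc is sourced.
literal=$(echo 'export PATH=$PATH:/opt/infoobjects/spark/bin')

echo "$expanded"
echo "$literal"
```

Either form adds the Spark bin directory to PATH; the literal form simply avoids freezing the PATH of the session that ran the command into the file.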

Open a new terminal.

Create the log directory in /var:

        $ sudo mkdir -p /var/log/spark

Make hduser the owner of Spark's log directory:

        $ sudo chown -R hduser:hduser /var/log/spark

Create Spark's tmp directory:

        $ mkdir /tmp/spark

Configure Spark by appending the following settings to spark-env.sh. Writing under /etc normally requires root privileges, so each line is appended through sudo tee -a:

        $ cd /etc/spark
        $ echo "export HADOOP_CONF_DIR=/opt/infoobjects/hadoop/etc/hadoop" | sudo tee -a spark-env.sh
        $ echo "export YARN_CONF_DIR=/opt/infoobjects/hadoop/etc/hadoop" | sudo tee -a spark-env.sh
        $ echo "export SPARK_LOG_DIR=/var/log/spark" | sudo tee -a spark-env.sh
        $ echo "export SPARK_WORKER_DIR=/tmp/spark" | sudo tee -a spark-env.sh
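
A quick way to confirm that the appended lines parse as intended is to source the file in a subshell and print the variables. The sketch below does this against a scratch copy holding three of the recipe's settings; the temporary directory is a stand-in for /etc/spark.

```shell
# Scratch directory standing in for /etc/spark (hypothetical location).
conf_dir=$(mktemp -d)
cat >> "$conf_dir/spark-env.sh" <<'EOF'
export HADOOP_CONF_DIR=/opt/infoobjects/hadoop/etc/hadoop
export SPARK_LOG_DIR=/var/log/spark
export SPARK_WORKER_DIR=/tmp/spark
EOF

# Source the file inside a command substitution (a subshell), so the
# variables do not leak into the current session.
loaded=$( . "$conf_dir/spark-env.sh" && echo "$HADOOP_CONF_DIR|$SPARK_LOG_DIR|$SPARK_WORKER_DIR" )
echo "$loaded"
```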

Change the ownership of the spark home directory to root:

        $ sudo chown -R root:root /opt/infoobjects/spark
