Here are the installation steps:
- Open the terminal and download the binaries using the following command:
$ wget http://d3kbcqa49mib13.cloudfront.net/spark-2.1.0-bin-hadoop2.7.tgz
- Unpack the binaries:
$ tar -zxf spark-2.1.0-bin-hadoop2.7.tgz
- Rename the folder containing the binaries by stripping the version information:
$ sudo mv spark-2.1.0-bin-hadoop2.7 spark
- Move the configuration folder to /etc/spark so that it can be replaced with a symbolic link later (this assumes /etc/spark does not already exist):
$ sudo mv spark/conf /etc/spark
- Create your company-specific installation directory under /opt. As the recipes in this book are tested on the infoobjects sandbox, use infoobjects as the directory name:
$ sudo mkdir -p /opt/infoobjects
- Move the spark directory to /opt/infoobjects, as it's an add-on software package:
$ sudo mv spark /opt/infoobjects/
- Set the permissions of the Spark home directory to 0755 (user: read-write-execute; group: read-execute; world: read-execute):
$ sudo chmod -R 755 /opt/infoobjects/spark
- Move to the spark home directory:
$ cd /opt/infoobjects/spark
- Create the symbolic link:
$ sudo ln -s /etc/spark conf
- Append the Spark binaries path to PATH in .bashrc (the single quotes keep $PATH from being expanded until the shell reads .bashrc):
$ echo 'export PATH=$PATH:/opt/infoobjects/spark/bin' >> /home/hduser/.bashrc
- Open a new terminal.
- Create the log directory in /var:
$ sudo mkdir -p /var/log/spark
- Make hduser the owner of Spark's log directory:
$ sudo chown -R hduser:hduser /var/log/spark
- Create Spark's tmp directory:
$ mkdir /tmp/spark
- Configure Spark by appending the following settings to spark-env.sh:
$ cd /etc/spark
$ echo "export HADOOP_CONF_DIR=/opt/infoobjects/hadoop/etc/hadoop" >> spark-env.sh
$ echo "export YARN_CONF_DIR=/opt/infoobjects/hadoop/etc/hadoop" >> spark-env.sh
$ echo "export SPARK_LOG_DIR=/var/log/spark" >> spark-env.sh
$ echo "export SPARK_WORKER_DIR=/tmp/spark" >> spark-env.sh
- Change the ownership of the spark home directory to root:
$ sudo chown -R root:root /opt/infoobjects/spark
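
Once the steps are complete, it can help to confirm that the resulting layout is what the recipe intends. The following is a minimal sketch of such a check, not part of the original recipe: the function name check_spark_layout and its four positional parameters are hypothetical, and the defaults simply mirror the paths used in the steps above. It assumes that conf is a symbolic link pointing directly at the configuration directory and that each setting was appended to spark-env.sh as an export line.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sanity-check the directory layout created by the steps above.
# All four arguments are optional; the defaults are the paths used in this recipe.
check_spark_layout() {
    local spark_home="${1:-/opt/infoobjects/spark}"
    local conf_dir="${2:-/etc/spark}"
    local log_dir="${3:-/var/log/spark}"
    local work_dir="${4:-/tmp/spark}"
    local failures=0
    local dir var

    # conf inside the Spark home should be a symlink whose target is the config directory.
    if [ -L "$spark_home/conf" ] && [ "$(readlink "$spark_home/conf")" = "$conf_dir" ]; then
        echo "ok: $spark_home/conf -> $conf_dir"
    else
        echo "FAIL: $spark_home/conf is not a symlink to $conf_dir"
        failures=$((failures + 1))
    fi

    # The log directory and the worker (tmp) directory must both exist.
    for dir in "$log_dir" "$work_dir"; do
        if [ -d "$dir" ]; then
            echo "ok: $dir exists"
        else
            echo "FAIL: $dir is missing"
            failures=$((failures + 1))
        fi
    done

    # Each variable from the configuration step should appear as an export line.
    for var in HADOOP_CONF_DIR YARN_CONF_DIR SPARK_LOG_DIR SPARK_WORKER_DIR; do
        if grep -q "^export $var=" "$conf_dir/spark-env.sh" 2>/dev/null; then
            echo "ok: $var is set in spark-env.sh"
        else
            echo "FAIL: $var is not set in $conf_dir/spark-env.sh"
            failures=$((failures + 1))
        fi
    done

    # Succeed only if every check passed.
    [ "$failures" -eq 0 ]
}
```

Calling check_spark_layout with no arguments checks the paths used in this recipe; passing other paths allows the same checks to be run against a different installation directory. The function only reads the filesystem, so it can be run as hduser without sudo.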