How to do it...

  1. Since Spark's standalone mode is the default, all you need to do is have Spark binaries installed on both master and slave machines. Put /opt/infoobjects/spark/sbin in the path on every node:
        $ echo "export PATH=\$PATH:/opt/infoobjects/spark/sbin" >> /home/hduser/.bashrc
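
For the new PATH to take effect in an already-open session, reload the shell profile:

        $ source /home/hduser/.bashrc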
  2. Start the standalone master server (SSH to the master first):
        hduser@m1.zettabytes.com~] start-master.sh
The master, by default, starts on port 7077, which slaves use to connect to it. It also serves a web UI on port 8080.
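
As a quick sanity check that the master is up, you can fetch its web UI from any node (hostname and port as used in this recipe):

        $ curl http://m1.zettabytes.com:8080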
  3. Connect to each slave machine using a Secure Shell (SSH) connection and start a worker there, giving it the master's URL:
        hduser@s1.zettabytes.com~] spark-class org.apache.spark.deploy.worker.Worker spark://m1.zettabytes.com:7077
Argument                                        Meaning
-h <ipaddress/HOST>, --host <ipaddress/HOST>    IP address/DNS name to listen on
-p <port>, --port <port>                        Port for the service to listen on
--webui-port <port>                             Port for the web UI (by default, 8080 for the master and 8081 for the worker)
-c <cores>, --cores <cores>                     Total CPU cores that Spark applications can use on the machine (worker only)
-m <memory>, --memory <memory>                  Total RAM that Spark applications can use on the machine (worker only)
-d <dir>, --work-dir <dir>                      Directory to use for scratch space and job output logs
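
For example, a worker can be started with explicit resource limits; the core and memory values below are illustrative:

        hduser@s1.zettabytes.com~] spark-class org.apache.spark.deploy.worker.Worker --cores 4 --memory 8g --webui-port 8081 spark://m1.zettabytes.com:7077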

For fine-grained configuration, the preceding parameters work with both masters and slaves. Rather than starting the master and slave daemons manually on each node, the same can be accomplished with Spark's cluster launch scripts, described next. Automating the setup of a large cluster with configuration management tools such as Chef or Puppet is outside the scope of this book; please refer to books on those tools.
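Alternatively, the same per-node settings can be placed in conf/spark-env.sh on each node. The variable names below come from Spark's spark-env.sh.template; the values are examples only:

        # conf/spark-env.sh (example values)
        SPARK_WORKER_CORES=4          # cores Spark applications can use on this worker
        SPARK_WORKER_MEMORY=8g        # RAM Spark applications can use on this worker
        SPARK_WORKER_WEBUI_PORT=8081  # web UI port for this worker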

  4. First, create the conf/slaves file on the master node and add one line per slave hostname (this example uses five slave nodes; replace these DNS names with those of the slave nodes in your cluster):
        hduser@m1.zettabytes.com~] echo "s1.zettabytes.com" >> conf/slaves
        hduser@m1.zettabytes.com~] echo "s2.zettabytes.com" >> conf/slaves
        hduser@m1.zettabytes.com~] echo "s3.zettabytes.com" >> conf/slaves
        hduser@m1.zettabytes.com~] echo "s4.zettabytes.com" >> conf/slaves
        hduser@m1.zettabytes.com~] echo "s5.zettabytes.com" >> conf/slaves

Once the slaves file is set up, you can call the following scripts to start or stop the cluster:

Script name       Purpose
start-master.sh   Starts a master instance on the host machine
start-slaves.sh   Starts a slave instance on each node listed in the slaves file
start-all.sh      Starts both the master and the slaves
stop-master.sh    Stops the master instance on the host machine
stop-slaves.sh    Stops the slave instances on all the nodes listed in the slaves file
stop-all.sh       Stops both the master and the slaves
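
Note that start-slaves.sh and start-all.sh reach the slaves over SSH, so passwordless SSH from the master to each slave should be set up first. For example, to bring up the whole cluster from the master node and then verify that the daemons are running (jps ships with the JDK; your output will differ):

        hduser@m1.zettabytes.com~] start-all.sh
        hduser@m1.zettabytes.com~] jps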
  5. Connect an application to the cluster from Scala code by setting the master URL (the application name is arbitrary):
        val sparkContext = new SparkContext(new SparkConf().setAppName("myapp").setMaster("spark://m1.zettabytes.com:7077"))
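
A packaged application can also be pointed at this cluster via spark-submit; the class and JAR names here are placeholders:

        $ spark-submit --class com.example.MyApp --master spark://m1.zettabytes.com:7077 myapp.jar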
  6. Connect to the cluster through the Spark shell by setting the master URL:
        $ spark-shell --master spark://m1.zettabytes.com:7077
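
By default, an application on a standalone cluster grabs all available cores. If the cluster is shared, you can cap how many cores the shell takes with the --total-executor-cores option (the value here is illustrative):

        $ spark-shell --master spark://m1.zettabytes.com:7077 --total-executor-cores 2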