How to do it...

  1. Since Spark's standalone mode is the default, all you need to do is have Spark binaries installed on both master and slave machines. Put /opt/infoobjects/spark/sbin in the path on every node:
        $ echo "export PATH=\$PATH:/opt/infoobjects/spark/sbin" >> /home/hduser/.bashrc
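
For the new PATH to take effect in an already-open session, reload the shell profile:

        $ source /home/hduser/.bashrc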
  2. Start the standalone master server (SSH to the master first):
        hduser@m1.zettabytes.com~] start-master.sh
The master, by default, starts on port 7077, which slaves use to connect to it. It also serves a web UI on port 8080.
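
As a quick sanity check that the master is up, you can fetch its web UI from any node (hostname and port as used in this recipe):

        $ curl http://m1.zettabytes.com:8080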
  3. Connect to each slave machine using a Secure Shell (SSH) connection and start a worker there, giving it the master's URL:
        hduser@s1.zettabytes.com~] spark-class org.apache.spark.deploy.worker.Worker spark://m1.zettabytes.com:7077
Argument                                        Meaning
-h <ipaddress/HOST>, --host <ipaddress/HOST>    IP address/DNS name to listen on
-p <port>, --port <port>                        Port for the service to listen on
--webui-port <port>                             Port for the web UI (by default, 8080 for the master and 8081 for the worker)
-c <cores>, --cores <cores>                     Total CPU cores that Spark applications can use on the machine (worker only)
-m <memory>, --memory <memory>                  Total RAM that Spark applications can use on the machine (worker only)
-d <dir>, --work-dir <dir>                      Directory to use for scratch space and job output logs
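
For example, a worker can be started with explicit resource limits; the core and memory values below are illustrative:

        hduser@s1.zettabytes.com~] spark-class org.apache.spark.deploy.worker.Worker --cores 4 --memory 8g --webui-port 8081 spark://m1.zettabytes.com:7077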

For fine-grained configuration, the preceding parameters work with both masters and slaves. Rather than starting the master and slave daemons manually on each node, the same can be accomplished with Spark's cluster launch scripts, described next. Automating the setup of a large cluster with configuration management tools such as Chef or Puppet is outside the scope of this book; please refer to books on those tools.
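Alternatively, the same per-node settings can be placed in conf/spark-env.sh on each node. The variable names below come from Spark's spark-env.sh.template; the values are examples only:

        # conf/spark-env.sh (example values)
        SPARK_WORKER_CORES=4          # cores Spark applications can use on this worker
        SPARK_WORKER_MEMORY=8g        # RAM Spark applications can use on this worker
        SPARK_WORKER_WEBUI_PORT=8081  # web UI port for this worker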

  4. First, create the conf/slaves file on the master node and add one line per slave hostname (this example uses five slave nodes; replace these DNS names with those of the slave nodes in your cluster):
        hduser@m1.zettabytes.com~] echo "s1.zettabytes.com" >> conf/slaves
        hduser@m1.zettabytes.com~] echo "s2.zettabytes.com" >> conf/slaves
        hduser@m1.zettabytes.com~] echo "s3.zettabytes.com" >> conf/slaves
        hduser@m1.zettabytes.com~] echo "s4.zettabytes.com" >> conf/slaves
        hduser@m1.zettabytes.com~] echo "s5.zettabytes.com" >> conf/slaves

Once the slaves file is set up, you can call the following scripts to start or stop the cluster:

Script name       Purpose
start-master.sh   Starts a master instance on the host machine
start-slaves.sh   Starts a slave instance on each node listed in the slaves file
start-all.sh      Starts both the master and the slaves
stop-master.sh    Stops the master instance on the host machine
stop-slaves.sh    Stops the slave instances on all the nodes listed in the slaves file
stop-all.sh       Stops both the master and the slaves
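
Note that start-slaves.sh and start-all.sh reach the slaves over SSH, so passwordless SSH from the master to each slave should be set up first. For example, to bring up the whole cluster from the master node and then verify that the daemons are running (jps ships with the JDK; your output will differ):

        hduser@m1.zettabytes.com~] start-all.sh
        hduser@m1.zettabytes.com~] jps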
  5. Connect an application to the cluster from Scala code by setting the master URL (the application name is arbitrary):
        val sparkContext = new SparkContext(new SparkConf().setAppName("myapp").setMaster("spark://m1.zettabytes.com:7077"))
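
A packaged application can also be pointed at this cluster via spark-submit; the class and JAR names here are placeholders:

        $ spark-submit --class com.example.MyApp --master spark://m1.zettabytes.com:7077 myapp.jar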
  6. Connect to the cluster through the Spark shell by setting the master URL:
        $ spark-shell --master spark://m1.zettabytes.com:7077
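
By default, an application on a standalone cluster grabs all available cores. If the cluster is shared, you can cap how many cores the shell takes with the --total-executor-cores option (the value here is illustrative):

        $ spark-shell --master spark://m1.zettabytes.com:7077 --total-executor-cores 2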