Now your master and worker nodes are up and running, which means that you can submit your Spark job to them for computation. Before that, however, you need to log in to the remote nodes using SSH. To do so, execute the following command to SSH into the remote Spark cluster:
$ $SPARK_HOME/spark-ec2 \
--key-pair=<name_of_the_key_pair> \
--identity-file=<path_of_the_key_pair> \
--region=<region> \
--zone=<zone> \
login <cluster-name>
In our case, it looks something like the following:
$ $SPARK_HOME/spark-ec2 \
--key-pair=my-key-pair \
--identity-file=/usr/local/key/aws-key-pair.pem \
--region=eu-west-1 \
--zone=eu-west-1a \
login ec2-spark-cluster-1
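If you log in often, it can help to keep the spark-ec2 settings in shell variables. The following is a minimal sketch under a few assumptions. The variable names and the ec2_login function are hypothetical. SPARK_HOME falls back to /usr/local/spark when it is not already set. The value eu-west-1a stands in for whichever availability zone your cluster actually uses. Setting DRY_RUN=1 prints the command instead of running it.

```shell
#!/bin/sh
set -eu

# Hypothetical settings taken from our example; adjust them to your cluster.
SPARK_HOME="${SPARK_HOME:-/usr/local/spark}"   # assumed install location
KEY_PAIR="my-key-pair"
IDENTITY_FILE="/usr/local/key/aws-key-pair.pem"
REGION="eu-west-1"
ZONE="eu-west-1a"                              # assumed availability zone inside REGION
CLUSTER_NAME="ec2-spark-cluster-1"

# Log in to the cluster. When DRY_RUN is set to a non-empty value,
# ${DRY_RUN:+echo} expands to "echo", so the command is printed, not executed.
ec2_login() {
  ${DRY_RUN:+echo} "$SPARK_HOME/spark-ec2" \
    --key-pair="$KEY_PAIR" \
    --identity-file="$IDENTITY_FILE" \
    --region="$REGION" \
    --zone="$ZONE" \
    login "$CLUSTER_NAME"
}
```

Calling ec2_login then runs the login command, while DRY_RUN=1 ec2_login only shows what would be executed.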
Now copy your application (that is, the JAR file, or the Python/R script) to the remote instance (ec2-52-48-119-121.eu-west-1.compute.amazonaws.com in our case) by executing the following command in a new terminal:
$ scp -i /usr/local/key/aws-key-pair.pem /usr/local/code/KMeans-0.0.1-SNAPSHOT-jar-with-dependencies.jar ec2-user@ec2-52-48-119-121.eu-west-1.compute.amazonaws.com:/home/ec2-user/
Then you need to copy your data (/usr/local/data/Saratoga_NY_Homes.txt, in our case) to the same remote instance by executing the following command:
$ scp -i /usr/local/key/aws-key-pair.pem /usr/local/data/Saratoga_NY_Homes.txt ec2-user@ec2-52-48-119-121.eu-west-1.compute.amazonaws.com:/home/ec2-user/
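Since scp accepts several source files before the destination, both copies can also be done in a single call. A minimal sketch follows. The copy_to_master function and the variable names are hypothetical, and the remote user and host are the ones from our example.

```shell
#!/bin/sh
set -eu

# Hypothetical settings taken from our example.
IDENTITY_FILE="/usr/local/key/aws-key-pair.pem"
REMOTE_USER="ec2-user"
MASTER_HOST="ec2-52-48-119-121.eu-west-1.compute.amazonaws.com"

# Copy every file given as an argument to the remote user's home directory
# in one scp call; DRY_RUN=1 prints the command instead of running it.
copy_to_master() {
  [ "$#" -gt 0 ] || { echo "usage: copy_to_master FILE..." >&2; return 1; }
  ${DRY_RUN:+echo} scp -i "$IDENTITY_FILE" "$@" \
    "$REMOTE_USER@$MASTER_HOST:/home/$REMOTE_USER/"
}
```

For our files, the call would be copy_to_master /usr/local/code/KMeans-0.0.1-SNAPSHOT-jar-with-dependencies.jar /usr/local/data/Saratoga_NY_Homes.txt, which opens one SSH connection instead of two.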
Well done, you are almost there! Finally, submit your Spark job so that it is computed by the worker (slave) nodes. To do so, execute the following command:
$ $SPARK_HOME/bin/spark-submit \
--class com.chapter13.Clustering.KMeansDemo \
--master spark://ec2-52-48-119-121.eu-west-1.compute.amazonaws.com:7077 \
file:///home/ec2-user/KMeans-0.0.1-SNAPSHOT-jar-with-dependencies.jar \
file:///home/ec2-user/Saratoga_NY_Homes.txt
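Keep in mind that a file:// path on the local file system must be accessible at the same location on every worker node, so copying the data file to the master alone may not be enough. If you have already put your JAR and data on HDFS, the submit command should look something like the following (adjust the NameNode host and port to your setup):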
$ $SPARK_HOME/bin/spark-submit \
--class com.chapter13.Clustering.KMeansDemo \
--master spark://ec2-52-48-119-121.eu-west-1.compute.amazonaws.com:7077 \
hdfs://localhost:9000/KMeans-0.0.1-SNAPSHOT-jar-with-dependencies.jar \
hdfs://localhost:9000/Saratoga_NY_Homes.txt
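Because the two submit commands differ only in where the JAR and the input data live, you could wrap spark-submit in a small function that takes both locations as arguments. This is a sketch only. The submit_kmeans function and the variable names are hypothetical, SPARK_HOME again falls back to an assumed /usr/local/spark, and the main class and master URL are the ones from our example.

```shell
#!/bin/sh
set -eu

# Hypothetical settings taken from our example.
SPARK_HOME="${SPARK_HOME:-/usr/local/spark}"   # assumed install location
MAIN_CLASS="com.chapter13.Clustering.KMeansDemo"
MASTER_URL="spark://ec2-52-48-119-121.eu-west-1.compute.amazonaws.com:7077"

# $1: application JAR URI, $2: input data URI (file:// or hdfs://).
# DRY_RUN=1 prints the spark-submit command instead of running it.
submit_kmeans() {
  jar_uri="${1:?missing application JAR URI}"
  data_uri="${2:?missing input data URI}"
  ${DRY_RUN:+echo} "$SPARK_HOME/bin/spark-submit" \
    --class "$MAIN_CLASS" \
    --master "$MASTER_URL" \
    "$jar_uri" "$data_uri"
}
```

For the local-file case, for example, you would call submit_kmeans file:///home/ec2-user/KMeans-0.0.1-SNAPSHOT-jar-with-dependencies.jar file:///home/ec2-user/Saratoga_NY_Homes.txt; for HDFS, pass the two hdfs:// URIs instead.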
Once the job has completed successfully, you should see its status and related statistics in the Spark master's web UI on port 8080 (http://ec2-52-48-119-121.eu-west-1.compute.amazonaws.com:8080 in our case).
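You can also check the cluster from the command line instead of a browser. The sketch below assumes that the standalone master's web UI, besides the HTML page on port 8080, serves its status as JSON under the /json/ path. The master_status function and the variable names are hypothetical, and curl must be installed on the machine you run it from.

```shell
#!/bin/sh
set -eu

# Hypothetical settings taken from our example.
MASTER_HOST="ec2-52-48-119-121.eu-west-1.compute.amazonaws.com"
MASTER_UI_PORT=8080

# Fetch the master's status as JSON from the assumed /json/ endpoint;
# DRY_RUN=1 prints the curl command instead of running it.
master_status() {
  ${DRY_RUN:+echo} curl -s "http://$MASTER_HOST:$MASTER_UI_PORT/json/"
}
```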