Automating the generation of clusters with cfncluster

As we just learned, we can spawn machines using the web interface, but this quickly becomes tedious and error-prone. Fortunately, Amazon has an API, which means that we can write scripts to perform all the operations we discussed earlier automatically. Even better, others have already developed tools that mechanize and automate many of the processes you will want to perform with AWS.

Amazon themselves provide many command-line tools for their own infrastructure. For cluster provisioning, the tool is called cfncluster. If you are using conda, you can install it with:

$ conda install cfncluster

You can run this from your local machine: it will use the Amazon API.

The first step is to go back to the AWS Console in your web browser and add the AdministratorAccess permission to your AWS user. This is a brute-force approach; it gives the user all administration rights, and while it is useful when learning about AWS, it is not recommended for production use.

Now, we create a new virtual private cloud on AWS using the VPC service, accepting all the default options. Back on the command line, we configure cfncluster:

$ cfncluster configure  

Pick from the options listed. It is important that you choose the right key (which we generated earlier). This will generate a configuration file in ~/.cfncluster/config. For the moment, we will use all the defaults, but this is where you can later change them to suit your needs.
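For reference, the generated file looks roughly like the following. This is an illustrative sketch: the section and key names follow the cfncluster documentation, but the values shown (region, key name, VPC and subnet IDs) are placeholders and will differ on your account:

```ini
; ~/.cfncluster/config -- illustrative sketch, not a file to copy verbatim
[global]
cluster_template = default

[aws]
aws_region_name = us-east-1
aws_access_key_id = <your-access-key>
aws_secret_access_key = <your-secret-key>

[cluster default]
key_name = awskeys
initial_queue_size = 2
max_queue_size = 2
vpc_settings = public

[vpc public]
vpc_id = vpc-XXXXXXXX
master_subnet_id = subnet-XXXXXXXX
```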

Keys, keys, and more keys:
There are three completely different types of key that are important when dealing with AWS:

  1. A standard username/password combination, which you use to log in to the website.
  2. The SSH key system, a public/private key system implemented with files; with your public key file, you can log in to remote machines.
  3. The AWS access key/secret key system, which is just a form of username/password that allows you to have multiple users on the same account (including adding different permissions to each one, but we will not cover these advanced features in this book).

To look up our access/secret keys, we go back to the AWS Console, click on our name in the top right, and select Security Credentials. Now, at the bottom of the screen, we should see our access key, which may look something like this: AAKIIT7HHF6IUSN3OCAA.
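As an alternative to writing the access/secret keys into the configuration file, you can export them as the standard AWS environment variables, which are honored by boto, the library that cfncluster builds on. The values below are placeholders only:

```shell
# Placeholders only -- substitute your own access/secret key pair.
# These standard environment variables are read by boto, which
# cfncluster uses to talk to the Amazon API.
export AWS_ACCESS_KEY_ID="AAKIIT7HHF6IUSN3OCAA"
export AWS_SECRET_ACCESS_KEY="<your-secret-key>"
```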

We can now create the cluster:

$ cfncluster create public  

This may take a few minutes. It will allocate a master node plus two compute nodes for our cluster (these are default values; you can change them in the cfncluster configuration file). Once the process is finished, you should see output like the following:

Output:"MasterPublicIP"="52.86.118.172"
Output:"MasterPrivateIP"="172.30.2.146"
Output:"GangliaPublicURL"="http://52.86.118.172/ganglia/"
Output:"GangliaPrivateURL"="http://172.30.2.146/ganglia/"

Note the IP address of the master node that is printed out (in this case 52.86.118.172). If you forget it, you can look it up again with the command:

$ cfncluster status public  
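If you need the address in a script, a small helper can parse it out of that output. This is a sketch that assumes the status command prints the same Output:"Key"="Value" lines shown above (the helper name is our own invention):

```shell
# extract_master_ip: pull the MasterPublicIP value out of cfncluster output.
# With -F'"' the fields of  Output:"MasterPublicIP"="52.86.118.172"  are
# split on double quotes, so the IP address is the fourth field.
extract_master_ip() {
    awk -F'"' '/MasterPublicIP/ { print $4 }'
}

# Hypothetical usage:
#   MASTER_IP=$(cfncluster status public | extract_master_ip)
#   ssh -i awskeys.pem ec2-user@"$MASTER_IP"
```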

All of these nodes have the same filesystem, so anything we create on the master node will also be seen by the worker nodes. This also means that we can use jug on these clusters.

These clusters can be used as you wish, but they come equipped with a job queue engine, which makes them ideal for batch processing. The process of using them is simple:

  1. You log in to the master node.
  2. You prepare your scripts on the master node (or, better yet, have them prepared beforehand).
  3. You submit jobs to the queue. A job can be any Unix command; the scheduler will find free nodes and run your job.
  4. You wait for the jobs to finish.
  5. You read the results on the master node. At this point, you can also kill the worker nodes to save money. In any case, do not leave your system running when you no longer need it! Otherwise, this will cost you (in dollars and cents).

As we said earlier, cfncluster provides a batch queuing system for its clusters; you write a script to perform your actions, put it on the queue, and it will run on any available node.

As before, we use our key to log into the master node:

$ ssh -i awskeys.pem [email protected]  

Set up miniconda as before (the setup-aws.txt file in the code repository has all the necessary commands). We can use the same jugfile system as before, except that now, instead of running it directly on the master, we schedule it on the cluster:

  1. First, write a very simple wrapper script as follows:
#!/usr/bin/env bash
# Put miniconda on the PATH and activate our environment
export PATH=$HOME/miniconda3/bin:$PATH
source activate py3.6
# Run (or resume) the jug computation
jug execute jugfile.py
  2. Call it run-jugfile.sh and use chmod +x run-jugfile.sh to give it executable permission. Now, we can schedule jobs on the cluster by using the following command:
$ qsub -cwd ./run-jugfile.sh  

Each call to qsub creates a job that runs the run-jugfile.sh script and, therefore, jug; call it twice to create two jobs. You can still use the master node as you wish. In particular, you can, at any moment, run jug status and see the status of the computation. In fact, jug was developed in exactly such an environment, so it works very well in it.
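If you want to start several jug workers at once, you can wrap the qsub call in a small loop. The helper below is a sketch of our own (the QSUB override exists only so the function can be dry-run with QSUB=echo instead of submitting real jobs):

```shell
# submit_jug_workers: call qsub once per desired jug worker.
# Each submitted job runs run-jugfile.sh; jug coordinates the workers
# through the shared filesystem, so they do not repeat each other's tasks.
submit_jug_workers() {
    local n=${1:-2}
    local i
    for i in $(seq 1 "$n"); do
        ${QSUB:-qsub} -cwd ./run-jugfile.sh
    done
}

# Dry run, printing the commands instead of submitting them:
#   QSUB=echo submit_jug_workers 2
```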

  3. Eventually, the computation will finish. At this point, we need to first save our results. Then, we can kill off all the nodes. We create a directory, ~/results, and copy our results there:
# mkdir ~/results
# cp results.image.txt ~/results  
  4. Now, log off the cluster and go back to our worker machine:
# exit  
  5. Now we are back at our original AWS machine or your local computer (notice the $ sign in the next code examples):
$ scp -i awskeys.pem -pr ec2-user@52.86.118.172:results . 
  6. Finally, we should kill all the nodes to save money as follows:
$ cfncluster stop public
Stopping the cluster will destroy the compute nodes but keep the master node running, as well as the disk space. This reduces costs to a minimum, but to really destroy everything, use the following:
$ cfncluster delete public  
Terminating will really destroy the filesystem and all your results. In our case, we have copied the final results to safety manually. Another possibility is to have the cluster write to a filesystem, which is not allocated and destroyed by cfncluster, but is available to you on a regular instance; in fact, the flexibility of these tools is immense. However, these advanced manipulations cannot all fit in this chapter.
