Using Apache Whirr to deploy an Apache HBase cluster in a cloud environment

Apache Whirr provides a cloud-vendor-neutral set of libraries to provision and access cloud resources. In this recipe, we use Apache Whirr to deploy an Apache HBase cluster on the Amazon EC2 cloud.

Getting ready

Follow steps 1 to 5 of the Using Apache Whirr to deploy an Apache Hadoop cluster in a cloud environment recipe.

How to do it...

The following are the steps to deploy an HBase cluster on the Amazon EC2 cloud using Apache Whirr.

  1. Copy the following to a file named hbase.properties. If you provided a custom name for your key pair in step 5 of the Using Apache Whirr to deploy an Apache Hadoop cluster in a cloud environment recipe, change the whirr.private-key-file and the whirr.public-key-file property values to the paths of the private key and the public key you generated. A sample hbase.properties file is provided in the resources/whirr directory of the chapter resources. Whirr also needs your AWS credentials to provision instances; if you have not already configured them while following the Getting ready section, see the note after these steps.
    whirr.cluster-name=whirrhbase
    whirr.instance-templates=1 zookeeper+hadoop-namenode+hadoop-jobtracker+hbase-master,2 hadoop-datanode+hadoop-tasktracker+hbase-regionserver
    whirr.provider=aws-ec2
    whirr.private-key-file=${sys:user.home}/.ssh/id_rsa
    whirr.public-key-file=${sys:user.home}/.ssh/id_rsa.pub
  2. Execute the following command in the Whirr home directory to launch your HBase cluster on EC2. After provisioning the cluster, Whirr prints out the commands that we can use to log in to the cluster instances. Note them down for the next steps.
    >bin/whirr launch-cluster --config hbase.properties
    ………
    You can log into instances using the following ssh commands:
    ssh -i ~/.ssh/id_rsa -o "UserKnownHostsFile /dev/null" -o StrictHostKeyChecking=no <username>@<public-IP-of-instance-1>
    ssh -i ~/.ssh/id_rsa -o "UserKnownHostsFile /dev/null" -o StrictHostKeyChecking=no <username>@<public-IP-of-instance-2>
    

    Note

    The traffic from outside to the provisioned EC2 HBase cluster needs to be routed through the master node. Whirr generates a script that we can use to start a proxy for this purpose. The script can be found in a subdirectory named after your HBase cluster inside the ~/.whirr directory. It will take a few minutes for Whirr to provision the cluster and to generate this script. Execute this script in a new terminal to start the proxy.

    >cd ~/.whirr/whirrhbase/
    >sh hbase-proxy.sh
    

    Whirr also generates an hbase-site.xml for your cluster in the ~/.whirr/<your cluster name> directory, which we can use in combination with the above proxy to connect to the HBase cluster from the local client machine. However, currently a Whirr bug (https://issues.apache.org/jira/browse/WHIRR-383) prevents us from accessing the HBase shell from our local client machine. Hence, in this recipe, we log in directly to the master node of the HBase cluster. The second note after these steps shows how to inspect the files Whirr generates in this directory.

  3. Log in to an instance of your cluster using one of the commands you noted down in step 2.
    >ssh -i ~/.ssh/id_rsa -o "UserKnownHostsFile /dev/null" -o StrictHostKeyChecking=no <username>@<public-IP-of-instance>
    
  4. Go to the /usr/local/hbase-<your-version> directory in the instance, or add /usr/local/hbase-<your-version>/bin to the PATH environment variable of the instance.
    >cd /usr/local/hbase-0.90.3
    
  5. Start the HBase shell. Execute the following commands to test your HBase installation.
    >bin/hbase shell
    HBase Shell; .....
    Version 0.90.3, r1100350, Sat May  7 13:31:12 PDT 2011
    
    hbase(main):001:0> create 'test','cf'
    0 row(s) in 5.9160 seconds
    
    hbase(main):007:0> put 'test','row1','cf:a','value1'
    0 row(s) in 0.6190 seconds
    
    hbase(main):008:0> scan 'test'
    ROW                          COLUMN+CELL                                                                      
     row1                        column=cf:a, timestamp=1346893759876, value=value1                               
    1 row(s) in 0.0430 seconds
    
    hbase(main):009:0> quit
    
  6. Issue the following command to shut down the HBase cluster. Make sure to download any important data before shutting down the cluster, as the data will be permanently lost when the cluster is destroyed.
    >bin/whirr destroy-cluster --config hbase.properties
    
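If your AWS credentials were not configured as part of the Getting ready section, they can be supplied through the same hbase.properties file using Whirr's whirr.identity and whirr.credential properties. The following is a minimal sketch; it assumes your access key ID and secret access key are exported in the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables, which the properties file can reference using the ${env:...} syntax.

    whirr.identity=${env:AWS_ACCESS_KEY_ID}
    whirr.credential=${env:AWS_SECRET_ACCESS_KEY}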
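As mentioned in the note in step 2, Whirr writes the artifacts it generates for the cluster to the ~/.whirr/<your cluster name> directory. The following sketch shows how to inspect them for the whirrhbase cluster used in this recipe; the exact set of files may vary with the Whirr version, but it typically includes the hbase-proxy.sh script, the client hbase-site.xml, and an instances file listing the provisioned nodes and their IP addresses.

    >ls ~/.whirr/whirrhbase/
    >cat ~/.whirr/whirrhbase/instances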

How it works...

This section describes the whirr.instance-templates property we used in the hbase.properties file. Refer to the Using Apache Whirr to deploy an Apache Hadoop cluster in a cloud environment recipe for descriptions of the other properties.

whirr.instance-templates=1 zookeeper+hadoop-namenode+hadoop-jobtracker+hbase-master,2 hadoop-datanode+hadoop-tasktracker+hbase-regionserver

This property specifies the number of instances to be used for each set of roles, and the roles assigned to those instances. In the preceding example, one EC2 small instance is used for the zookeeper, hadoop-namenode, hadoop-jobtracker, and hbase-master roles, while two more EC2 small instances are used, each with the hadoop-datanode, hadoop-tasktracker, and hbase-regionserver roles.
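The instance counts and role groups can be adjusted to match your workload. For example, the following is a hypothetical variant (not used in this recipe) that provisions a dedicated ZooKeeper instance and three RegionServer/DataNode instances, and uses the whirr.hardware-id property to request a larger EC2 instance type instead of the default small instances:

whirr.instance-templates=1 zookeeper,1 hadoop-namenode+hadoop-jobtracker+hbase-master,3 hadoop-datanode+hadoop-tasktracker+hbase-regionserver
whirr.hardware-id=m1.large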

More details on Whirr configuration can be found at http://whirr.apache.org/docs/0.6.0/configuration-guide.html.

See also

  • The Installing HBase recipe of Chapter 5, Hadoop Eco-System, and the Deploying an Apache HBase Cluster on Amazon EC2 cloud using EMR and Using Apache Whirr to deploy an Apache Hadoop cluster in a cloud environment recipes in this chapter.