Apache Whirr provides a cloud-vendor-neutral set of libraries for accessing cloud resources. In this recipe, we deploy an Apache HBase cluster on the Amazon EC2 cloud using Apache Whirr.
Follow steps 1 to 5 of the Using Apache Whirr to deploy an Apache Hadoop cluster in a cloud environment recipe.
The following are the steps to deploy an HBase cluster on the Amazon EC2 cloud using Apache Whirr.
Create a file named hbase.properties with the following content. If you provided a custom name for your key-pair in step 5 of the Using Apache Whirr to deploy an Apache Hadoop cluster in a cloud environment recipe, change the whirr.private-key-file and the whirr.public-key-file property values to the paths of the private key and the public key you generated. A sample hbase.properties file is provided in the resources/whirr directory of the chapter resources.
whirr.cluster-name=whirrhbase
whirr.instance-templates=1 zookeeper+hadoop-namenode+hadoop-jobtracker+hbase-master,2 hadoop-datanode+hadoop-tasktracker+hbase-regionserver
whirr.provider=aws-ec2
whirr.private-key-file=${sys:user.home}/.ssh/id_rsa
whirr.public-key-file=${sys:user.home}/.ssh/id_rsa.pub
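As a convenience, the same properties could be written from the command line. The following is only a sketch: the WHIRR_CONF_DIR variable and the key-file check are illustrative assumptions and not part of Whirr, and the check assumes that the shell's $HOME matches the Java user.home property that Whirr resolves.

```shell
# Hypothetical scratch location for the sketch; defaults to a fresh temp directory.
WHIRR_CONF_DIR="${WHIRR_CONF_DIR:-$(mktemp -d)}"

# The quoted 'EOF' keeps ${sys:user.home} literal so that Whirr, not the shell, expands it.
cat > "$WHIRR_CONF_DIR/hbase.properties" <<'EOF'
whirr.cluster-name=whirrhbase
whirr.instance-templates=1 zookeeper+hadoop-namenode+hadoop-jobtracker+hbase-master,2 hadoop-datanode+hadoop-tasktracker+hbase-regionserver
whirr.provider=aws-ec2
whirr.private-key-file=${sys:user.home}/.ssh/id_rsa
whirr.public-key-file=${sys:user.home}/.ssh/id_rsa.pub
EOF

# Warn (without failing) if the default key-pair is missing.
# Assumption: $HOME is the same directory as Java's user.home.
for key in "$HOME/.ssh/id_rsa" "$HOME/.ssh/id_rsa.pub"; do
  [ -f "$key" ] || echo "warning: $key not found; adjust the key-file properties" >&2
done
```

If the key-pair has a custom name, edit the two key-file lines before launching the cluster.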
Launch the cluster by running the following command from the Whirr installation directory. On success, Whirr prints the ssh commands for logging in to the instances.
>bin/whirr launch-cluster --config hbase.properties
………
You can log into instances using the following ssh commands:
'ssh -i ~/.ssh/id_rsa -o "UserKnownHostsFile /dev/null" -o StrictHostKeyChecking=no [email protected]'
'ssh -i ~/.ssh/id_rsa -o "UserKnownHostsFile /dev/null" -o StrictHostKeyChecking=no [email protected]'
The traffic from outside to the provisioned EC2 HBase cluster needs to be routed through the master node. Whirr generates a script that we can use to start a proxy for this purpose. The script can be found in a subdirectory named after your HBase cluster inside the ~/.whirr directory. It will take a few minutes for Whirr to provision the cluster and to generate this script. Execute this script in a new terminal to start the proxy.
>cd ~/.whirr/whirrhbase/
>hbase-proxy.sh
Whirr also generates hbase-site.xml for your cluster in the ~/.whirr/<your cluster name> directory, which we can use in combination with the above proxy to connect to the HBase cluster from the local client machine. However, a Whirr bug (https://issues.apache.org/jira/browse/WHIRR-383) currently prevents us from accessing the HBase shell from our local client machine. Hence, in this recipe, we log in directly to the master node of the HBase cluster.
>ssh -i ~/.ssh/id_rsa -o "UserKnownHostsFile /dev/null" -o StrictHostKeyChecking=no [email protected]
Go to the /usr/local/hbase-<your-version> directory in the instance, or add /usr/local/hbase-<your-version>/bin to the PATH variable of the instance.
>cd /usr/local/hbase-0.90.3
>bin/hbase shell
HBase Shell; .....
Version 0.90.3, r1100350, Sat May 7 13:31:12 PDT 2011
hbase(main):001:0> create 'test','cf'
0 row(s) in 5.9160 seconds
hbase(main):007:0> put 'test','row1','cf:a','value1'
0 row(s) in 0.6190 seconds
hbase(main):008:0> scan 'test'
ROW COLUMN+CELL
row1 column=cf:a, timestamp=1346893759876, value=value1
1 row(s) in 0.0430 seconds
hbase(main):009:0> quit
Shut down the cluster when you are finished, using the same properties file that launched it.
>bin/whirr destroy-cluster --config hbase.properties
This section describes the whirr.instance-templates property we used in the hbase.properties file. Refer to the Using Apache Whirr to deploy an Apache Hadoop cluster in a cloud environment recipe for descriptions of the other properties.
whirr.instance-templates=1 zookeeper+hadoop-namenode+hadoop-jobtracker+hbase-master,2 hadoop-datanode+hadoop-tasktracker+hbase-regionserver
This property specifies the number of instances to be used for each set of roles and the type of roles for the instances. In the preceding example, one EC2 small instance is used with the roles hbase-master, zookeeper, hadoop-jobtracker, and hadoop-namenode. Another two EC2 small instances are used, each with the roles hbase-regionserver, hadoop-datanode, and hadoop-tasktracker.
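To make the structure of the property value concrete, the following sketch splits it the way the text describes: templates are separated by commas, each template starts with an instance count, and the roles are joined by plus signs. The describe_templates function is a hypothetical helper written for illustration; it is not part of Whirr and does not reflect how Whirr itself parses the property.

```shell
# Hypothetical helper (not part of Whirr): print one line per instance template,
# showing the instance count followed by the roles assigned to those instances.
describe_templates() (
  # The function body runs in a subshell, so these settings do not leak out.
  set -f      # disable filename globbing while splitting
  IFS=','     # templates are separated by commas
  for entry in $1; do
    entry="${entry# }"        # drop one leading space, if any
    count="${entry%% *}"      # text before the first space: instance count
    roles="$(printf '%s' "${entry#* }" | sed 's/+/, /g')"   # roles are joined by '+'
    printf '%s instance(s): %s\n' "$count" "$roles"
  done
)

describe_templates "1 zookeeper+hadoop-namenode+hadoop-jobtracker+hbase-master,2 hadoop-datanode+hadoop-tasktracker+hbase-regionserver"
# Prints:
# 1 instance(s): zookeeper, hadoop-namenode, hadoop-jobtracker, hbase-master
# 2 instance(s): hadoop-datanode, hadoop-tasktracker, hbase-regionserver
```

Reading the value this way shows that the cluster in this recipe has three instances in total: one master and two workers.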
More details on Whirr configuration can be found at http://whirr.apache.org/docs/0.6.0/configuration-guide.html.