In this recipe, we are going to deploy Apache HBase 0.90.x on top of an Apache Hadoop 1.0.x cluster. This is required for using Apache Nutch with a Hadoop MapReduce cluster.
We assume you already have your Hadoop cluster (version 1.0.x) deployed. If not, refer to the Setting Hadoop in a distributed cluster environment recipe of Chapter 1, Getting Hadoop up and running in a Cluster, to configure and deploy a Hadoop cluster.
The following steps show you how to deploy a distributed Apache HBase cluster on top of an Apache Hadoop cluster:
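1. Download an Apache HBase 0.90.x release and extract it. We refer to the extracted directory as $HBASE_HOME.
2. Delete the hadoop-core-*.jar in $HBASE_HOME/lib. Then copy the hadoop-core-*.jar and the commons-configuration*.jar from your Hadoop deployment to the $HBASE_HOME/lib folder, so that HBase uses the same Hadoop version as your cluster. Adjust the paths below to match your Hadoop installation.
   > cd $HBASE_HOME
   > rm lib/hadoop-core-<version>.jar
   > cp ~/Software/hadoop-1.0.4/hadoop-core-1.0.4.jar lib/
   > cp ~/Software/hadoop-1.0.4/lib/commons-configuration-1.6.jar lib/
3. Configure HBase to run in distributed mode by adding the following properties to $HBASE_HOME/conf/hbase-site.xml. Replace xxx.xx.xx.xxx with the IP address of your HDFS NameNode, and replace 9000 with the NameNode port if yours is different.
   <configuration>
     <property>
       <name>hbase.rootdir</name>
       <value>hdfs://xxx.xx.xx.xxx:9000/hbase</value>
     </property>
     <property>
       <name>hbase.cluster.distributed</name>
       <value>true</value>
     </property>
     <property>
       <name>hbase.zookeeper.quorum</name>
       <value>localhost</value>
     </property>
   </configuration>
4. Go to $HBASE_HOME and start HBase.
   > bin/start-hbase.sh
5. Open the HBase web UI at http://localhost:60010 to monitor the HBase installation.
6. Start the HBase shell and execute the following commands to test the deployment. If a command fails, check the logs in the $HBASE_HOME/logs directory to identify the exact issue.
   > bin/hbase shell
   hbase(main):001:0> create 'test', 'cf'
   0 row(s) in 1.8630 seconds
   hbase(main):002:0> list 'test'
   TABLE
   test
   1 row(s) in 0.0180 seconds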
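The manual steps above can also be scripted. The following is a minimal Bash sketch covering the jar swap, the hbase-site.xml configuration, and the startup with a shell smoke test. It makes several assumptions. The caller passes the Hadoop and HBase installation directories, the NameNode URI, and the ZooKeeper quorum. The function names swap_hadoop_jars, write_hbase_site and start_and_smoke_test are hypothetical. The fixed pause after start-hbase.sh is a crude stand-in for properly waiting until the master is up.

```shell
#!/usr/bin/env bash
# Minimal sketch of the deployment steps above. Assumptions: the caller passes
# the installation directories, the NameNode URI and the ZooKeeper quorum.
# All function names are hypothetical.
set -euo pipefail

# Replace the Hadoop jar bundled with HBase by the jars of the Hadoop deployment.
swap_hadoop_jars() {
  local hadoop_home="$1" hbase_home="$2"
  rm -f "$hbase_home"/lib/hadoop-core-*.jar
  cp "$hadoop_home"/hadoop-core-*.jar "$hbase_home/lib/"
  cp "$hadoop_home"/lib/commons-configuration*.jar "$hbase_home/lib/"
}

# Overwrite conf/hbase-site.xml with the distributed-mode properties.
# namenode_uri is assumed to look like hdfs://host:port (no trailing slash).
write_hbase_site() {
  local hbase_home="$1" namenode_uri="$2" quorum="$3"
  cat > "$hbase_home/conf/hbase-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>${namenode_uri}/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>${quorum}</value>
  </property>
</configuration>
EOF
}

# Start HBase, pause (assumed default: 15 seconds) to give the master time to
# come up, then feed the smoke-test commands to the HBase shell on stdin.
start_and_smoke_test() {
  local hbase_home="$1" wait_secs="${2:-15}"
  "$hbase_home/bin/start-hbase.sh"
  sleep "$wait_secs"
  printf "create 'test', 'cf'\nlist 'test'\nexit\n" | "$hbase_home/bin/hbase" shell
}
```

For example, running swap_hadoop_jars ~/Software/hadoop-1.0.4 "$HBASE_HOME" and then write_hbase_site "$HBASE_HOME" hdfs://xxx.xx.xx.xxx:9000 localhost reproduces the manual configuration. Note that write_hbase_site replaces any existing hbase-site.xml rather than merging into it.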
The preceding steps configure and run Apache HBase in distributed mode. In distributed mode, HBase stores the data of its tables in HDFS, taking advantage of the distributed and fault-tolerant nature of HDFS.
To run HBase in distributed mode, we have to specify the HDFS NameNode and the path in which to store the HBase data. We do this with the hbase.rootdir property in hbase-site.xml:
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://<namenode>:<port>/<path></value>
</property>
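The NameNode part of hbase.rootdir should point at the same NameNode that the Hadoop cluster itself uses. In Hadoop 1.x, that NameNode is set by the fs.default.name property in conf/core-site.xml. One way to avoid typos is to derive the value from that file. The following is a minimal sketch under that assumption. The function name rootdir_from_core_site is hypothetical, and the parsing is a simple line-oriented awk match, not a real XML parser. It expects each <value> to sit on one line.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print an hbase.rootdir value built from the
# fs.default.name property of a Hadoop 1.x core-site.xml file.
# Usage: rootdir_from_core_site <core-site.xml> [path, default /hbase]
rootdir_from_core_site() {
  local core_site="$1" path="${2:-/hbase}"
  local uri
  # Take the first <value> at or after the line naming fs.default.name.
  uri="$(awk '
    /<name>fs\.default\.name<\/name>/ { found = 1 }
    found && /<value>/ {
      sub(/.*<value>[ \t]*/, ""); sub(/[ \t]*<\/value>.*/, "")
      print; exit
    }' "$core_site")"
  if [ -z "$uri" ]; then
    echo "fs.default.name not found in $core_site" >&2
    return 1
  fi
  # Drop a trailing slash so that exactly one '/' precedes the path.
  printf '%s%s\n' "${uri%/}" "$path"
}
```

For example, if fs.default.name is hdfs://xxx.xx.xx.xxx:9000, then rootdir_from_core_site ~/Software/hadoop-1.0.4/conf/core-site.xml prints hdfs://xxx.xx.xx.xxx:9000/hbase, which can be pasted into hbase-site.xml.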
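We also have to set the hbase.cluster.distributed property to true. With its default value of false, HBase runs in standalone mode, starting all of its daemons and ZooKeeper together in a single JVM.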
<property>
  <name>hbase.cluster.distributed</name>
  <value>true</value>
</property>
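If hbase.cluster.distributed is missing or misspelled, HBase falls back to the default of false and starts in standalone mode without raising an error. A quick check before running bin/start-hbase.sh can catch this. The following minimal sketch uses hypothetical helper names (get_property, require_distributed_mode). Like the previous example, it relies on simple line-oriented awk parsing rather than a real XML parser.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the <value> of property $2 from the
# Hadoop/HBase-style XML configuration file $1 (empty if not found).
get_property() {
  local file="$1" name="$2"
  awk -v name="$name" '
    index($0, "<name>" name "</name>") { found = 1 }
    found && /<value>/ {
      sub(/.*<value>[ \t]*/, ""); sub(/[ \t]*<\/value>.*/, "")
      print; exit
    }' "$file"
}

# Fail (return 1) unless hbase.cluster.distributed is set to true.
require_distributed_mode() {
  local site="$1"
  if [ "$(get_property "$site" hbase.cluster.distributed)" != "true" ]; then
    echo "hbase.cluster.distributed is not true in $site" >&2
    return 1
  fi
}
```

A guard such as require_distributed_mode "$HBASE_HOME/conf/hbase-site.xml" && "$HBASE_HOME/bin/start-hbase.sh" then starts HBase only when distributed mode is configured.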