Deploying Apache HBase on a Hadoop cluster

In this recipe, we are going to deploy Apache HBase 0.90.x on top of an Apache Hadoop 1.0.x cluster. This is required for using Apache Nutch with a Hadoop MapReduce cluster.

Getting ready

We assume you already have your Hadoop cluster (version 1.0.x) deployed. If not, refer to the Setting Hadoop in a distributed cluster environment recipe of Chapter 1, Getting Hadoop up and running in a Cluster, to configure and deploy a Hadoop cluster.

How to do it...

The following steps show you how to deploy a distributed Apache HBase cluster on top of an Apache Hadoop cluster:

  1. Download and install Apache HBase from http://hbase.apache.org/. Apache Nutch 2.1 and Apache Gora 0.2 recommend HBase 0.90.4 or a later release from the 0.90.x branch.
  2. HBase must use the same Hadoop jar as your cluster to avoid version mismatch errors when it connects to HDFS. Remove the hadoop-core-*.jar from the $HBASE_HOME/lib directory. Then copy the hadoop-core-*.jar and the commons-configuration-*.jar from your Hadoop deployment into $HBASE_HOME/lib.
    > cd $HBASE_HOME
    > rm lib/hadoop-core-*.jar
    > cp ~/Software/hadoop-1.0.4/hadoop-core-1.0.4.jar lib/
    > cp ~/Software/hadoop-1.0.4/lib/commons-configuration-1.6.jar lib/
    
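  3. Configure $HBASE_HOME/conf/hbase-site.xml. Replace xxx.xx.xx.xxx with the IP address of your HDFS NameNode. The host and port in hbase.rootdir must match the fs.default.name property in your Hadoop core-site.xml.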
    <configuration>
      <property>
        <name>hbase.rootdir</name>
        <value>hdfs://xxx.xx.xx.xxx:9000/hbase</value>
      </property>
      <property>
        <name>hbase.cluster.distributed</name>
        <value>true</value>
      </property>
      <property>
        <name>hbase.zookeeper.quorum</name>
        <value>localhost</value>
      </property>
    </configuration>
  4. Go to the $HBASE_HOME directory and start HBase.
    > bin/start-hbase.sh
    
  5. Open the HBase UI at http://localhost:60010 and monitor the HBase installation.
  6. Start the HBase shell and execute the following commands to test the HBase deployment. If these commands fail, check the logs in the $HBASE_HOME/logs directory to identify the issue.
    > bin/hbase shell
    hbase(main):001:0> create 'test', 'cf'
    0 row(s) in 1.8630 seconds
    
    hbase(main):002:0> list 'test'
    TABLE
    test
    1 row(s) in 0.0180 seconds
    
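Step 2 can also be scripted so that it is easy to repeat on every HBase node. The following is a minimal sketch, assuming bash and the Hadoop 1.0.x tarball layout used above: hadoop-core-*.jar sits at the top of the Hadoop directory, and commons-configuration-*.jar sits under its lib folder. The function names sync_hadoop_jars and hadoop_core_versions_match are hypothetical helpers, not part of HBase or Hadoop.

```shell
#!/usr/bin/env bash
# Sketch of step 2, assuming the Hadoop 1.0.x tarball layout:
#   $hadoop_home/hadoop-core-*.jar and $hadoop_home/lib/commons-configuration-*.jar
# sync_hadoop_jars and hadoop_core_versions_match are hypothetical helpers.

# Replace the hadoop-core jar bundled with HBase by the cluster's own jars.
sync_hadoop_jars() {
  local hadoop_home="$1" hbase_home="$2"
  # Drop whatever hadoop-core jar HBase shipped with (no error if none exists).
  rm -f "$hbase_home"/lib/hadoop-core-*.jar
  # Copy the cluster's hadoop-core and commons-configuration jars.
  cp "$hadoop_home"/hadoop-core-*.jar "$hbase_home"/lib/ || return 1
  cp "$hadoop_home"/lib/commons-configuration-*.jar "$hbase_home"/lib/ || return 1
}

# Succeed only if HBase's lib holds exactly one hadoop-core jar and it is
# byte-for-byte identical to the one in the Hadoop deployment.
hadoop_core_versions_match() {
  local hadoop_home="$1" hbase_home="$2"
  local -a src dst
  src=("$hadoop_home"/hadoop-core-*.jar)
  dst=("$hbase_home"/lib/hadoop-core-*.jar)
  [[ ${#src[@]} -eq 1 && ${#dst[@]} -eq 1 && -f "${src[0]}" && -f "${dst[0]}" ]] &&
    cmp -s "${src[0]}" "${dst[0]}"
}
```

For example, `sync_hadoop_jars ~/Software/hadoop-1.0.4 "$HBASE_HOME" && hadoop_core_versions_match ~/Software/hadoop-1.0.4 "$HBASE_HOME" && echo "jars in sync"`. The check compares file contents with cmp rather than file names, so it also catches a jar that was renamed by hand.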

Note

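HBase is very sensitive to the contents of the /etc/hosts file. Fixing the /etc/hosts file resolves many HBase deployment errors. In particular, make sure that each node's hostname resolves to the node's network IP address rather than to a loopback address such as 127.0.1.1.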
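A frequent /etc/hosts problem is a hostname mapped to a loopback address (some Linux distributions add a 127.0.1.1 entry by default). This can cause HBase to register an address that other nodes cannot reach. The following minimal sketch, assuming bash and awk, checks for that case. hostname_maps_to_loopback is a hypothetical helper that returns success when the given name appears on a 127.x.x.x or ::1 line of a hosts file.

```shell
#!/usr/bin/env bash
# hostname_maps_to_loopback NAME [HOSTS_FILE] is a hypothetical helper:
# exit status 0 if NAME appears on a loopback line (127.x.x.x or ::1)
# of HOSTS_FILE (default /etc/hosts), 1 otherwise.
hostname_maps_to_loopback() {
  local name="$1" hosts_file="${2:-/etc/hosts}"
  awk -v n="$name" '
    { sub(/#.*/, "") }                          # strip comments
    NF >= 2 && ($1 ~ /^127\./ || $1 == "::1") { # loopback address lines only
      for (i = 2; i <= NF; i++)                 # fields 2.. are hostnames and aliases
        if ($i == n) found = 1
    }
    END { exit(found ? 0 : 1) }
  ' "$hosts_file"
}
```

Run it on each node before starting HBase, for example `hostname_maps_to_loopback "$(hostname)" && echo "Warning: $(hostname) maps to a loopback address in /etc/hosts"`.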

How it works...

The preceding steps configure and run Apache HBase in distributed mode. In distributed mode, HBase stores the actual data of its tables in HDFS, taking advantage of the distributed and fault-tolerant nature of HDFS.

To run HBase in distributed mode, we have to specify the HDFS NameNode and the path for storing the HBase data using the hbase.rootdir property in hbase-site.xml.

  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://<namenode>:<port>/<path></value>
  </property>
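Because hbase.rootdir must point at the same NameNode that Hadoop uses, it can be worth checking it against the fs.default.name property of Hadoop's core-site.xml before starting HBase. The following sketch assumes bash and the simple layout used in this recipe, with each <name> and <value> element on its own line. It is not a real XML parser. get_property and rootdir_matches_namenode are hypothetical helper names.

```shell
#!/usr/bin/env bash
# get_property and rootdir_matches_namenode are hypothetical helpers. They assume
# the simple *-site.xml layout used in this recipe (<name> and <value> on their
# own lines); they are not a general XML parser.

# Print the <value> of property $2 from config file $1.
get_property() {
  local file="$1" prop="$2"
  awk -v p="$prop" '
    /<name>/ { gsub(/.*<name>|<\/name>.*/, ""); cur = $0; next }
    /<value>/ && cur == p { gsub(/.*<value>|<\/value>.*/, ""); print; exit }
  ' "$file"
}

# Succeed if hbase.rootdir lives under the NameNode URI given by fs.default.name.
rootdir_matches_namenode() {
  local hbase_site="$1" core_site="$2" rootdir namenode
  rootdir=$(get_property "$hbase_site" hbase.rootdir)
  namenode=$(get_property "$core_site" fs.default.name)
  namenode="${namenode%/}"   # tolerate a trailing slash, e.g. hdfs://host:9000/
  [[ -n "$namenode" && "$rootdir" == "$namenode"/* ]]
}
```

A typical call, assuming HADOOP_HOME points to the Hadoop 1.0.x installation, is `rootdir_matches_namenode "$HBASE_HOME/conf/hbase-site.xml" "$HADOOP_HOME/conf/core-site.xml" || echo "hbase.rootdir does not match fs.default.name"`. The comparison is a plain string match, so it reports a mismatch when one file names the NameNode by hostname and the other by IP address.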

We also have to set the hbase.cluster.distributed property to true.

  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>

See also

  • The Installing HBase recipe of Chapter 5, Hadoop Ecosystem.