Understanding HBase

Apache HBase is a distributed Big Data store for Hadoop. It allows random, real-time, read/write access to Big Data. It is designed as a column-oriented data storage model, inspired by Google's Bigtable.

Understanding HBase features

Following are the features of HBase:

  • RESTful web service with XML
  • Linear and modular scalability
  • Strictly consistent reads and writes
  • Extensible shell
  • Block cache and Bloom filters for real-time queries

The prerequisites for RHBase are as follows:

  • Hadoop
  • HBase
  • Thrift

We assume that you have already configured Hadoop on your Linux machine. If you need to know how to install Hadoop on Linux, refer to Chapter 1, Getting Ready to Use R and Hadoop.

Installing HBase

Following are the steps for installing HBase:

  1. Download the tar file of HBase and extract it:
    wget http://apache.cs.utah.edu/hbase/stable/hbase-0.94.11.tar.gz
    
    tar -xzf hbase-0.94.11.tar.gz
    
  2. Go to the HBase installation directory and update the configuration files:
    cd hbase-0.94.11/
    
    vi conf/hbase-site.xml
    
  3. Modify the configuration files:
    1. Update hbase-env.sh.
      vi conf/hbase-env.sh
      
    2. Set up the configuration for HBase:
        export JAVA_HOME=/usr/lib/jvm/java-6-sun
        export HBASE_HOME=/usr/local/hbase-0.94.11
        export HADOOP_INSTALL=/usr/local/hadoop
        export HBASE_CLASSPATH=/usr/local/hadoop/conf
        export HBASE_MANAGES_ZK=true
      
    3. Update hbase-site.xml:
      vi conf/hbase-site.xml
      
    4. Change hbase-site.xml so that it looks like the following code:
          <configuration>
            <property>
              <name>hbase.rootdir</name>
              <value>hdfs://master:9000/hbase</value>
            </property>

            <property>
              <name>hbase.cluster.distributed</name>
              <value>true</value>
            </property>

            <property>
              <name>dfs.replication</name>
              <value>1</value>
            </property>

            <property>
              <name>hbase.zookeeper.quorum</name>
              <value>master</value>
            </property>

            <property>
              <name>hbase.zookeeper.property.clientPort</name>
              <value>2181</value>
            </property>

            <property>
              <name>hbase.zookeeper.property.dataDir</name>
              <value>/root/hadoop/hdata</value>
            </property>
          </configuration>

      Tip

      If a separate ZooKeeper setup is used, the configuration needs to be changed accordingly.

    5. Copy the Hadoop environment configuration files and libraries:
      cp $HADOOP_HOME/conf/hdfs-site.xml $HBASE_HOME/conf
      cp $HADOOP_HOME/hadoop-core-1.0.3.jar $HBASE_HOME/lib
      cp $HADOOP_HOME/lib/commons-configuration-1.6.jar $HBASE_HOME/lib
      cp $HADOOP_HOME/lib/commons-collections-3.2.1.jar $HBASE_HOME/lib
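
Before starting HBase, it can help to confirm that hbase-site.xml really contains the values entered in the steps above. The following is a minimal sketch of such a check, not part of HBase itself: hbase_prop is a hypothetical helper name, and it assumes that each <name> and <value> element sits on its own line, as in the listing above.

```shell
#!/bin/sh
# hbase_prop FILE NAME
# Print the <value> that belongs to the property called NAME in a
# Hadoop-style XML configuration file. Hypothetical helper for illustration;
# it does plain line matching and is not a real XML parser.
hbase_prop() {
    file=$1
    name=$2
    awk -v want="$name" '
        /<name>/ {
            # Remember the most recent property name, trimmed of blanks.
            line = $0
            sub(/.*<name>[ \t]*/, "", line)
            sub(/[ \t]*<\/name>.*/, "", line)
            current = line
        }
        /<value>/ && current == want {
            # Print the value that follows the wanted name, then stop.
            line = $0
            sub(/.*<value>[ \t]*/, "", line)
            sub(/[ \t]*<\/value>.*/, "", line)
            print line
            exit
        }
    ' "$file"
}

# Usage (assumes HBASE_HOME is set as in hbase-env.sh):
#   hbase_prop "$HBASE_HOME/conf/hbase-site.xml" hbase.rootdir
#   hbase_prop "$HBASE_HOME/conf/hbase-site.xml" hbase.zookeeper.quorum
```

If a call prints nothing, the property is missing or misspelled in the file.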
      

Installing Thrift

Following are the steps for installing Thrift:

  1. Download the Thrift source and place it on the client machine. We use Ubuntu 12.04 here:
    wget http://archive.apache.org/dist/thrift/0.8.0/thrift-0.8.0.tar.gz
    
  2. To extract the downloaded .tar.gz file, use the following command:
    tar xzvf thrift-0.8.0.tar.gz
    cd thrift-0.8.0/
    
  3. Run the configure script to set the build parameters:
    ./configure
    
  4. Build and install Thrift:
    make
    make install
    

    Tip

    To start the HBase Thrift server, run the following command:

    $HBASE_HOME/bin/hbase-daemon.sh start thrift
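
Once the Thrift server has been started, a quick way to see whether it accepts connections is to probe its port. The following is a minimal sketch under a few assumptions: thrift_listening is a hypothetical helper name, nc (netcat) is installed, and the server uses port 9090, which is the usual default for the HBase Thrift server but may differ in your setup.

```shell
#!/bin/sh
# thrift_listening [HOST] [PORT]
# Return 0 if something accepts TCP connections on HOST:PORT, non-zero
# otherwise (including when nc is not installed). Hypothetical helper.
thrift_listening() {
    host=${1:-localhost}
    port=${2:-9090}
    # -z: only test the connection, send no data; -w 2: give up after 2 seconds.
    nc -z -w 2 "$host" "$port" >/dev/null 2>&1
}

if thrift_listening localhost 9090; then
    echo "Thrift server reachable on port 9090"
else
    echo "Thrift server not reachable on port 9090" >&2
fi
```

A successful probe only shows that some process is listening on that port; it does not prove that the process is the HBase Thrift server.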
    

Installing RHBase

After installing HBase, we will see how to get the RHBase library.

  • To download rhbase we use the following command:
    wget https://github.com/RevolutionAnalytics/rhbase/raw/master/build/rhbase_1.2.0.tar.gz
    
  • To install the downloaded package we use the following command:
    R CMD INSTALL rhbase_1.2.0.tar.gz
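
After R CMD INSTALL finishes, it is worth confirming that R can actually load the package, since a build problem may only show up at load time. The following is a minimal sketch, assuming Rscript is on the PATH; r_has_package is a hypothetical helper name, not part of R or RHBase.

```shell
#!/bin/sh
# r_has_package NAME
# Return 0 if R can load the namespace of package NAME, non-zero otherwise
# (including when Rscript itself is not installed). Hypothetical helper.
r_has_package() {
    Rscript -e 'pkg <- commandArgs(trailingOnly = TRUE)[1]; if (!requireNamespace(pkg, quietly = TRUE)) quit(status = 1)' "$1" >/dev/null 2>&1
}

if r_has_package rhbase; then
    echo "rhbase can be loaded from R"
else
    echo "rhbase is not available to R" >&2
fi
```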
    

Importing the data into R

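Once RHBase is installed, we can load a dataset from HBase into R with its help. The calls below assume that the package has been loaded with library(rhbase) and that a connection to the Thrift server has been opened with hb.init():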

  • To list all tables we use:
    hb.list.tables()
    
  • To create a new table we use:
    hb.new.table("student")
    
  • To display the table structure we use:
    hb.describe.table("student_rhbase")
    
  • To read data we use:
    hb.get('student_rhbase', 'mary')
    

Understanding data manipulation

Now we will see how to manipulate an HBase dataset from within R:

  • To create the table we use:
    hb.new.table("student_rhbase", "info")
    
  • To insert the data we use:
    hb.insert("student_rhbase", list(list("mary", "info:age", "24")))
    
  • To delete the table we use:
    hb.delete.table('student_rhbase')
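
The same operations can be cross-checked from HBase's own shell, which helps tell an RHBase problem apart from an HBase problem. The following is a minimal sketch: student_table_commands is a hypothetical helper name, and the script assumes HBASE_HOME is set as in hbase-env.sh. The commands mirror the R calls above, using the same table, row key, and column.

```shell
#!/bin/sh
# Print HBase shell commands that mirror the RHBase calls above:
# create the table, insert one cell, read the row back, then remove the table.
# Hypothetical helper for illustration.
student_table_commands() {
    cat <<'EOF'
create 'student_rhbase', 'info'
put 'student_rhbase', 'mary', 'info:age', '24'
get 'student_rhbase', 'mary'
disable 'student_rhbase'
drop 'student_rhbase'
EOF
}

# Run the commands only if the hbase launcher is present.
if [ -x "${HBASE_HOME:-}/bin/hbase" ]; then
    student_table_commands | "$HBASE_HOME/bin/hbase" shell
else
    echo "hbase launcher not found under HBASE_HOME; commands not run" >&2
fi
```

A table has to be disabled before it can be dropped from the HBase shell, which is why the sketch issues disable before drop.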
    