Understanding HBase

Apache HBase is a distributed Big Data store for Hadoop. It provides random, real-time read/write access to Big Data. It is designed as a column-oriented data storage model, inspired by Google's Bigtable.

Understanding HBase features

The following are the main features of HBase:

RESTful web service with XML
Linear and modular scalability
Strictly consistent reads and writes
Extensible shell
Block cache and Bloom filters for real-time queries

The prerequisites for RHBase are as follows:

Hadoop
HBase
Thrift

Here we assume that Hadoop is already configured on the Linux machine. For instructions on installing Hadoop on Linux, refer to Chapter 1, Getting Ready to Use R and Hadoop.

Installing HBase

The following are the steps for installing HBase:

Download the HBase tar file and extract it:
```
wget http://apache.cs.utah.edu/hbase/stable/hbase-0.94.11.tar.gz

tar -xzf hbase-0.94.11.tar.gz
```

Go to the HBase installation directory and update the configuration files:
```
cd hbase-0.94.11/

vi conf/hbase-site.xml
```

Modify the configuration files:

Update hbase-env.sh:
```
vi conf/hbase-env.sh
```

Set up the environment for HBase:
```
# Shell assignments must not have spaces around '='
export JAVA_HOME=/usr/lib/jvm/java-6-sun
export HBASE_HOME=/usr/local/hbase-0.94.11
export HADOOP_INSTALL=/usr/local/hadoop
export HBASE_CLASSPATH=/usr/local/hadoop/conf
# Let HBase start and stop its own ZooKeeper instance
export HBASE_MANAGES_ZK=true
```
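A wrong path in hbase-env.sh usually shows up only when HBase fails to start, so it can be worth checking the exported directories first. The following is a minimal sketch, assuming the variables above are exported in the current shell; check_hbase_dirs is a hypothetical helper, not part of HBase.

```shell
# Hypothetical helper (not part of HBase): print each directory referenced
# in hbase-env.sh that does not exist, and return non-zero if any is missing.
check_hbase_dirs() {
  local rc=0 pair name dir
  for pair in "JAVA_HOME=$JAVA_HOME" "HBASE_HOME=$HBASE_HOME" \
              "HADOOP_INSTALL=$HADOOP_INSTALL" "HBASE_CLASSPATH=$HBASE_CLASSPATH"; do
    name=${pair%%=*}   # variable name: text before the first '='
    dir=${pair#*=}     # directory: text after the first '='
    if [ ! -d "$dir" ]; then
      echo "missing: $name=$dir"
      rc=1
    fi
  done
  return "$rc"
}
```

Running check_hbase_dirs && echo "all paths found" prints one "missing:" line for each directory that does not exist, and the success message only when all four are present.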

Update hbase-site.xml:
```
vi conf/hbase-site.xml
```

Change hbase-site.xml so that it looks like the following code:

```
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://master:9000/hbase</value>
  </property>

  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>

  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>

  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>master</value>
  </property>

  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
  </property>

  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/root/hadoop/hdata</value>
  </property>
</configuration>
```

Tip

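If a separate ZooKeeper ensemble is used instead of the one managed by HBase, set HBASE_MANAGES_ZK to false in hbase-env.sh and point hbase.zookeeper.quorum (and hbase.zookeeper.property.clientPort, if it differs) at that ensemble.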
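Because the same six properties have to be set on every node, one option is to generate hbase-site.xml from a few parameters instead of editing it by hand. The sketch below is a hypothetical helper, not part of HBase; it assumes, as in the configuration above, that the HDFS NameNode listens on port 9000 and ZooKeeper on port 2181.

```shell
# Hypothetical helper (not part of HBase): write an hbase-site.xml with the
# same six properties as the configuration above.
#   $1  host running the HDFS NameNode (port 9000, as above)
#   $2  ZooKeeper quorum (comma-separated host names)
#   $3  ZooKeeper data directory
#   $4  output file
write_hbase_site() {
  local nn_host=$1 zk_quorum=$2 zk_dir=$3 out=$4
  # Unquoted EOF: ${...} references below are expanded by the shell
  cat > "$out" <<EOF
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://${nn_host}:9000/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>${zk_quorum}</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>${zk_dir}</value>
  </property>
</configuration>
EOF
}
```

For the single-host setup above, this would be called as write_hbase_site master master /root/hadoop/hdata conf/hbase-site.xml. With a separate ZooKeeper ensemble, the second argument would list its hosts instead (for example, the hypothetical hosts zk1,zk2,zk3).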

Copy the Hadoop configuration file and libraries into HBase:
```
cp $HADOOP_HOME/conf/hdfs-site.xml $HBASE_HOME/conf
cp $HADOOP_HOME/hadoop-core-1.0.3.jar $HBASE_HOME/lib
cp $HADOOP_HOME/lib/commons-configuration-1.6.jar $HBASE_HOME/lib
cp $HADOOP_HOME/lib/commons-collections-3.2.1.jar $HBASE_HOME/lib
```
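When several files are copied in a row, a typo in one path is easy to miss. The following is a hedged sketch of the same copy step that stops at the first missing file; copy_hadoop_files is a hypothetical name, and the jar versions are the Hadoop 1.0.3 ones from the commands above, so they need adjusting for other Hadoop releases.

```shell
# Hypothetical helper: copy the Hadoop files HBase needs, stopping at the
# first source file that does not exist. Assumes HADOOP_HOME and HBASE_HOME
# are set; file versions match the Hadoop 1.0.3 commands above.
copy_hadoop_files() {
  local f src dest
  for f in conf/hdfs-site.xml hadoop-core-1.0.3.jar \
           lib/commons-configuration-1.6.jar lib/commons-collections-3.2.1.jar; do
    src="$HADOOP_HOME/$f"
    # Configuration files go to HBase's conf/, jars go to its lib/
    case "$f" in
      conf/*) dest="$HBASE_HOME/conf" ;;
      *)      dest="$HBASE_HOME/lib" ;;
    esac
    if [ ! -f "$src" ]; then
      echo "not found: $src" >&2
      return 1
    fi
    cp "$src" "$dest/" || return 1
  done
}
```

Files copied before the missing one are left in place, so after fixing the path the function can simply be run again.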

Installing Thrift

The following are the steps for installing Thrift:

Download the Thrift source and place it on the client machine. Here we use Ubuntu 12.04:
```
wget http://archive.apache.org/dist/thrift/0.8.0/thrift-0.8.0.tar.gz
```
To extract the downloaded .tar.gz file, use the following command:
```
tar xzvf thrift-0.8.0.tar.gz
cd thrift-0.8.0/
```
Configure the build:
```
./configure
```
Compile and install Thrift:
```
make
sudo make install
```
Tip
To start the HBase Thrift server, run the following command:
```
$HBASE_HOME/bin/hbase-daemon.sh start thrift
```
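The daemon script may return before the Thrift server is actually accepting connections, so a script that goes on to use RHBase may want to wait for the port. The following sketch assumes the server listens on its default port, 9090, and that bash is installed, since it relies on bash's /dev/tcp redirection and on the coreutils timeout command; wait_for_port is a hypothetical helper.

```shell
# Hypothetical helper: poll until HOST:PORT accepts a TCP connection.
#   $1  host  (default: localhost)
#   $2  port  (default: 9090, the HBase Thrift server's default port)
#   $3  number of attempts, one second apart (default: 30)
wait_for_port() {
  local host=${1:-localhost} port=${2:-9090} tries=${3:-30} i=1
  # In bash, opening /dev/tcp/HOST/PORT attempts a TCP connection;
  # timeout keeps a single attempt from hanging on an unreachable host.
  until timeout 2 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null; do
    [ "$i" -ge "$tries" ] && return 1
    i=$((i + 1))
    sleep 1
  done
  return 0
}
```

A typical use would be $HBASE_HOME/bin/hbase-daemon.sh start thrift && wait_for_port localhost 9090 && echo "Thrift server is up".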

Installing RHBase

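After installing HBase and Thrift, we can get the RHBase library.

To download rhbase, we use the following command:
```
wget https://github.com/RevolutionAnalytics/rhbase/blob/master/build/rhbase_1.2.0.tar.gz
```

Note that a GitHub blob URL serves the HTML page that displays the file; the tarball itself is served by the corresponding raw URL (the same path with blob replaced by raw).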

To install the downloaded package we use the following command:
```
R CMD INSTALL rhbase_1.2.0.tar.gz
```

Importing the data into R

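Once RHBase is installed, we can access HBase data from R. RHBase communicates with HBase through the Thrift server, so that server must be running.

To load the package, connect to the Thrift server, and list all tables, we use:
```
library(rhbase)
hb.init()    # connect to the HBase Thrift server
hb.list.tables()
```
To create a new table, we use:
```
hb.new.table("student")
```
To display the table structure, we use:
```
hb.describe.table("student_rhbase")
```
To read a row (here, the row with the key mary), we use:
```
hb.get('student_rhbase', 'mary')
```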

Understanding data manipulation

Now, we will see how to operate on HBase data from within R:

To create a table with the column family info, we use:
```
hb.new.table("student_rhbase", "info")
```

To insert data, we use:
```
# Each change is list(row key, column names, list of values)
hb.insert("student_rhbase", list(list("mary", c("info:age"), list("24"))))
```

To delete a table, we use:
```
hb.delete.table('student_rhbase')
```
