Hive is a Hadoop-based data warehousing framework developed by Facebook. It lets users write queries in HiveQL, a SQL-like language that Hive compiles into Hadoop MapReduce jobs. This allows SQL programmers with no MapReduce experience to use the warehouse, and makes it easier to integrate Hadoop with business intelligence and visualization tools.
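To get a feel for this, the sketch below writes a small HiveQL script; the o_account table and its email column appear later in this chapter, but this particular query is illustrative, not taken from the book's examples:

```shell
# Write an illustrative HiveQL script. HiveQL reads like ordinary SQL,
# even though Hive executes it as MapReduce jobs under the hood.
cat > sample.hql <<'EOF'
SELECT email, COUNT(*) AS accounts
FROM o_account
GROUP BY email;
EOF
# The script can then be run with: hive -f sample.hql
```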
The following are the features of Hive:
- HiveQL, a SQL-like query language that is compiled into MapReduce jobs
- A metastore that keeps table schemas and other metadata in a relational database
- Support for user-defined functions (UDFs)
- Table partitioning and bucketing for faster access to large datasets
Prerequisites for RHive are as follows:
- Hadoop
- Hive
- R, with the rJava and Rserve packages installed
We assume here that our readers have already configured Hadoop; if not, they can learn Hadoop installation from Chapter 1, Getting Ready to Use R and Hadoop. As Hive is required for running RHive, we will first see how Hive can be installed.
The commands to install Hive are as follows:
# Download the Hive source from an Apache mirror
wget http://www.motorlogy.com/apache/hive/hive-0.11.0/hive-0.11.0.tar.gz
# Extract the Hive source
tar xzvf hive-0.11.0.tar.gz
To set up the Hive configuration, we need to update the hive-site.xml file with a few additions:

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <!-- Adjust the host and database name to match your MySQL metastore -->
  <value>jdbc:mysql://localhost:3306/hive_metastore?createDatabaseIfNotExist=true</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
  <description>username to use against metastore database</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hive</value>
  <description>password to use against metastore database</description>
</property>
<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/user/hive/warehouse</value>
  <description>location of default database for the warehouse</description>
</property>
Update the hive-log4j.properties file by adding the following line:

log4j.appender.EventCounter=org.apache.hadoop.log.metrics.EventCounter
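This edit can be scripted so that the line is appended only once; a minimal sketch (the HIVE_LOG4J default below is a scratch file so the sketch is self-contained; point it at your real conf file):

```shell
# Append the EventCounter appender only if it is not already present.
# HIVE_LOG4J defaults to a temporary file here for demonstration; set it to
# $HIVE_HOME/conf/hive-log4j.properties in a real installation.
HIVE_LOG4J="${HIVE_LOG4J:-$(mktemp)}"
LINE='log4j.appender.EventCounter=org.apache.hadoop.log.metrics.EventCounter'
grep -qxF "$LINE" "$HIVE_LOG4J" || printf '%s\n' "$LINE" >> "$HIVE_LOG4J"
```

Because of the grep guard, re-running the snippet will not duplicate the property.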
Set the HIVE_HOME environment variable:

export HIVE_HOME=/usr/local/hive-0.11.0
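It is also convenient to put Hive's launcher script on the PATH; the install location below matches the path used above (adjust it if you extracted Hive elsewhere):

```shell
# Export HIVE_HOME and add Hive's bin directory to the PATH so the
# `hive` command can be run from any directory.
export HIVE_HOME=/usr/local/hive-0.11.0
export PATH="$PATH:$HIVE_HOME/bin"
```

Adding these two lines to ~/.bashrc makes the setting persist across sessions.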
Before running Hive, create the HDFS directories it needs and make them group writable:

$HADOOP_HOME/bin/hadoop fs -mkdir /tmp
$HADOOP_HOME/bin/hadoop fs -mkdir /user/hive/warehouse
$HADOOP_HOME/bin/hadoop fs -chmod g+w /tmp
$HADOOP_HOME/bin/hadoop fs -chmod g+w /user/hive/warehouse
We will now see how to load and operate on Hive datasets in R using the RHive library:
# Initialize the RHive library
rhive.init()

# Connect to the Hive server (replace the IP address with your server's)
rhive.connect("192.168.1.210")

# List the tables available in the Hive warehouse
rhive.list.tables()
             tab_name
1 hive_algo_t_account
2           o_account
3         r_t_account

# Describe the structure of the o_account table
rhive.desc.table('o_account')
     col_name data_type comment
1          id       int
2       email    string
3 create_date    string

# Run a HiveQL query over the table
rhive.query("select * from o_account")

# Close the connection to the Hive server
rhive.close()