Understanding Hive

Hive is a Hadoop-based data warehousing framework developed by Facebook. It allows users to write queries in a SQL-like language called HiveQL, which are compiled into Hadoop MapReduce jobs. This allows SQL programmers with no MapReduce experience to use the warehouse, and makes it easier to integrate with business intelligence and visualization tools.

Understanding features of Hive

The following are the features of Hive:

  • Hive Query Language (HiveQL)
  • Supports UDF
  • Metadata storage
  • Data indexing
  • Different storage types
  • Hadoop integration

Prerequisites for RHive are as follows:

  • Hadoop
  • Hive

We assume here that our readers have already configured Hadoop; otherwise, they can learn Hadoop installation from Chapter 1, Getting Ready to Use R and Hadoop. As Hive is required for running RHive, we will first see how Hive can be installed.

Installing Hive

The commands to install Hive are as follows:

# Download the Hive source from an Apache mirror
wget http://www.motorlogy.com/apache/hive/hive-0.11.0/hive-0.11.0.tar.gz

# Extract the Hive source
tar xzvf hive-0.11.0.tar.gz

Setting up Hive configurations

To set up the Hive configuration, we need to update the hive-site.xml file with a few additions:

  • Update hive-site.xml with the following properties. The original listing omits the opening of the first property; its name is recoverable from the description, but the connection URL shown here is an example value — adjust the host and database name for your environment:
    <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <!-- Example URL; adjust host and database for your environment -->
    <value>jdbc:mysql://localhost:3306/hive_metastore</value>
    <description>JDBC connect string for a JDBC metastore</description>
    </property>
    
    <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
    </property>
    
    <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
    <description>username to use against metastore database</description>
    </property>
    
    <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive</value>
    <description>password to use against metastore database</description>
    </property>
    
    <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
    <description>location of default database for the warehouse</description>
    </property>
  • Update hive-log4j.properties by adding the following line:
    log4j.appender.EventCounter=org.apache.hadoop.log.metrics.EventCounter
    
  • Update the environment variables by using the following command:
    export HIVE_HOME=/usr/local/hive-0.11.0
    
  • In HDFS, create specific directories for Hive:
    $HADOOP_HOME/bin/hadoop fs -mkdir /tmp
    $HADOOP_HOME/bin/hadoop fs -mkdir /user/hive/warehouse
    $HADOOP_HOME/bin/hadoop fs -chmod g+w /tmp
    $HADOOP_HOME/bin/hadoop fs -chmod g+w /user/hive/warehouse
    

    Tip

    To start the Hive server, run the hive --service hiveserver command from HIVE_HOME.

Installing RHive

  • Install the dependent rJava library and then the RHive package, using the following commands:
    # Set up the Java configuration variables for R (run from the shell)
    sudo R CMD javareconf
    
    # Install the rJava package (run from within R)
    install.packages("rJava")
    
    # Install the RHive package from CRAN
    install.packages("RHive")
    
    # Load the RHive library
    library("RHive")
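RHive locates Hive and Hadoop through the HIVE_HOME and HADOOP_HOME environment variables, so if initialization fails later, it is usually because these are not visible to the R session. A minimal sketch, assuming the Hive path used above (the Hadoop path shown is an assumed example; substitute your own):

```r
# Make Hive and Hadoop visible to RHive before loading it.
# HIVE_HOME matches the installation above; HADOOP_HOME is an
# assumed example path -- replace it with your Hadoop location.
Sys.setenv(HIVE_HOME = "/usr/local/hive-0.11.0")
Sys.setenv(HADOOP_HOME = "/usr/local/hadoop")

library("RHive")
```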
    

Understanding RHive operations

We will see how we can load and operate over Hive datasets in R using the RHive library:

  • To initialize RHive we use:
    rhive.init()
    
  • To connect with the Hive server we use:
    rhive.connect("192.168.1.210")
    
  • To view all tables we use:
    rhive.list.tables()
                 tab_name
    1 hive_algo_t_account
    2           o_account
    3         r_t_account
    
  • To view the table structure we use:
    rhive.desc.table('o_account')
         col_name data_type comment
    1          id       int
    2       email    string
    3 create_date    string
    
  • To execute the HQL queries we use:
    rhive.query("select * from o_account")
    
  • To close connection to the Hive server we use:
    rhive.close()
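Putting the preceding steps together, a complete RHive session might look like the following sketch. The server address and table name are the examples used above; adjust them for your cluster, and note that this requires a running Hive server:

```r
library("RHive")

rhive.init()                        # initialize RHive
rhive.connect("192.168.1.210")      # connect to the Hive server

rhive.list.tables()                 # list the tables in the warehouse
rhive.desc.table('o_account')       # inspect a table's structure

# Run a HiveQL query; the result is returned as an R data frame
accounts <- rhive.query("select * from o_account")
head(accounts)

rhive.close()                       # close the connection
```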
    