RHIVE – install R on workstations and connect to data in Hadoop

If you want to launch Hive queries from the R interface, RHIVE is the go-to package. It includes functions for retrieving metadata from Apache Hive, such as database names, table names, and column names. RHIVE also extends HiveQL with R language functions, which makes the statistical libraries and algorithms of the R programming language available for data stored in Hadoop. With these functions, users can apply R statistical learning models to data in a Hadoop cluster that has been cataloged using Apache Hive. The advantage of using RHIVE for Hadoop-R integration is that data operations are pushed down into Hadoop, so they run in parallel on the cluster and the data does not have to be moved to the workstation.
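As a rough illustration of that workflow, the sketch below connects from R, reads some metadata, and runs one query. It assumes the package is installed under its CRAN-style name `RHive`, that the Hadoop and Hive environment it expects is already configured, and that a Hive server is reachable. The host name `hive.example.com` and the table name `sales` are hypothetical placeholders, and exact arguments may differ between RHive versions.

```r
# Minimal sketch, not a definitive setup: requires the RHive package
# and a reachable Hive server.
library(RHive)

# Hypothetical values for illustration; replace with your own.
hive_host  <- "hive.example.com"
table_name <- "sales"

rhive.init()                     # initialise the RHive environment
rhive.connect(host = hive_host)  # open a connection to the Hive server

# Metadata lookups: tables in the current database, then the
# columns of one table.
tables  <- rhive.list.tables()
columns <- rhive.desc.table(table_name)
print(tables)
print(columns)

# The HiveQL statement runs in Hive; only the small aggregated
# result is returned to R as a data frame.
row_count <- rhive.query(paste("SELECT COUNT(*) FROM", table_name))
print(row_count)

rhive.close()                    # release the connection
```

Keeping the aggregation inside the query is what avoids data movement: the workstation receives one row rather than the whole table.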
