Introducing Hivemall for Spark

Apache Hive supports three execution engines: MapReduce, Tez, and Spark. Although Hivemall does not support Spark natively, the Hivemall for Spark project (https://github.com/maropu/hivemall-spark) provides a wrapper that lets you use Hivemall UDFs from SparkContext, DataFrames, or Spark Streaming. Getting started with Hivemall for Spark is straightforward. Follow this procedure to start a Scala shell, load the UDFs, and execute SQL queries:

  1. Download the define-udfs script:
    [cloudera@quickstart ~]$ wget https://raw.githubusercontent.com/maropu/hivemall-spark/master/scripts/ddl/define-udfs.sh --no-check-certificate
    
  2. Start a Scala shell with the packages option:
    [cloudera@quickstart ~]$ spark-1.6.0-bin-hadoop2.6/bin/spark-shell --master local[*] --packages maropu:hivemall-spark:0.0.6
    
  3. Create the Hivemall functions by loading the script into the shell (Hivemall for Spark does not yet support Python, so this step works only in Scala):
    scala> :load define-udfs.sh
    
  4. Now you can execute the examples from the project's tutorials; a short sample session follows this list:

    https://github.com/maropu/hivemall-spark/tree/master/tutorials
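
Before running the tutorials, it is worth verifying that the UDFs were actually registered. The following is a minimal smoke test, assuming the define-udfs script has registered Hivemall's hivemall_version() function (the exact function set depends on the Hivemall release bundled with the package):

    scala> sqlContext.sql("SELECT hivemall_version()").show()

If this prints a version string rather than an undefined-function error, the wrapper is working.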
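
As a sketch of the kind of query those tutorials cover, the hypothetical session below trains a logistic regression model with Hivemall's train_logregr and add_bias functions. It assumes a table named training, with a features column (an array of feature strings) and a label column (a float), already exists; it also relies on sqlContext being a HiveContext, which it is in the Spark 1.6 binary distribution:

    scala> // Train partial models, then average the per-task weights into one model
    scala> val model = sqlContext.sql("""
         |   SELECT feature, AVG(weight) AS weight
         |   FROM (
         |     SELECT train_logregr(add_bias(features), label) AS (feature, weight)
         |     FROM training
         |   ) t
         |   GROUP BY feature
         | """)
    scala> model.show()

The inner query emits one (feature, weight) pair per partial model, and the outer GROUP BY averages them into a single set of weights, which is Hivemall's usual training pattern.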
