Introducing Hivemall for Spark

Apache Hive supports three execution engines: MapReduce, Tez, and Spark. Although Hivemall does not support Spark natively, the Hivemall for Spark project (https://github.com/maropu/hivemall-spark) provides a wrapper that lets you use Hivemall UDFs from SparkContext, DataFrames, or Spark Streaming. Getting started with Hivemall for Spark is straightforward. Follow this procedure to start a Scala shell, load the UDFs, and execute SQL queries:

  1. Download the define-udfs script:
    [cloudera@quickstart ~]$ wget https://raw.githubusercontent.com/maropu/hivemall-spark/master/scripts/ddl/define-udfs.sh --no-check-certificate
    
  2. Start a Scala shell with the packages option:
    [cloudera@quickstart ~]$ spark-1.6.0-bin-hadoop2.6/bin/spark-shell --master local[*] --packages maropu:hivemall-spark:0.0.6
    
  3. Create the Hivemall functions by loading the script into the shell (Hivemall for Spark does not yet support Python, so this step works only in Scala):
    scala> :load define-udfs.sh
    
  4. Now you can execute the examples from the project's tutorials; a short sample session follows this list:

    https://github.com/maropu/hivemall-spark/tree/master/tutorials
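
Before running the tutorials, it is worth verifying that the UDFs were actually registered. The following is a minimal smoke test, assuming the define-udfs script has registered Hivemall's hivemall_version() function (the exact function set depends on the Hivemall release bundled with the package):

    scala> sqlContext.sql("SELECT hivemall_version()").show()

If this prints a version string rather than an undefined-function error, the wrapper is working.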
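
As a sketch of the kind of query those tutorials cover, the hypothetical session below trains a logistic regression model with Hivemall's train_logregr and add_bias functions. It assumes a table named training, with a features column (an array of feature strings) and a label column (a float), already exists; it also relies on sqlContext being a HiveContext, which it is in the Spark 1.6 binary distribution:

    scala> // Train partial models, then average the per-task weights into one model
    scala> val model = sqlContext.sql("""
         |   SELECT feature, AVG(weight) AS weight
         |   FROM (
         |     SELECT train_logregr(add_bias(features), label) AS (feature, weight)
         |     FROM training
         |   ) t
         |   GROUP BY feature
         | """)
    scala> model.show()

The inner query emits one (feature, weight) pair per partial model, and the outer GROUP BY averages them into a single set of weights, which is Hivemall's usual training pattern.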
