Introducing Hivemall

Hivemall is a scalable machine learning library built on top of Apache Hive and Hadoop. It is a collection of machine learning algorithms implemented as Hive User Defined Functions (UDFs) and User Defined Table Functions (UDTFs). Hivemall offers the following benefits:

  • Easy to use: Existing Hive users can run machine learning algorithms with the familiar HiveQL language. Unlike MLlib or H2O, there is no need to compile programs or build executable JARs; you simply register the UDFs and UDTFs and execute Hive queries.
  • Scalability: It inherits the scalability of Hadoop and Hive and adds features that let it scale to any number of training and test instances, as well as to any number of features.
  • Variety of algorithms: It offers classification, regression, k-means clustering, recommendation, anomaly detection, and feature engineering.

Follow this procedure to get started:

  1. Download a compatible JAR and the function definitions (define-all.hive) from https://github.com/myui/hivemall/releases:
    [cloudera@quickstart ~]$ wget https://github.com/myui/hivemall/releases/download/v0.4.2-rc.2/hivemall-core-0.4.2-rc.2-with-dependencies.jar
    
  2. Start Hive and issue the following commands:
    hive>  add jar hivemall-core-0.4.2-rc.2-with-dependencies.jar;
    hive>  source define-all.hive;
    
  3. Run the examples from the Hivemall wiki at https://github.com/myui/hivemall/wiki.
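If you start Hive often, the download and registration steps above can be scripted. The following is a minimal POSIX shell sketch, not part of Hivemall itself. The function names hivemall_jar_url and write_init_file, the variable HIVEMALL_VERSION, and the file name hivemall-init.hql are hypothetical. The sketch also assumes that other releases follow the same URL and file naming as the v0.4.2-rc.2 link in step 1, and that define-all.hive sits in the same directory as the JAR.

```shell
#!/bin/sh
# Minimal sketch (hypothetical helpers, not part of Hivemall) that automates
# steps 1 and 2. Assumes every release uses the same asset naming as v0.4.2-rc.2.
set -eu

# Release to use; override by exporting HIVEMALL_VERSION before running.
HIVEMALL_VERSION="${HIVEMALL_VERSION:-0.4.2-rc.2}"

# Print the download URL of the "with-dependencies" JAR for a given version,
# following the pattern of the v0.4.2-rc.2 link.
hivemall_jar_url() {
  v="$1"
  printf 'https://github.com/myui/hivemall/releases/download/v%s/hivemall-core-%s-with-dependencies.jar\n' "$v" "$v"
}

# Write a Hive initialization file containing the two commands from step 2.
# Paths are relative, so Hive must be started from the download directory.
write_init_file() {
  v="$1"
  out="$2"
  printf 'add jar hivemall-core-%s-with-dependencies.jar;\nsource define-all.hive;\n' "$v" > "$out"
}

# Example usage (needs network access and a Hive installation):
#   wget "$(hivemall_jar_url "$HIVEMALL_VERSION")"
#   write_init_file "$HIVEMALL_VERSION" hivemall-init.hql
#   hive -i hivemall-init.hql
```

The Hive CLI's -i option runs the given file as initialization SQL when the shell starts, so the add jar and source commands from step 2 execute automatically in every session. Because both paths are relative, launch Hive from the directory that holds the JAR and define-all.hive.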