Appendix: Alternative Methods to Launch H2O Clusters

This Appendix will show you how to launch H2O-3 and Sparkling Water clusters on your local machine so that you can run the code samples in this book. We will also show you how to launch H2O-3 clusters in the 90-day free trial environment for the H2O AI Cloud. This trial environment includes Enterprise Steam to launch and manage H2O clusters on Kubernetes infrastructure.

Note on Environments

Architecture: As introduced in Chapter 2, Platform Components and Key Concepts, you will use a client environment (with the H2O-3 or Sparkling Water libraries installed) to run commands against a remote H2O-3 or Sparkling Water architecture distributed across multiple server nodes on a Kubernetes or Hadoop cluster. For small datasets, however, the architecture can be launched locally as a single process on the same machine as the client.

Versions: Functionality and code samples in this book use the following versions: H2O-3 version 3.34.0.7, and Sparkling Water version 3.34.0.7-1-3.2 running on Spark 3.2. You will set up your environment with the latest stable versions, which will run the same code samples but will also include capabilities added to H2O-3 and Sparkling Water after the book was written.

Languages: You can set up your client environment in Python, R, or Java/Scala. We will use Python in this book. Your Python client can be a Jupyter notebook, PyCharm, or another Python environment.

Let's learn how to run H2O-3 entirely in your local environment.

Local H2O-3 cluster

This is the easiest method to run H2O-3 and is suitable for the small datasets used in code samples in this book. It launches H2O-3 on your local machine (versus an enterprise cluster environment) and does not involve H2O Enterprise Steam.

First, we will perform a one-time setup of our H2O-3 Python environment.

Step 1 – Install H2O-3 in Python

To set up your H2O-3 Python client, simply install three module dependencies in your Python environment and then the h2o-3 Python module. You must use Python 2.7.x, 3.5.x, 3.6.x, or 3.7.x.

More specifically, do the following:

  1. Install dependencies in your Python environment:

    pip install requests

    pip install tabulate

    pip install future

  2. Install the H2O-3 library in your Python environment:

    pip install h2o

Please refer to http://h2o-release.s3.amazonaws.com/h2o/rel-zumbo/1/index.html (the INSTALL IN PYTHON tab) to install H2O-3 in Conda.
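Before moving on, it can help to confirm that the modules from the preceding steps are visible to the Python interpreter you plan to use. The following is a minimal sketch that relies only on the standard library. The helper name missing_modules is our own and is not part of H2O-3.

```python
import importlib.util
import sys

# Modules installed in the preceding steps (pip package and import names match).
REQUIRED_MODULES = ["requests", "tabulate", "future", "h2o"]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python version:", ".".join(str(part) for part in sys.version_info[:3]))
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Not installed in this environment:", ", ".join(missing))
    else:
        print("All H2O-3 client modules were found.")
```

If any module is reported as missing, rerun the corresponding pip install command in the same environment that runs this check.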

You are now ready to run H2O-3 locally. Let's see how to do that.

Step 2 – Launch your H2O-3 cluster and write code

To start a local single-node H2O-3 cluster, simply run the following in your Python IDE:

import h2o
h2o.init()
# write h2o-3 code, including code samples in this book

You can now write your H2O-3 code, including all samples from this book. See Chapter 2, Platform Components and Key Concepts, for a Hello World code sample and an explanation of what happens under the surface.

Java Dependency – Only When Running Locally

The H2O-3 cluster (not the Python client) runs on Java. Because you are running the cluster on your local machine here (representing a single-node cluster), you must have Java installed. This is not required when you use your Python client to connect to a remote H2O cluster in your enterprise Kubernetes or Hadoop environment.
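Because a missing Java installation is a common reason for a local launch to fail, a quick check before calling h2o.init() can save time. This sketch uses only the standard library and assumes Java would be reachable through your PATH. The function names are illustrative.

```python
import shutil
import subprocess


def find_java():
    """Return the path of the java executable on PATH, or None if there is none."""
    return shutil.which("java")


def java_version_banner(java_path):
    """Return the text printed by `java -version` (Java writes it to stderr)."""
    result = subprocess.run(
        [java_path, "-version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
        check=False,
    )
    return (result.stderr or result.stdout).strip()


if __name__ == "__main__":
    java_path = find_java()
    if java_path is None:
        print("No Java found on PATH; install Java before calling h2o.init().")
    else:
        print(java_path)
        print(java_version_banner(java_path))
```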

Now, let's see how we can set up our environment to write Sparkling Water code on our local machine.

Local Sparkling Water cluster

Running Sparkling Water locally is similar to running H2O-3 locally, but with Spark dependencies. See this link for a full explanation of the Spark, Python, and H2O components involved: https://docs.h2o.ai/sparkling-water/3.2/latest-stable/doc/pysparkling.html.

We will be using Spark 3.2 here. To use a different version of Spark, go to the Sparkling Water section of the H2O downloads page at the following link: https://h2o.ai/resources/download/.

For your Sparkling Water Python client, you must use Python 2.7.x, 3.5.x, 3.6.x, or 3.7.x. We will be running Sparkling Water from a Jupyter notebook here.

Step 1 – Install Spark locally

Follow these steps to install Spark locally:

  1. Go to https://spark.apache.org/downloads.html to download Spark. Make the following choices and then download:
    • Spark version: 3.2.x
    • Package type: Pre-built for Hadoop 3.3 and later
  2. Unzip the downloaded file.
  3. Set the following environment variables (shown here for macOS):

    export SPARK_HOME="/path/to/spark/folder"

    export MASTER="local[*]"
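To confirm that the variables from the previous step are visible to Python, a check along the following lines can be run from the same terminal session. This is a sketch rather than an official Spark utility, and the function name spark_env_problems is our own.

```python
import os


def spark_env_problems(env):
    """Return a list of problems found in the given environment mapping."""
    problems = []
    spark_home = env.get("SPARK_HOME")
    if not spark_home:
        problems.append("SPARK_HOME is not set")
    elif not os.path.isdir(spark_home):
        problems.append("SPARK_HOME does not point to a directory: " + spark_home)
    if not env.get("MASTER"):
        problems.append("MASTER is not set")
    return problems


if __name__ == "__main__":
    found = spark_env_problems(os.environ)
    for line in found or ["Spark environment variables look fine."]:
        print(line)
```

Variables set with export last only for the current shell session, so run the check in the session from which you will later launch Sparkling Water.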

Now, let's install the Sparkling Water library in our Python environment.

Step 2 – Install Sparkling Water in Python

Install the following modules:

  1. Install dependencies in your Python environment:

    pip install requests

    pip install tabulate

    pip install future

  2. Install the Sparkling Water Python module (called PySparkling). Note the module reference to Spark 3.2 specifically here:

    pip install h2o_pysparkling_3.2
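The package name above ends in the Spark major and minor version. If you use a different Spark version, the name changes accordingly. The following sketch assumes that the h2o_pysparkling_<major>.<minor> pattern seen here holds for your Spark version, so confirm the exact name on the H2O downloads page. The helper pysparkling_package is hypothetical.

```python
def pysparkling_package(spark_version):
    """Map a Spark version such as '3.2.1' to a PySparkling package name.

    Assumption: the package follows the h2o_pysparkling_<major>.<minor> pattern.
    """
    parts = spark_version.split(".")
    if len(parts) < 2 or not all(part.isdigit() for part in parts[:2]):
        raise ValueError(
            "expected a version like '3.2' or '3.2.1', got " + repr(spark_version)
        )
    return "h2o_pysparkling_{}.{}".format(parts[0], parts[1])


if __name__ == "__main__":
    print("pip install " + pysparkling_package("3.2.1"))
```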

Next, let's install an interactive shell.

Step 3 – Install a Sparkling Water Python interactive shell

To run Sparkling Water locally, we need to install an interactive shell to launch the Sparkling Water cluster on Spark. (This is only required when running Sparkling Water locally; Enterprise Steam takes care of this when running on your enterprise cluster.) To do so, perform the following steps:

  1. Download the interactive shell by navigating to the Sparkling Water section of https://h2o.ai/resources/download/, clicking Sparkling Water For Spark 3.2, and finally, clicking on the DOWNLOAD SPARKLING WATER button.
  2. Unzip the download.

Now, let's launch a Sparkling Water cluster and access it from a Jupyter notebook.

Step 4 – Launch a Jupyter notebook on top of the Sparkling Water shell

We assume you have Jupyter Notebook installed in the same Python environment as your installations in step 2. Perform the following steps to launch a Jupyter notebook:

  1. On the command line, navigate into the directory where you unzipped the download in step 3 of this section.
  2. Launch the Sparkling Water interactive shell and a Jupyter notebook in it:
    • For macOS, use the following:

      export PYSPARK_DRIVER_PYTHON="ipython"

      export PYSPARK_DRIVER_PYTHON_OPTS="notebook"

      bin/pysparkling

    • For Windows, use the following:

      SET PYSPARK_DRIVER_PYTHON=ipython

      SET PYSPARK_DRIVER_PYTHON_OPTS=notebook

      bin/pysparkling

Your Jupyter notebook should launch in your browser.
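The two PYSPARK_DRIVER variables only take effect if they are present in the environment of the bin/pysparkling process. As an alternative to setting them in the shell, the launch can be sketched from Python as follows. This assumes macOS or Linux and that the directory you pass in is the unzipped Sparkling Water download. Both function names are our own.

```python
import os
import subprocess


def notebook_env(base_env):
    """Return a copy of base_env with the two PySpark driver variables set."""
    env = dict(base_env)
    env["PYSPARK_DRIVER_PYTHON"] = "ipython"
    env["PYSPARK_DRIVER_PYTHON_OPTS"] = "notebook"
    return env


def launch_pysparkling(sparkling_water_dir):
    """Run bin/pysparkling from the unzipped Sparkling Water directory.

    Blocks until the shell exits and returns its exit code.
    """
    launcher = os.path.join(sparkling_water_dir, "bin", "pysparkling")
    return subprocess.call(
        [launcher], env=notebook_env(os.environ), cwd=sparkling_water_dir
    )
```

Calling launch_pysparkling("/path/to/sparkling-water/folder") would then behave like the macOS commands above, with the path replaced by your actual path.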

Now, let's write Sparkling Water code.

Step 5 – Launch your Sparkling Water cluster and write code

In your Jupyter notebook, type the following code to get you started:

  1. Start your Sparkling Water cluster:

    from pysparkling import *

    import h2o

    hc = H2OContext.getOrCreate()

    hc

  2. Test the installation:

    localdata = "/path/to/my/csv"

    mysparkdata = spark.read.load(localdata, format="csv")

    myH2Odata = hc.asH2OFrame(mysparkdata)

You are now ready to build models using both H2O and Spark code.
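If you do not have a CSV file at hand for the installation test, a small one can be generated with the standard library. This is only a sketch with made-up column names and values, and the helper write_sample_csv is not part of Sparkling Water.

```python
import csv
import os
import tempfile


def write_sample_csv(directory=None):
    """Write a tiny CSV file with a header row and return its path."""
    fd, path = tempfile.mkstemp(suffix=".csv", dir=directory)
    with os.fdopen(fd, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["id", "feature", "label"])  # illustrative column names
        writer.writerows([[1, 0.5, "yes"], [2, 1.5, "no"], [3, 2.5, "yes"]])
    return path


localdata = write_sample_csv()
print(localdata)
```

The resulting localdata path can be used in place of /path/to/my/csv in the test above. Because this file has a header row, you may want to pass header=True to spark.read.load so that Spark treats the first row as column names.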

H2O-3 cluster in the 90-day free trial environment for H2O AI Cloud

Here, you must interact with Enterprise Steam to run H2O-3. In this case, you will install the h2osteam module in your Python client environment, in addition to the h2o module that we installed when running H2O-3 locally.

Step 1 – Get your 90-day trial to H2O AI Cloud

Get your trial access to H2O AI Cloud here: https://h2o.ai/freetrial.

Once you have completed all the steps and can log in to H2O AI Cloud, you can start running H2O-3 clusters as part of the H2O AI Cloud platform. Here are the next steps.

Step 2 – Set up your Python environment

To set up your Python client environment, perform the following steps:

  1. Log in to H2O AI Cloud and click on the My AI Engines tab. This will take you to Enterprise Steam, as shown in the following screenshot. From there, download the h2osteam library by clicking on the Python Client option from the sidebar:

Figure 15.1 – Enterprise Steam

  2. Install the h2osteam library in your Python environment by running the following command:

    pip install /path/to/download.whl

Here, /path/to/download.whl is replaced by your actual path.

  3. You will also need to install the h2o library. To do so, execute the following:

    pip install requests

    pip install tabulate

    pip install future

    pip install h2o

Now, let's use Steam to start an H2O cluster and then write H2O code in Python.

Step 3 – Launch your cluster

Follow these steps to launch your H2O cluster, which runs on a Kubernetes cluster:

  1. In Enterprise Steam, click H2O on the sidebar and then click the Launch New Cluster button.
  2. You can now configure your H2O cluster and give it a name. Be sure to select the latest H2O version from the dropdown, which should match the library you installed in the previous step.
  3. When configured, click the Launch Cluster button and wait for the cluster launch to complete.
  4. You will need the Enterprise Steam URL to connect to it from your Jupyter notebook or other Python client. While in Steam, copy the part of the browser URL that runs from https through h2o.ai, inclusive.
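If you copy the whole address from the browser instead, the part needed for the connection can be extracted programmatically. The sketch below keeps only the scheme and host. The function name steam_base_url and the example path in the usage line are assumptions made for illustration.

```python
from urllib.parse import urlsplit


def steam_base_url(browser_url):
    """Keep only the scheme and host of a URL copied from the browser."""
    parts = urlsplit(browser_url)
    if parts.scheme != "https" or not parts.netloc:
        raise ValueError("expected an https URL, got " + repr(browser_url))
    return "{}://{}".format(parts.scheme, parts.netloc)


if __name__ == "__main__":
    # The path after the host is a made-up example of what a browser might show.
    print(steam_base_url("https://steam.cloud.h2o.ai/#/clusters"))
```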

Step 4 – Write H2O-3 code

We can now start writing code (for example, in Jupyter) to build models on the H2O-3 cluster that we just launched. Perform the following steps after opening your Python client:

  1. Import your libraries and connect to Enterprise Steam:

    import h2o

    import h2osteam

    from h2osteam.clients import H2oKubernetesClient

    conn = h2osteam.login(
        url="https://SteamURL",
        verify_ssl=False,
        username="yourH2OAICloudUserName",
        password="yourH2OAICloudPassword")

    Important Note

    At the time of this writing, the URL for the 90-day H2O AI Cloud trial is https://steam.cloud.h2o.ai.

    For the password, you can use your login password for the H2O AI Cloud trial environment, or a temporary personal access token generated from the Enterprise Steam Configurations page.

  2. Connect to the H2O cluster you started in Enterprise Steam:

    cluster = H2oKubernetesClient().get_cluster(

        name="yourClusterName",

        created_by="yourH2OAICloudUserName")

    cluster.connect()

    # you are now ready to write code to run on this H2O cluster

You can now write your H2O-3 code, including all samples from this book.
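In a notebook that you share or commit to version control, you may prefer not to type your password or access token into the login call. One option, sketched here, is to read the values from environment variables and fall back to an interactive prompt. The variable names STEAM_URL, STEAM_USERNAME, and STEAM_PASSWORD are our own choices and are not defined by Enterprise Steam. The default URL is the trial URL given in the note above.

```python
import getpass
import os


def steam_login_kwargs(env=os.environ, prompt=getpass.getpass):
    """Collect keyword arguments for h2osteam.login without hardcoding secrets.

    The environment variable names are hypothetical; rename them as you like.
    """
    password = env.get("STEAM_PASSWORD") or prompt("Steam password or access token: ")
    return {
        "url": env.get("STEAM_URL", "https://steam.cloud.h2o.ai"),
        "username": env["STEAM_USERNAME"],
        "password": password,
        "verify_ssl": False,  # same setting as in the login call shown earlier
    }
```

With the variables set, the login shown in step 1 could then be written as conn = h2osteam.login(**steam_login_kwargs()).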
