Configuring the programming environment

In this section, we describe how to configure our programming environment so that we can interoperate with Spark, H2O, and ADAM. Note that running H2O on a laptop or desktop is quite resource-intensive, so make sure that your machine has at least 16 GB of RAM and enough free storage.

I am going to set this project up as a Maven project in Eclipse. However, you can define the same dependencies in SBT if you prefer. Let us start by defining the properties tag in the pom.xml file:

<properties>
<spark.version>2.2.1</spark.version>
<scala.version>2.11.12</scala.version>
<h2o.version>3.16.0.2</h2o.version>
<sparklingwater.version>2.2.6</sparklingwater.version>
<adam.version>0.23.0</adam.version>
</properties>

Then we can add the Spark dependency. Here we use version 2.2.1, although any 2.x version or even higher should work fine:

<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.11</artifactId>
<version>${spark.version}</version>
</dependency>

Then we need to declare the dependencies for H2O and Sparkling Water, matching the versions specified in the properties tag. Later versions might also work, so feel free to try them:

<dependency>
<groupId>ai.h2o</groupId>
<artifactId>sparkling-water-core_2.11</artifactId>
<version>${sparklingwater.version}</version>
</dependency>
<dependency>
<groupId>ai.h2o</groupId>
<artifactId>sparkling-water-examples_2.11</artifactId>
<version>${sparklingwater.version}</version>
</dependency>
<dependency>
<groupId>ai.h2o</groupId>
<artifactId>h2o-core</artifactId>
<version>${h2o.version}</version>
</dependency>
<dependency>
<groupId>ai.h2o</groupId>
<artifactId>h2o-scala_2.11</artifactId>
<version>${h2o.version}</version>
</dependency>
<dependency>
<groupId>ai.h2o</groupId>
<artifactId>h2o-algos</artifactId>
<version>${h2o.version}</version>
</dependency>
<dependency>
<groupId>ai.h2o</groupId>
<artifactId>h2o-app</artifactId>
<version>${h2o.version}</version>
</dependency>
<dependency>
<groupId>ai.h2o</groupId>
<artifactId>h2o-persist-hdfs</artifactId>
<version>${h2o.version}</version>
</dependency>
<dependency>
<groupId>ai.h2o</groupId>
<artifactId>google-analytics-java</artifactId>
<version>1.1.2-H2O-CUSTOM</version>
</dependency>

Finally, let's declare the ADAM dependency:

<dependency>
<groupId>org.bdgenomics.adam</groupId>
<artifactId>adam-core_2.11</artifactId>
<version>${adam.version}</version>
</dependency>

When I tried this on a Windows machine, I additionally had to add the joda-time dependency. Let us declare it as well (depending on your platform, it might not be needed):

<dependency>
<groupId>joda-time</groupId>
<artifactId>joda-time</artifactId>
<version>2.9.9</version>
</dependency>
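
The fragments in this section all belong in a single pom.xml. As a rough sketch of how they fit together, the skeleton below shows only where the properties and dependencies elements sit. The project's own groupId, artifactId, and version are hypothetical placeholders, and only the Spark dependency is repeated; the rest follow the same pattern.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <!-- Hypothetical project coordinates: replace with your own -->
  <groupId>com.example</groupId>
  <artifactId>h2o-adam-demo</artifactId>
  <version>0.1.0-SNAPSHOT</version>

  <!-- Version properties, referenced below as ${name} -->
  <properties>
    <spark.version>2.2.1</spark.version>
    <scala.version>2.11.12</scala.version>
    <h2o.version>3.16.0.2</h2o.version>
    <sparklingwater.version>2.2.6</sparklingwater.version>
    <adam.version>0.23.0</adam.version>
  </properties>

  <!-- Every dependency element from this section goes inside this wrapper -->
  <dependencies>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.11</artifactId>
      <version>${spark.version}</version>
    </dependency>
    <!-- H2O, Sparkling Water, ADAM, and (if needed) joda-time follow here -->
  </dependencies>
</project>
```

Keeping the version numbers in the properties element means an upgrade only requires changing one line per library.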

Once you create the Maven project in Eclipse and build it (from the IDE or by running $ mvn install), all the required dependencies will be downloaded. We are now ready to code!

Wait! How about seeing the H2O UI in the browser? For this, we have to manually download the H2O JAR to our computer and run it as a regular .jar file. In short, it's a three-step process:

  • Download the latest stable release of H2O from https://www.h2o.ai/download/. Then unzip it; it contains everything you need to get started.
  • From your terminal/command prompt, run the .jar using java -jar h2o.jar.
  • Point your browser to http://localhost:54321:
Figure 11: The UI of H2O Flow

This shows the available features of the latest version of H2O (that is, h2o-3.16.0.4 as of 19 January 2018). I am not going to explain everything here, however, so let's stop exploring; for the time being, this much knowledge about H2O and Sparkling Water will be enough.
