How to do it...

  1. Add the following Maven dependency for Apache Spark:
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.11</artifactId>
  <version>2.1.0</version>
</dependency>
  2. Add the following Maven dependency for DataVec for Spark:
<dependency>
  <groupId>org.datavec</groupId>
  <artifactId>datavec-spark_2.11</artifactId>
  <version>1.0.0-beta3_spark_2</version>
</dependency>
  3. Add the following Maven dependency for parameter averaging:
<dependency>
  <groupId>org.deeplearning4j</groupId>
  <artifactId>dl4j-spark_2.11</artifactId>
  <version>1.0.0-beta3_spark_2</version>
</dependency>
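The dl4j-spark module provides the ParameterAveragingTrainingMaster used for parameter averaging on Spark. The following is a minimal sketch of how it is typically wired into a SparkDl4jMultiLayer; the batch size, averaging frequency, and prefetch values are illustrative assumptions, and conf stands for your own MultiLayerConfiguration:

import org.apache.spark.api.java.JavaSparkContext;
import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.spark.impl.multilayer.SparkDl4jMultiLayer;
import org.deeplearning4j.spark.impl.paramavg.ParameterAveragingTrainingMaster;

public class ParameterAveragingSketch {
    public static SparkDl4jMultiLayer build(JavaSparkContext sc, MultiLayerConfiguration conf) {
        // Parameter averaging: each worker trains on its data partition and the
        // parameters are periodically averaged back on the master.
        ParameterAveragingTrainingMaster trainingMaster =
                new ParameterAveragingTrainingMaster.Builder(32) // examples per DataSet object in the RDD (assumed)
                        .batchSizePerWorker(32)                  // minibatch size on each worker (assumed)
                        .averagingFrequency(5)                   // average parameters every 5 minibatches (assumed)
                        .workerPrefetchNumBatches(2)             // asynchronous prefetch on workers (assumed)
                        .build();
        return new SparkDl4jMultiLayer(sc, conf, trainingMaster);
    }
}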
  4. Add the following Maven dependency for gradient sharing:
<dependency>
  <groupId>org.deeplearning4j</groupId>
  <artifactId>dl4j-spark-parameterserver_2.11</artifactId>
  <version>1.0.0-beta3_spark_2</version>
</dependency>
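The dl4j-spark-parameterserver module provides the SharedTrainingMaster used for gradient sharing, where quantized updates are exchanged through a parameter server instead of averaging full parameter sets. A rough sketch for a single-node setup; the port, network mask, controller address, threshold, and batch sizes below are placeholder assumptions:

import org.apache.spark.api.java.JavaSparkContext;
import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.spark.impl.multilayer.SparkDl4jMultiLayer;
import org.deeplearning4j.spark.parameterserver.training.SharedTrainingMaster;
import org.nd4j.parameterserver.distributed.conf.VoidConfiguration;

public class GradientSharingSketch {
    public static SparkDl4jMultiLayer build(JavaSparkContext sc, MultiLayerConfiguration conf) {
        // Network settings for the parameter server (all values are assumptions for a local cluster)
        VoidConfiguration voidConfiguration = VoidConfiguration.builder()
                .unicastPort(40123)             // port used by workers to exchange updates (assumed)
                .networkMask("10.0.0.0/16")     // subnet of the cluster nodes (assumed)
                .controllerAddress("127.0.0.1") // address of the Spark driver/master node (assumed)
                .build();

        SharedTrainingMaster trainingMaster =
                new SharedTrainingMaster.Builder(voidConfiguration, 32) // 32 = examples per DataSet in the RDD (assumed)
                        .batchSizePerWorker(32)   // minibatch size on each worker (assumed)
                        .updatesThreshold(1e-3)   // encoding threshold for quantized updates (assumed)
                        .workersPerNode(1)        // workers per physical node (assumed)
                        .build();

        return new SparkDl4jMultiLayer(sc, conf, trainingMaster);
    }
}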
  5. Add the following Maven dependency for the ND4J backend:
<dependency>
  <groupId>org.nd4j</groupId>
  <artifactId>nd4j-native-platform</artifactId>
  <version>1.0.0-beta3</version>
</dependency>

  6. Add the following Maven dependency for CUDA:
<dependency>
  <groupId>org.nd4j</groupId>
  <artifactId>nd4j-cuda-x.x</artifactId>
  <version>1.0.0-beta3</version>
</dependency>
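Steps 5 and 6 provide alternative ND4J backends (CPU via nd4j-native-platform and GPU via nd4j-cuda-x.x); you normally include only the one you intend to run on. To confirm which backend is actually picked up at runtime, a quick check such as the following can help (a minimal sketch; the printed class name depends on the dependency on your classpath):

import org.nd4j.linalg.factory.Nd4j;

public class BackendCheck {
    public static void main(String[] args) {
        // Triggers ND4J initialization and prints the backend that was loaded,
        // i.e. a native (CPU) or CUDA backend class, depending on the dependency used.
        System.out.println("ND4J backend: " + Nd4j.getBackend().getClass().getName());
    }
}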
  7. Add the following Maven dependency for JCommander:
<dependency>
  <groupId>com.beust</groupId>
  <artifactId>jcommander</artifactId>
  <version>1.72</version>
</dependency>
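JCommander is used to parse the command-line arguments (for example, the Spark master host or the batch size) that are passed when the training job is submitted. A minimal sketch; the parameter names and default values below are only illustrative assumptions:

import com.beust.jcommander.JCommander;
import com.beust.jcommander.Parameter;

public class TrainingJobArgs {
    // Hypothetical arguments for a Spark training job
    @Parameter(names = "-masterHost", description = "Spark master host")
    private String masterHost = "localhost";

    @Parameter(names = "-batchSize", description = "Minibatch size per worker")
    private int batchSize = 32;

    public static void main(String[] args) {
        TrainingJobArgs parsed = new TrainingJobArgs();
        // Bind the parsed arguments onto the annotated fields above
        JCommander.newBuilder()
                .addObject(parsed)
                .build()
                .parse(args);
        System.out.println("Master host: " + parsed.masterHost + ", batch size: " + parsed.batchSize);
    }
}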
  8. Download Hadoop from the official website at https://hadoop.apache.org/releases.html and add the required environment variables.

Extract the downloaded Hadoop package and create the following environment variables:

HADOOP_HOME = {PathDownloaded}/hadoop-x.x 
HADOOP_HDFS_HOME = {PathDownloaded}/hadoop-x.x
HADOOP_MAPRED_HOME = {PathDownloaded}/hadoop-x.x
HADOOP_YARN_HOME = {PathDownloaded}/hadoop-x.x

Add the following entry to the PATH environment variable:

${HADOOP_HOME}/bin
  9. Create name/data node directories for Hadoop. Navigate to the Hadoop home directory (which is set in the HADOOP_HOME environment variable) and create a directory named data. Then, create two subdirectories named datanode and namenode underneath it. Make sure that read/write/delete access has been granted for these directories.
  10. Navigate to hadoop-x.x/etc/hadoop and open hdfs-site.xml. Then, add the following configuration:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/{NameNodeDirectoryPath}</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/{DataNodeDirectoryPath}</value>
  </property>
</configuration>
  11. Navigate to hadoop-x.x/etc/hadoop and open mapred-site.xml. Then, add the following configuration:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
  12. Navigate to hadoop-x.x/etc/hadoop and open yarn-site.xml. Then, add the following configuration:
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>
  13. Navigate to hadoop-x.x/etc/hadoop and open core-site.xml. Then, add the following configuration:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
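The fs.default.name value is the HDFS URI that client applications (including Spark jobs) use to reach the NameNode. As a quick connectivity check from Java, you can point the Hadoop FileSystem API at the same URI; this is only a sketch and assumes the hadoop-client dependency is on the classpath:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Same URI as configured in core-site.xml (fs.defaultFS is the newer name for fs.default.name)
        conf.set("fs.defaultFS", "hdfs://localhost:9000");
        try (FileSystem fs = FileSystem.get(conf)) {
            // List the HDFS root directory as a basic sanity check
            for (FileStatus status : fs.listStatus(new Path("/"))) {
                System.out.println(status.getPath());
            }
        }
    }
}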

  14. Navigate to hadoop-x.x/etc/hadoop and open hadoop-env.cmd. Then, replace set JAVA_HOME=%JAVA_HOME% with set JAVA_HOME={JavaHomeAbsolutePath}.

Add the winutils Hadoop fix (only applicable to Windows). You can download it from http://tiny.cc/hadoop-config-windows. Alternatively, you can navigate to the respective GitHub repository at https://github.com/steveloughran/winutils and get the fix that matches your installed Hadoop version. Replace the bin folder at ${HADOOP_HOME} with the bin folder from the fix.

  15. Run the following Hadoop command to format the namenode:
hdfs namenode -format

The command output should confirm that the NameNode storage directory has been successfully formatted.

  16. Navigate to ${HADOOP_HOME}/sbin and start the Hadoop services:
    • For Windows, run start-all.cmd.
    • For Linux or any other OS, run start-all.sh from Terminal.

You should see output indicating that the HDFS daemons (NameNode and DataNode) and the YARN daemons (ResourceManager and NodeManager) have started.

  17. Hit http://localhost:50070/ in your browser and verify whether Hadoop is up and running.

  18. Download Spark from https://spark.apache.org/downloads.html, extract the package, and create the following environment variables:
SPARK_HOME = {PathDownloaded}/spark-x.x-bin-hadoopx.x
SPARK_CONF_DIR = ${SPARK_HOME}/conf
  19. Configure Spark's properties. Navigate to the directory set in SPARK_CONF_DIR and open the spark-env.sh file. Then, add the following configuration:
SPARK_MASTER_HOST=localhost
  20. Start the Spark master by running the following command:
spark-class org.apache.spark.deploy.master.Master

The console output should indicate that the Spark master has started and that its web UI has been bound (by default at http://localhost:8080).

  21. Hit http://localhost:8080/ in your browser and verify whether Spark is up and running.
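With Hadoop and the Spark master running, a Spark-based DL4J job can be pointed at this single-node setup. The sketch below only shows the Spark context creation; the master URL and application name are assumptions, and the exact master URL (typically spark://<hostname>:7077) is shown in the master's console output and on its web UI:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class LocalClusterContext {
    public static void main(String[] args) {
        // Assumed master URL for the standalone master started above;
        // check http://localhost:8080 for the exact value.
        SparkConf sparkConf = new SparkConf()
                .setMaster("spark://localhost:7077")
                .setAppName("dl4j-distributed-training");
        JavaSparkContext sc = new JavaSparkContext(sparkConf);

        // ... build the MultiLayerConfiguration, TrainingMaster, and SparkDl4jMultiLayer here ...

        sc.stop();
    }
}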
