How to do it...

  1. Add the following Maven dependency for Apache Spark:
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.11</artifactId>
  <version>2.1.0</version>
</dependency>
  2. Add the following Maven dependency for DataVec for Spark:
<dependency>
  <groupId>org.datavec</groupId>
  <artifactId>datavec-spark_2.11</artifactId>
  <version>1.0.0-beta3_spark_2</version>
</dependency>
  3. Add the following Maven dependency for parameter averaging:
<dependency>
  <groupId>org.deeplearning4j</groupId>
  <artifactId>dl4j-spark_2.11</artifactId>
  <version>1.0.0-beta3_spark_2</version>
</dependency>
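The dl4j-spark module provides the ParameterAveragingTrainingMaster used for parameter averaging on Spark. The following is a minimal sketch of how it is typically wired into a SparkDl4jMultiLayer; the batch size, averaging frequency, and prefetch values are illustrative assumptions, and conf stands for your own MultiLayerConfiguration:

import org.apache.spark.api.java.JavaSparkContext;
import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.spark.impl.multilayer.SparkDl4jMultiLayer;
import org.deeplearning4j.spark.impl.paramavg.ParameterAveragingTrainingMaster;

public class ParameterAveragingSketch {
    public static SparkDl4jMultiLayer build(JavaSparkContext sc, MultiLayerConfiguration conf) {
        // Parameter averaging: each worker trains on its data partition and the
        // parameters are periodically averaged back on the master.
        ParameterAveragingTrainingMaster trainingMaster =
                new ParameterAveragingTrainingMaster.Builder(32) // examples per DataSet object in the RDD (assumed)
                        .batchSizePerWorker(32)                  // minibatch size on each worker (assumed)
                        .averagingFrequency(5)                   // average parameters every 5 minibatches (assumed)
                        .workerPrefetchNumBatches(2)             // asynchronous prefetch on workers (assumed)
                        .build();
        return new SparkDl4jMultiLayer(sc, conf, trainingMaster);
    }
}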
  4. Add the following Maven dependency for gradient sharing:
<dependency>
  <groupId>org.deeplearning4j</groupId>
  <artifactId>dl4j-spark-parameterserver_2.11</artifactId>
  <version>1.0.0-beta3_spark_2</version>
</dependency>
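The dl4j-spark-parameterserver module provides the SharedTrainingMaster used for gradient sharing, where quantized updates are exchanged through a parameter server instead of averaging full parameter sets. A rough sketch for a single-node setup; the port, network mask, controller address, threshold, and batch sizes below are placeholder assumptions:

import org.apache.spark.api.java.JavaSparkContext;
import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.spark.impl.multilayer.SparkDl4jMultiLayer;
import org.deeplearning4j.spark.parameterserver.training.SharedTrainingMaster;
import org.nd4j.parameterserver.distributed.conf.VoidConfiguration;

public class GradientSharingSketch {
    public static SparkDl4jMultiLayer build(JavaSparkContext sc, MultiLayerConfiguration conf) {
        // Network settings for the parameter server (all values are assumptions for a local cluster)
        VoidConfiguration voidConfiguration = VoidConfiguration.builder()
                .unicastPort(40123)             // port used by workers to exchange updates (assumed)
                .networkMask("10.0.0.0/16")     // subnet of the cluster nodes (assumed)
                .controllerAddress("127.0.0.1") // address of the Spark driver/master node (assumed)
                .build();

        SharedTrainingMaster trainingMaster =
                new SharedTrainingMaster.Builder(voidConfiguration, 32) // 32 = examples per DataSet in the RDD (assumed)
                        .batchSizePerWorker(32)   // minibatch size on each worker (assumed)
                        .updatesThreshold(1e-3)   // encoding threshold for quantized updates (assumed)
                        .workersPerNode(1)        // workers per physical node (assumed)
                        .build();

        return new SparkDl4jMultiLayer(sc, conf, trainingMaster);
    }
}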
  5. Add the following Maven dependency for the ND4J backend:
<dependency>
  <groupId>org.nd4j</groupId>
  <artifactId>nd4j-native-platform</artifactId>
  <version>1.0.0-beta3</version>
</dependency>

  6. Add the following Maven dependency for CUDA:
<dependency>
  <groupId>org.nd4j</groupId>
  <artifactId>nd4j-cuda-x.x</artifactId>
  <version>1.0.0-beta3</version>
</dependency>
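Steps 5 and 6 provide alternative ND4J backends (CPU via nd4j-native-platform and GPU via nd4j-cuda-x.x); you normally include only the one you intend to run on. To confirm which backend is actually picked up at runtime, a quick check such as the following can help (a minimal sketch; the printed class name depends on the dependency on your classpath):

import org.nd4j.linalg.factory.Nd4j;

public class BackendCheck {
    public static void main(String[] args) {
        // Triggers ND4J initialization and prints the backend that was loaded,
        // i.e. a native (CPU) or CUDA backend class, depending on the dependency used.
        System.out.println("ND4J backend: " + Nd4j.getBackend().getClass().getName());
    }
}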
  7. Add the following Maven dependency for JCommander:
<dependency>
  <groupId>com.beust</groupId>
  <artifactId>jcommander</artifactId>
  <version>1.72</version>
</dependency>
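JCommander is used to parse the command-line arguments (for example, the Spark master host or the batch size) that are passed when the training job is submitted. A minimal sketch; the parameter names and default values below are only illustrative assumptions:

import com.beust.jcommander.JCommander;
import com.beust.jcommander.Parameter;

public class TrainingJobArgs {
    // Hypothetical arguments for a Spark training job
    @Parameter(names = "-masterHost", description = "Spark master host")
    private String masterHost = "localhost";

    @Parameter(names = "-batchSize", description = "Minibatch size per worker")
    private int batchSize = 32;

    public static void main(String[] args) {
        TrainingJobArgs parsed = new TrainingJobArgs();
        // Bind the parsed arguments onto the annotated fields above
        JCommander.newBuilder()
                .addObject(parsed)
                .build()
                .parse(args);
        System.out.println("Master host: " + parsed.masterHost + ", batch size: " + parsed.batchSize);
    }
}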
  8. Download Hadoop from the official website at https://hadoop.apache.org/releases.html and add the required environment variables.

Extract the downloaded Hadoop package and create the following environment variables:

HADOOP_HOME = {PathDownloaded}/hadoop-x.x 
HADOOP_HDFS_HOME = {PathDownloaded}/hadoop-x.x
HADOOP_MAPRED_HOME = {PathDownloaded}/hadoop-x.x
HADOOP_YARN_HOME = {PathDownloaded}/hadoop-x.x

Add the following entry to the PATH environment variable:

${HADOOP_HOME}/bin
  9. Create name/data node directories for Hadoop. Navigate to the Hadoop home directory (which is set in the HADOOP_HOME environment variable) and create a directory named data. Then, create two subdirectories named datanode and namenode underneath it. Make sure that read/write/delete access has been granted for these directories.
  10. Navigate to hadoop-x.x/etc/hadoop and open hdfs-site.xml. Then, add the following configuration:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/{NameNodeDirectoryPath}</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/{DataNodeDirectoryPath}</value>
  </property>
</configuration>
  11. Navigate to hadoop-x.x/etc/hadoop and open mapred-site.xml. Then, add the following configuration:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
  12. Navigate to hadoop-x.x/etc/hadoop and open yarn-site.xml. Then, add the following configuration:
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>
  13. Navigate to hadoop-x.x/etc/hadoop and open core-site.xml. Then, add the following configuration:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
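The fs.default.name value is the HDFS URI that client applications (including Spark jobs) use to reach the NameNode. As a quick connectivity check from Java, you can point the Hadoop FileSystem API at the same URI; this is only a sketch and assumes the hadoop-client dependency is on the classpath:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Same URI as configured in core-site.xml (fs.defaultFS is the newer name for fs.default.name)
        conf.set("fs.defaultFS", "hdfs://localhost:9000");
        try (FileSystem fs = FileSystem.get(conf)) {
            // List the HDFS root directory as a basic sanity check
            for (FileStatus status : fs.listStatus(new Path("/"))) {
                System.out.println(status.getPath());
            }
        }
    }
}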

  14. Navigate to hadoop-x.x/etc/hadoop and open hadoop-env.cmd. Then, replace set JAVA_HOME=%JAVA_HOME% with set JAVA_HOME={JavaHomeAbsolutePath}.

Add the winutils Hadoop fix (only applicable to Windows). You can download it from http://tiny.cc/hadoop-config-windows. Alternatively, you can navigate to the respective GitHub repository at https://github.com/steveloughran/winutils and get the fix that matches your installed Hadoop version. Replace the bin folder at ${HADOOP_HOME} with the bin folder from the fix.

  15. Run the following Hadoop command to format the namenode:
hdfs namenode -format

The command output should confirm that the NameNode storage directory has been successfully formatted.

  16. Navigate to ${HADOOP_HOME}/sbin and start the Hadoop services:
    • For Windows, run start-all.cmd.
    • For Linux or any other OS, run start-all.sh from Terminal.

You should see output indicating that the HDFS daemons (NameNode and DataNode) and the YARN daemons (ResourceManager and NodeManager) have started.

  17. Hit http://localhost:50070/ in your browser and verify whether Hadoop is up and running.

  18. Download Spark from https://spark.apache.org/downloads.html, extract the package, and create the following environment variables:
SPARK_HOME = {PathDownloaded}/spark-x.x-bin-hadoopx.x
SPARK_CONF_DIR = ${SPARK_HOME}/conf
  19. Configure Spark's properties. Navigate to the directory set in SPARK_CONF_DIR and open the spark-env.sh file. Then, add the following configuration:
SPARK_MASTER_HOST=localhost
  20. Start the Spark master by running the following command:
spark-class org.apache.spark.deploy.master.Master

The console output should indicate that the Spark master has started and that its web UI has been bound (by default at http://localhost:8080).

  21. Hit http://localhost:8080/ in your browser and verify whether Spark is up and running.
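With Hadoop and the Spark master running, a Spark-based DL4J job can be pointed at this single-node setup. The sketch below only shows the Spark context creation; the master URL and application name are assumptions, and the exact master URL (typically spark://<hostname>:7077) is shown in the master's console output and on its web UI:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class LocalClusterContext {
    public static void main(String[] args) {
        // Assumed master URL for the standalone master started above;
        // check http://localhost:8080 for the exact value.
        SparkConf sparkConf = new SparkConf()
                .setMaster("spark://localhost:7077")
                .setAppName("dl4j-distributed-training");
        JavaSparkContext sc = new JavaSparkContext(sparkConf);

        // ... build the MultiLayerConfiguration, TrainingMaster, and SparkDl4jMultiLayer here ...

        sc.stop();
    }
}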
