How to do it...

  1. Download, extract, and copy the contents of the TinyImageNet dataset to the following directory location:
* Windows: C:\Users\<username>\.deeplearning4j\data\TINYIMAGENET_200
* Linux: ~/.deeplearning4j/data/TINYIMAGENET_200
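Alternatively, DL4J can download and extract the dataset for you. A minimal sketch using org.deeplearning4j.datasets.fetchers.TinyImageNetFetcher, which writes to the default location listed above:
TinyImageNetFetcher fetcher = new TinyImageNetFetcher();
fetcher.downloadAndExtract(); // downloads the archive (if not already cached) and extracts it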
  2. Create batches of images for training using the TinyImageNet dataset:
File saveDirTrain = new File(batchSavedLocation, "train");
SparkDataUtils.createFileBatchesLocal(dirPathDataSet, NativeImageLoader.ALLOWED_FORMATS, true, saveDirTrain, batchSize);
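Here, dirPathDataSet, batchSavedLocation, and batchSize are assumed to have been defined beforehand, along the following lines (the paths are illustrative):
File dirPathDataSet = new File("/path/to/TINYIMAGENET_200/train"); // source images, one subdirectory per label
File batchSavedLocation = new File("/path/to/batches"); // root directory for the generated batch files
int batchSize = 32; // number of images per batch file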
  3. Create batches of images for testing using the TinyImageNet dataset:
File saveDirTest = new File(batchSavedLocation, "test");
SparkDataUtils.createFileBatchesLocal(dirPathDataSet, NativeImageLoader.ALLOWED_FORMATS, true, saveDirTest, batchSize);
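Note that, for this step, dirPathDataSet is expected to point at the test/validation split of the extracted dataset rather than the training images used in the previous step.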
  4. Create an ImageRecordReader that holds a reference to the dataset:
PathLabelGenerator labelMaker = new ParentPathLabelGenerator();
ImageRecordReader rr = new ImageRecordReader(imageHeightWidth, imageHeightWidth, imageChannels, labelMaker);
rr.setLabels(new TinyImageNetDataSetIterator(1).getLabels());
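The dimension variables reflect the fact that TinyImageNet images are 64 x 64 RGB:
int imageHeightWidth = 64; // TinyImageNet images are 64 x 64 pixels
int imageChannels = 3; // RGB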
  5. Create a RecordReaderFileBatchLoader from the ImageRecordReader to load the batch data:
RecordReaderFileBatchLoader loader = new RecordReaderFileBatchLoader(rr, batchSize, 1, TinyImageNetFetcher.NUM_LABELS);
loader.setPreProcessor(new ImagePreProcessingScaler());
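ImagePreProcessingScaler normalizes the raw pixel values from the [0, 255] range to [0, 1] before they are passed to the network.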
  6. Use JCommander at the beginning of your source code to parse command-line arguments:
JCommander jcmdr = new JCommander(this);
jcmdr.parse(args);
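This assumes that the command-line options are declared as JCommander @Parameter fields in the same class. A minimal sketch, with field names matching the spark-submit arguments used in step 18:
@Parameter(names = "--dataPath", description = "HDFS path of the preprocessed batch files", required = true)
private String dataPath;
@Parameter(names = "--masterIP", description = "IP address of the Spark master node", required = true)
private String masterIP;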
  7. Create a parameter server configuration (gradient sharing) for Spark training using VoidConfiguration, as shown in the following code:
VoidConfiguration voidConfiguration = VoidConfiguration.builder()
.unicastPort(portNumber)
.networkMask(netWorkMask)
.controllerAddress(masterNodeIPAddress)
.build();
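Here, portNumber, netWorkMask, and masterNodeIPAddress are assumed to have been defined for your cluster, for example (the values shown are illustrative):
int portNumber = 40123; // any free port reachable from all nodes
String netWorkMask = "10.0.0.0/16"; // network mask of the subnet the cluster runs on
String masterNodeIPAddress = masterIP; // master node IP parsed earlier by JCommander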
  8. Configure a distributed training network using SharedTrainingMaster, as shown in the following code:
TrainingMaster tm = new SharedTrainingMaster.Builder(voidConfiguration, batchSize)
.rngSeed(12345)
.collectTrainingStats(false)
.batchSizePerWorker(batchSize) // Minibatch size for each worker
.thresholdAlgorithm(new AdaptiveThresholdAlgorithm(1E-3)) // Threshold algorithm determines the encoding threshold to be used
.workersPerNode(1) // Workers per node
.build();

  9. Create a GraphBuilder for the ComputationGraphConfiguration, as shown in the following code:
ComputationGraphConfiguration.GraphBuilder builder = new NeuralNetConfiguration.Builder()
.convolutionMode(ConvolutionMode.Same)
.l2(1e-4)
.updater(new AMSGrad(lrSchedule))
.weightInit(WeightInit.RELU)
.graphBuilder()
.addInputs("input")
.setOutputs("output");
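lrSchedule is assumed to be a learning-rate schedule created beforehand, for example a MapSchedule keyed by epoch (the rates shown are illustrative):
ISchedule lrSchedule = new MapSchedule.Builder(ScheduleType.EPOCH)
.add(0, 8e-3) // learning rate 0.008 from epoch 0
.add(1, 6e-3)
.add(3, 3e-3)
.add(5, 1e-3)
.build();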
  10. Use DarknetHelper from the DL4J Model Zoo to build the core CNN architecture, as shown in the following code:
DarknetHelper.addLayers(builder, 0, 3, 3, 32, 0); //64x64 out
DarknetHelper.addLayers(builder, 1, 3, 32, 64, 2); //32x32 out
DarknetHelper.addLayers(builder, 2, 2, 64, 128, 0); //32x32 out
DarknetHelper.addLayers(builder, 3, 2, 128, 256, 2); //16x16 out
DarknetHelper.addLayers(builder, 4, 2, 256, 256, 0); //16x16 out
DarknetHelper.addLayers(builder, 5, 2, 256, 512, 2); //8x8 out
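Each addLayers() call appends a convolution block (convolution, batch normalization, and LeakyReLU activation) to the builder; when the last argument is nonzero, a max-pooling layer with that kernel size and stride is also added, which is what halves the spatial dimensions in the output-size comments.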
  11. Configure the output layers, taking into account the number of labels and the loss function, as shown in the following code:
builder.addLayer("convolution2d_6", new ConvolutionLayer.Builder(1, 1)
.nIn(512)
.nOut(TinyImageNetFetcher.NUM_LABELS) // number of labels (classified outputs) = 200
.weightInit(WeightInit.XAVIER)
.stride(1, 1)
.activation(Activation.IDENTITY)
.build(), "maxpooling2d_5")
.addLayer("globalpooling", new GlobalPoolingLayer.Builder(PoolingType.AVG).build(), "convolution2d_6")
.addLayer("loss", new LossLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD).activation(Activation.SOFTMAX).build(), "globalpooling")
.setOutputs("loss");
  12. Create the ComputationGraphConfiguration from the GraphBuilder:
ComputationGraphConfiguration configuration = builder.build(); 
  13. Create the SparkComputationGraph model from the defined configuration and set training listeners on it:
SparkComputationGraph sparkNet = new SparkComputationGraph(context, configuration, tm);
sparkNet.setListeners(new PerformanceListener(10, true));
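context is the JavaSparkContext for the application, created along the following lines (the application name is illustrative):
SparkConf sparkConf = new SparkConf();
sparkConf.setAppName("DL4J TinyImageNet Training");
JavaSparkContext context = new JavaSparkContext(sparkConf);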
  14. Create JavaRDD objects that represent the HDFS paths of the batch files that we created earlier for training:
String trainPath = dataPath + (dataPath.endsWith("/") ? "" : "/") + "train";
JavaRDD<String> pathsTrain = SparkUtils.listPaths(context, trainPath);
  15. Invoke the training instance by calling fitPaths():
for (int i = 0; i < numEpochs; i++) {
sparkNet.fitPaths(pathsTrain, loader);
}
  16. Create JavaRDD objects that represent the HDFS paths to the batch files that we created earlier for testing:
String testPath = dataPath + (dataPath.endsWith("/") ? "" : "/") + "test";
JavaRDD<String> pathsTest = SparkUtils.listPaths(context, testPath);
  17. Evaluate the distributed neural network by calling doEvaluation():
Evaluation evaluation = new Evaluation(TinyImageNetDataSetIterator.getLabels(false), 5); // evaluate top-5 accuracy in addition to top-1
evaluation = (Evaluation) sparkNet.doEvaluation(pathsTest, loader, evaluation)[0];
log.info("Evaluation statistics: {}", evaluation.stats());
  18. Run the distributed training instance via spark-submit in the following format:
spark-submit --master spark://{sparkHostIp}:{sparkHostPort} --class {className} {absolute path to the JAR file} --dataPath {hdfsPathToPreprocessedData} --masterIP {masterIP}

Example:
spark-submit --master spark://192.168.99.1:7077 --class com.javacookbook.app.SparkExample cookbookapp-1.0-SNAPSHOT.jar --dataPath hdfs://localhost:9000/user/hadoop/batches/imagenet-preprocessed --masterIP 192.168.99.1
