How to do it...

  1. Download, extract, and copy the contents of the TinyImageNet dataset to the following directory location:
* Windows: C:\Users\<username>\.deeplearning4j\data\TINYIMAGENET_200
* Linux: ~/.deeplearning4j/data/TINYIMAGENET_200
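Alternatively, DL4J can download and extract the dataset for you. A minimal sketch using org.deeplearning4j.datasets.fetchers.TinyImageNetFetcher, which writes to the default location listed above:
TinyImageNetFetcher fetcher = new TinyImageNetFetcher();
fetcher.downloadAndExtract(); // downloads the archive (if not already cached) and extracts it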
  2. Create batches of images for training using the TinyImageNet dataset:
File saveDirTrain = new File(batchSavedLocation, "train");
SparkDataUtils.createFileBatchesLocal(dirPathDataSet, NativeImageLoader.ALLOWED_FORMATS, true, saveDirTrain, batchSize);
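Here, dirPathDataSet, batchSavedLocation, and batchSize are assumed to have been defined beforehand, along the following lines (the paths are illustrative):
File dirPathDataSet = new File("/path/to/TINYIMAGENET_200/train"); // source images, one subdirectory per label
File batchSavedLocation = new File("/path/to/batches"); // root directory for the generated batch files
int batchSize = 32; // number of images per batch file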
  3. Create batches of images for testing using the TinyImageNet dataset:
File saveDirTest = new File(batchSavedLocation, "test");
SparkDataUtils.createFileBatchesLocal(dirPathDataSet, NativeImageLoader.ALLOWED_FORMATS, true, saveDirTest, batchSize);
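Note that, for this step, dirPathDataSet is expected to point at the test/validation split of the extracted dataset rather than the training images used in the previous step.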
  4. Create an ImageRecordReader that holds a reference to the dataset:
PathLabelGenerator labelMaker = new ParentPathLabelGenerator();
ImageRecordReader rr = new ImageRecordReader(imageHeightWidth, imageHeightWidth, imageChannels, labelMaker);
rr.setLabels(new TinyImageNetDataSetIterator(1).getLabels());
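The dimension variables reflect the fact that TinyImageNet images are 64 x 64 RGB:
int imageHeightWidth = 64; // TinyImageNet images are 64 x 64 pixels
int imageChannels = 3; // RGB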
  5. Create a RecordReaderFileBatchLoader from the ImageRecordReader to load the batch data:
RecordReaderFileBatchLoader loader = new RecordReaderFileBatchLoader(rr, batchSize, 1, TinyImageNetFetcher.NUM_LABELS);
loader.setPreProcessor(new ImagePreProcessingScaler());
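ImagePreProcessingScaler normalizes the raw pixel values from the [0, 255] range to [0, 1] before they are passed to the network.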
  6. Use JCommander at the beginning of your source code to parse command-line arguments:
JCommander jcmdr = new JCommander(this);
jcmdr.parse(args);
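This assumes that the command-line options are declared as JCommander @Parameter fields in the same class. A minimal sketch, with field names matching the spark-submit arguments used in step 18:
@Parameter(names = "--dataPath", description = "HDFS path of the preprocessed batch files", required = true)
private String dataPath;
@Parameter(names = "--masterIP", description = "IP address of the Spark master node", required = true)
private String masterIP;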
  7. Create a parameter server configuration (gradient sharing) for Spark training using VoidConfiguration, as shown in the following code:
VoidConfiguration voidConfiguration = VoidConfiguration.builder()
.unicastPort(portNumber)
.networkMask(netWorkMask)
.controllerAddress(masterNodeIPAddress)
.build();
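Here, portNumber, netWorkMask, and masterNodeIPAddress are assumed to have been defined for your cluster, for example (the values shown are illustrative):
int portNumber = 40123; // any free port reachable from all nodes
String netWorkMask = "10.0.0.0/16"; // network mask of the subnet the cluster runs on
String masterNodeIPAddress = masterIP; // master node IP parsed earlier by JCommander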
  8. Configure a distributed training network using SharedTrainingMaster, as shown in the following code:
TrainingMaster tm = new SharedTrainingMaster.Builder(voidConfiguration, batchSize)
.rngSeed(12345)
.collectTrainingStats(false)
.batchSizePerWorker(batchSize) // Minibatch size for each worker
.thresholdAlgorithm(new AdaptiveThresholdAlgorithm(1E-3)) // Threshold algorithm determines the encoding threshold to be used
.workersPerNode(1) // Workers per node
.build();

  9. Create a GraphBuilder for the ComputationGraphConfiguration, as shown in the following code:
ComputationGraphConfiguration.GraphBuilder builder = new NeuralNetConfiguration.Builder()
.convolutionMode(ConvolutionMode.Same)
.l2(1e-4)
.updater(new AMSGrad(lrSchedule))
.weightInit(WeightInit.RELU)
.graphBuilder()
.addInputs("input")
.setOutputs("output");
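lrSchedule is assumed to be a learning-rate schedule created beforehand, for example a MapSchedule keyed by epoch (the rates shown are illustrative):
ISchedule lrSchedule = new MapSchedule.Builder(ScheduleType.EPOCH)
.add(0, 8e-3) // learning rate 0.008 from epoch 0
.add(1, 6e-3)
.add(3, 3e-3)
.add(5, 1e-3)
.build();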
  10. Use DarknetHelper from the DL4J Model Zoo to build the core CNN architecture, as shown in the following code:
DarknetHelper.addLayers(builder, 0, 3, 3, 32, 0); //64x64 out
DarknetHelper.addLayers(builder, 1, 3, 32, 64, 2); //32x32 out
DarknetHelper.addLayers(builder, 2, 2, 64, 128, 0); //32x32 out
DarknetHelper.addLayers(builder, 3, 2, 128, 256, 2); //16x16 out
DarknetHelper.addLayers(builder, 4, 2, 256, 256, 0); //16x16 out
DarknetHelper.addLayers(builder, 5, 2, 256, 512, 2); //8x8 out
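Each addLayers() call appends a convolution block (convolution, batch normalization, and LeakyReLU activation) to the builder; when the last argument is nonzero, a max-pooling layer with that kernel size and stride is also added, which is what halves the spatial dimensions in the output-size comments.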
  11. Configure the output layers, taking into account the number of labels and the loss function, as shown in the following code:
builder.addLayer("convolution2d_6", new ConvolutionLayer.Builder(1, 1)
.nIn(512)
.nOut(TinyImageNetFetcher.NUM_LABELS) // number of labels (classified outputs) = 200
.weightInit(WeightInit.XAVIER)
.stride(1, 1)
.activation(Activation.IDENTITY)
.build(), "maxpooling2d_5")
.addLayer("globalpooling", new GlobalPoolingLayer.Builder(PoolingType.AVG).build(), "convolution2d_6")
.addLayer("loss", new LossLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD).activation(Activation.SOFTMAX).build(), "globalpooling")
.setOutputs("loss");
  12. Create the ComputationGraphConfiguration from the GraphBuilder:
ComputationGraphConfiguration configuration = builder.build(); 
  13. Create the SparkComputationGraph model from the defined configuration and set training listeners on it:
SparkComputationGraph sparkNet = new SparkComputationGraph(context, configuration, tm);
sparkNet.setListeners(new PerformanceListener(10, true));
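context is the JavaSparkContext for the application, created along the following lines (the application name is illustrative):
SparkConf sparkConf = new SparkConf();
sparkConf.setAppName("DL4J TinyImageNet Training");
JavaSparkContext context = new JavaSparkContext(sparkConf);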
  14. Create JavaRDD objects that represent the HDFS paths of the batch files that we created earlier for training:
String trainPath = dataPath + (dataPath.endsWith("/") ? "" : "/") + "train";
JavaRDD<String> pathsTrain = SparkUtils.listPaths(context, trainPath);
  15. Invoke the training instance by calling fitPaths():
for (int i = 0; i < numEpochs; i++) {
sparkNet.fitPaths(pathsTrain, loader);
}
  16. Create JavaRDD objects that represent the HDFS paths to the batch files that we created earlier for testing:
String testPath = dataPath + (dataPath.endsWith("/") ? "" : "/") + "test";
JavaRDD<String> pathsTest = SparkUtils.listPaths(context, testPath);
  17. Evaluate the distributed neural network by calling doEvaluation():
Evaluation evaluation = new Evaluation(TinyImageNetDataSetIterator.getLabels(false), 5); // evaluate top-5 accuracy in addition to top-1
evaluation = (Evaluation) sparkNet.doEvaluation(pathsTest, loader, evaluation)[0];
log.info("Evaluation statistics: {}", evaluation.stats());
  18. Run the distributed training instance via spark-submit in the following format:
spark-submit --master spark://{sparkHostIp}:{sparkHostPort} --class {className} {absolute path to the JAR file} --dataPath {hdfsPathToPreprocessedData} --masterIP {masterIP}

Example:
spark-submit --master spark://192.168.99.1:7077 --class com.javacookbook.app.SparkExample cookbookapp-1.0-SNAPSHOT.jar --dataPath hdfs://localhost:9000/user/hadoop/batches/imagenet-preprocessed --masterIP 192.168.99.1
