How it works...

Step 1 can be automated using TinyImageNetFetcher, as shown here:

// Download the Tiny ImageNet archive and extract it to the local DL4J cache directory
TinyImageNetFetcher fetcher = new TinyImageNetFetcher();
fetcher.downloadAndExtract();

On any OS, the data is downloaded to the user's home directory. Once the fetcher has executed, we can get a reference to the train/test dataset directories, as shown here:

File baseDirTrain = DL4JResources.getDirectory(ResourceType.DATASET, fetcher.localCacheName() + "/train");
File baseDirTest = DL4JResources.getDirectory(ResourceType.DATASET, fetcher.localCacheName() + "/test");

You can also specify your own input directory on your local disk or on HDFS. You will need to supply that location in place of dirPathDataSet in step 2.

In steps 2 and 3, we created batches of images so that we could optimize distributed training. We used createFileBatchesLocal() to create these batches when the source of the data is a local disk. If you want to create batches from an HDFS source, use createFileBatchesSpark() instead. These compressed batch files save space and reduce I/O bottlenecks: if we load 64 images into one compressed batch, we don't require 64 separate disk reads to process that batch file. Each batch file contains the raw contents of multiple source files.
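As a minimal sketch (the paths and the batch size of 64 here are placeholder values), local batch creation could look as follows:

import org.datavec.image.loader.NativeImageLoader;
import org.deeplearning4j.spark.util.SparkDataUtils;
import java.io.File;

// Pack raw image files into compressed batch files of 64 images each
File sourceDir = new File("/path/to/tiny-imagenet/train");   // placeholder source location
File batchDirTrain = new File("/path/to/batches/train");     // placeholder output location
SparkDataUtils.createFileBatchesLocal(sourceDir, NativeImageLoader.ALLOWED_FORMATS, true, batchDirTrain, 64);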

In step 5, we used RecordReaderFileBatchLoader to process the file batch objects that were created using either createFileBatchesLocal() or createFileBatchesSpark(). As we mentioned in step 6, you can use JCommander to process the command-line arguments passed from spark-submit, or write your own logic to handle them.
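For reference, a loader for these file batches might be configured as follows; the 64 x 64 x 3 image shape, the minibatch size of 32, and the label index of 1 are assumptions based on the Tiny ImageNet layout, and exact package locations can vary between DL4J versions:

import org.datavec.api.io.labels.ParentPathLabelGenerator;
import org.datavec.image.recordreader.ImageRecordReader;
import org.deeplearning4j.spark.datavec.RecordReaderFileBatchLoader;
import org.nd4j.linalg.dataset.api.preprocessor.ImagePreProcessingScaler;

// The label is inferred from the parent directory name of each image
ImageRecordReader rr = new ImageRecordReader(64, 64, 3, new ParentPathLabelGenerator());
rr.setLabels(TinyImageNetDataSetIterator.getLabels(false));

// One minibatch of 32 examples per call; label index 1, 200 output classes
RecordReaderFileBatchLoader loader = new RecordReaderFileBatchLoader(rr, 32, 1, 200);
loader.setPreProcessor(new ImagePreProcessingScaler());   // scale pixel values to [0, 1]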

In step 7, we configured the parameter server using the VoidConfiguration class. This is a basic configuration POJO for the parameter server, where we can specify the port number, network mask, and so on. The network mask is an especially important setting in shared network environments and on YARN.
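A minimal configuration sketch might look like this; the port, network mask, and controller address are placeholder values:

import org.nd4j.parameterserver.distributed.conf.VoidConfiguration;

VoidConfiguration voidConfiguration = VoidConfiguration.builder()
    .unicastPort(40123)              // port used by the parameter server (placeholder)
    .networkMask("10.0.0.0/16")      // network mask for shared networks/YARN (placeholder)
    .controllerAddress("10.0.2.4")   // IP of the Spark master/driver (placeholder)
    .build();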

In step 8, we started configuring the distributed network for training using SharedTrainingMaster. We added important configurations such as the threshold algorithm, the number of workers per node, the minibatch size, and so on.
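A sketch of this configuration, with illustrative values for the threshold, worker count, and minibatch size, is shown here:

import org.deeplearning4j.optimize.solvers.accumulation.encoding.threshold.AdaptiveThresholdAlgorithm;
import org.deeplearning4j.spark.api.TrainingMaster;
import org.deeplearning4j.spark.parameterserver.training.SharedTrainingMaster;

TrainingMaster tm = new SharedTrainingMaster.Builder(voidConfiguration, 32)
    .batchSizePerWorker(32)                                    // minibatch size on each worker
    .workersPerNode(4)                                         // worker threads per physical node
    .thresholdAlgorithm(new AdaptiveThresholdAlgorithm(1e-3))  // gradient-sharing threshold
    .build();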

Starting from steps 9 and 10, we focused on the distributed neural network's layer configuration. We used DarknetHelper from the DL4J model zoo to borrow functionality from DarkNet, TinyYOLO, and YOLO2.
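The convolutional backbone can be assembled with calls such as the following; the specific filter sizes, channel counts, and updater settings are assumptions that follow the pattern of DL4J's Tiny ImageNet example rather than the book's exact values:

import org.deeplearning4j.nn.conf.ComputationGraphConfiguration;
import org.deeplearning4j.nn.conf.ConvolutionMode;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.weights.WeightInit;
import org.deeplearning4j.zoo.model.helper.DarknetHelper;
import org.nd4j.linalg.learning.config.AMSGrad;

ComputationGraphConfiguration.GraphBuilder b = new NeuralNetConfiguration.Builder()
    .convolutionMode(ConvolutionMode.Same)
    .l2(1e-4)
    .updater(new AMSGrad(1e-3))
    .weightInit(WeightInit.RELU)
    .graphBuilder()
    .addInputs("input");

// Each call adds a convolution/batch-norm/activation block, optionally followed by max pooling
DarknetHelper.addLayers(b, 0, 3, 3, 32, 0);    // 64x64 output
DarknetHelper.addLayers(b, 1, 3, 32, 64, 2);   // 32x32 output after pooling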

In step 11, we added the output layer configuration for our Tiny ImageNet classifier, which makes a prediction across 200 labels. In step 13, we created a Spark-based ComputationGraph using SparkComputationGraph. If the underlying network structure is a MultiLayerNetwork, you could use SparkDl4jMultiLayer instead.
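Putting these together, the output side of the graph and the Spark wrapper might look as follows; the layer names, the nIn value of 512, and the variables sc and tm (the JavaSparkContext and the TrainingMaster from step 8) are assumptions for this sketch:

import org.deeplearning4j.nn.conf.layers.*;
import org.deeplearning4j.nn.graph.ComputationGraph;
import org.deeplearning4j.spark.impl.graph.SparkComputationGraph;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.lossfunctions.LossFunctions;

// 1x1 convolution down to 200 channels, then global average pooling and a softmax loss
b.addLayer("convolution2d_6", new ConvolutionLayer.Builder(1, 1)
        .nIn(512).nOut(200)
        .weightInit(WeightInit.XAVIER)
        .activation(Activation.IDENTITY).build(), "maxpooling2d_5")
    .addLayer("globalpooling", new GlobalPoolingLayer.Builder(PoolingType.AVG).build(), "convolution2d_6")
    .addLayer("loss", new LossLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
        .activation(Activation.SOFTMAX).build(), "globalpooling")
    .setOutputs("loss");

ComputationGraph net = new ComputationGraph(b.build());
SparkComputationGraph sparkNet = new SparkComputationGraph(sc, net, tm);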

In step 17, we created an evaluation instance, as shown here:

Evaluation evaluation = new Evaluation(TinyImageNetDataSetIterator.getLabels(false), 5);

The second argument (5 in the preceding code) represents the value N used for the top-N accuracy metric: the evaluation of a sample is counted as correct if the probability the model assigns to the true class is among the N highest predicted probabilities.
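Assuming that pathsTest is the RDD of test batch paths and loader is the RecordReaderFileBatchLoader from step 5, the evaluation could then be driven as in this sketch:

// Run the evaluation across the cluster and print the accuracy and top-5 accuracy statistics
sparkNet.doEvaluation(pathsTest, loader, evaluation);
System.out.println(evaluation.stats());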
