How it works...

In step 1, we used FileSplit to filter the images based on the file type (PNG, JPEG, TIFF, and so on).

We also passed in a random number generator initialized with a fixed seed, an integer (42 in our example). FileSplit uses this generator to produce the list of file paths in random order. Shuffling the files keeps the model from seeing the images in directory order (for example, all images of one class in a row), which generally helps the model generalize and can improve evaluation metrics such as accuracy. Because the seed is fixed, the same shuffled order is reproduced on every run.
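The following plain-Java sketch illustrates the two ideas above: keeping only files with an allowed image extension, and shuffling the resulting paths with a seeded Random so that the order is random but repeatable. It is not DL4J's FileSplit implementation; the class name ShuffledFileList, its methods, and the short ALLOWED extension set are hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Random;
import java.util.Set;

public class ShuffledFileList {
    // Hypothetical subset of image extensions, for illustration only
    static final Set<String> ALLOWED = new HashSet<>(Arrays.asList("png", "jpg", "jpeg", "tif", "tiff"));

    // Keeps only paths whose extension (case-insensitive) is in ALLOWED
    static List<String> filterByExtension(List<String> paths) {
        List<String> kept = new ArrayList<>();
        for (String p : paths) {
            int dot = p.lastIndexOf('.');
            if (dot < 0) {
                continue; // no extension at all
            }
            String ext = p.substring(dot + 1).toLowerCase(Locale.ROOT);
            if (ALLOWED.contains(ext)) {
                kept.add(p);
            }
        }
        return kept;
    }

    // Shuffles a copy of the list with a Random seeded by `seed`:
    // the same seed always yields the same order
    static List<String> shuffled(List<String> paths, long seed) {
        List<String> copy = new ArrayList<>(paths);
        Collections.shuffle(copy, new Random(seed));
        return copy;
    }

    public static void main(String[] args) {
        List<String> all = Arrays.asList(
                "cats/1.png", "cats/2.JPG", "cats/notes.txt",
                "dogs/1.jpeg", "dogs/2.tiff", "dogs/readme.md");
        List<String> images = filterByExtension(all);
        System.out.println(images.size()); // prints 4
        System.out.println(shuffled(images, 42));
        System.out.println(shuffled(images, 42).equals(shuffled(images, 42))); // prints true
    }
}
```

Changing the seed changes the order, while keeping it fixed makes experiments reproducible.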

When you work with a ready-made dataset whose number of labels is not known in advance, numLabels has to be computed rather than hard-coded. Since each label corresponds to a subdirectory of the root directory, we used FileSplit to count them programmatically:

int numLabels = fileSplit.getRootDir().listFiles(File::isDirectory).length; 
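To see what the numLabels line computes, here is a minimal, self-contained sketch that builds a throwaway directory tree and counts its immediate subdirectories the same way. It assumes, as the recipe does, that each subdirectory of the root holds the images of one label; the class name CountLabels and the sample directory names are hypothetical. It also guards against listFiles() returning null, which happens when the root is not a readable directory.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class CountLabels {
    // Counts the immediate subdirectories of root; each one is assumed to be one label
    static int countLabels(File root) {
        File[] dirs = root.listFiles(File::isDirectory);
        // listFiles returns null if root does not exist or is not a directory
        return dirs == null ? 0 : dirs.length;
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("dataset");
        Files.createDirectories(root.resolve("cat"));
        Files.createDirectories(root.resolve("dog"));
        Files.createDirectories(root.resolve("bird"));
        Files.createFile(root.resolve("labels.txt")); // a plain file, so it is not counted
        System.out.println(countLabels(root.toFile())); // prints 3
    }
}
```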

In step 2, we used ParentPathLabelGenerator to generate the label for each file from the name of its parent directory. We also used BalancedPathFilter, which randomizes the order of the paths in an array. Randomization helps reduce overfitting to the order in which the samples appear. BalancedPathFilter also keeps the same number of paths for each label, which helps produce well-balanced batches for training.
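The labeling rule behind ParentPathLabelGenerator, taking the label from the name of the directory that directly contains each file, can be sketched in plain Java as follows. This illustrates the idea only and is not the DataVec class itself; ParentDirLabels and labelOf() are hypothetical names.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

public class ParentDirLabels {
    // Label = name of the directory that directly contains the file,
    // e.g. "dogs" for "data/dogs/7.jpg"; "" if there is no named parent directory
    static String labelOf(String filePath) {
        Path parent = Paths.get(filePath).getParent();
        Path name = (parent == null) ? null : parent.getFileName();
        return (name == null) ? "" : name.toString();
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList("data/cats/1.png", "data/dogs/7.jpg", "data/cats/2.png");
        for (String f : files) {
            System.out.println(f + " -> " + labelOf(f));
        }
    }
}
```

Running main() prints each path next to its label, for example data/dogs/7.jpg -> dogs.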

With testSetRatio set to 20, 20 percent of the dataset is used as the test set for model evaluation. After step 2, the elements of the inputSplits array represent the train and test datasets:

  • inputSplits[0] will represent the train dataset.
  • inputSplits[1] will represent the test dataset.

The components used in step 2 work as follows:

  • NativeImageLoader.ALLOWED_FORMATS lists the image formats that NativeImageLoader, which uses JavaCV to load images, can read: .bmp, .gif, .jpg, .jpeg, .jp2, .pbm, .pgm, .ppm, .pnm, .png, .tif, .tiff, .exr, and .webp.
  • BalancedPathFilter randomizes the order of file paths in an array and removes paths at random so that each label has the same number of paths. It also interlaces the paths in the output according to their labels, so that optimal batches for training are easy to obtain. So, it does more than random sampling.
  • fileSplit.sample() samples the file paths using the given path filter and then splits the result into an array of InputSplit objects. Each object refers to the train or test set, and its size is proportional to the corresponding weight passed to sample().
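The balancing and splitting described above can be approximated in plain Java. The sketch below is an illustration under stated assumptions, not DataVec's BalancedPathFilter or FileSplit.sample(): balance() groups paths by their parent directory name, shuffles each group with a seeded Random, trims every group to the size of the smallest one, and interlaces the groups label by label; split() then cuts the list into consecutive chunks whose sizes are proportional to the weights. The class BalancedSplitSketch and all of its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

public class BalancedSplitSketch {
    // Label = last directory name in a "/"-separated path, e.g. "cats" for "data/cats/0.png"
    static String labelOf(String path) {
        int slash = path.lastIndexOf('/');
        String dir = slash < 0 ? "" : path.substring(0, slash);
        return dir.substring(dir.lastIndexOf('/') + 1);
    }

    // Shuffle each label group, trim all groups to the smallest size, then interlace them
    static List<String> balance(List<String> paths, Random rng) {
        Map<String, List<String>> byLabel = new TreeMap<>();
        for (String p : paths) {
            byLabel.computeIfAbsent(labelOf(p), k -> new ArrayList<>()).add(p);
        }
        List<String> out = new ArrayList<>();
        if (byLabel.isEmpty()) {
            return out;
        }
        int minCount = Integer.MAX_VALUE;
        for (List<String> group : byLabel.values()) {
            Collections.shuffle(group, rng);
            minCount = Math.min(minCount, group.size());
        }
        for (int i = 0; i < minCount; i++) {
            for (List<String> group : byLabel.values()) {
                out.add(group.get(i)); // one path per label in each round
            }
        }
        return out;
    }

    // Splits paths into consecutive chunks with sizes proportional to the weights
    static List<List<String>> split(List<String> paths, double... weights) {
        double total = 0;
        for (double w : weights) {
            total += w;
        }
        List<List<String>> parts = new ArrayList<>();
        int start = 0;
        double cumulative = 0;
        for (int i = 0; i < weights.length; i++) {
            cumulative += weights[i];
            int end = (i == weights.length - 1)
                    ? paths.size()
                    : (int) Math.round(paths.size() * cumulative / total);
            parts.add(new ArrayList<>(paths.subList(start, end)));
            start = end;
        }
        return parts;
    }

    public static void main(String[] args) {
        List<String> paths = new ArrayList<>();
        for (int i = 0; i < 6; i++) paths.add("data/cats/" + i + ".png");
        for (int i = 0; i < 4; i++) paths.add("data/dogs/" + i + ".png");
        List<String> balanced = balance(paths, new Random(42));
        System.out.println(balanced.size()); // prints 8 (4 cats + 4 dogs)
        List<List<String>> splits = split(balanced, 80, 20);
        System.out.println(splits.get(0).size() + " train / " + splits.get(1).size() + " test"); // prints 6 train / 2 test
    }
}
```

With six cat images and four dog images, balance() keeps four of each, and split(balanced, 80, 20) yields six training paths and two test paths. Because the balanced list alternates between labels, consecutive chunks in this sketch stay close to balanced.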
