How it works...

In step 1, saveUpdater should be set to true if we plan to train the model further at a later point. We have also discussed the pre-trained models provided by DL4J's model zoo API. Once we add the dependency for deeplearning4j-zoo, as mentioned in step 1, we can load pre-trained models such as VGG16, as follows:

ZooModel zooModel = VGG16.builder().build();
ComputationGraph pretrainedNet = (ComputationGraph) zooModel.initPretrained(PretrainedType.IMAGENET);
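
To confirm what was imported, you can print the architecture summary of the loaded graph as a quick sanity check (pretrainedNet is the variable from the preceding snippet):

System.out.println(pretrainedNet.summary());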

DL4J supports many more pre-trained models under its model zoo API.

Fine-tuning is the process of taking a model that was trained to perform one task and training it to perform another, similar task. Fine-tuning configurations are specific to transfer learning. In steps 3 and 4, we added a fine-tuning configuration specific to the type of neural network. The following are the possible changes that can be made using the DL4J transfer learning API:

  • Update the weight initialization scheme, gradient update strategy, and the optimization algorithm (fine-tuning)
  • Modify specific layers without altering other layers
  • Attach new layers to the model

All of these modifications can be applied using the DL4J transfer learning API, which comes with a builder class to support them. We add a fine-tuning configuration by calling the fineTuneConfiguration() builder method.
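
As a minimal sketch, a fine-tuning configuration could look like this (the Nesterovs updater, learning rate, and seed are placeholder choices, not the values used in the recipe):

FineTuneConfiguration fineTuneConf = new FineTuneConfiguration.Builder()
    .updater(new Nesterovs(5e-5))   // gradient update strategy with a placeholder learning rate
    .seed(123)                      // fixed seed for reproducibility
    .build();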

As we saw earlier, in step 4 we use GraphBuilder for transfer learning with computation graphs. Refer to our GitHub repository for concrete examples. Note that the transfer learning API returns a new model instance built from the imported model, with all the specified modifications applied. The regular Builder class will build an instance of MultiLayerNetwork, while GraphBuilder will build an instance of ComputationGraph.
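
For illustration, both builder variants are used in the same way; oldModel and pretrainedNet refer to the imported models, and fineTuneConf is the configuration sketched above:

// Regular Builder: produces a MultiLayerNetwork
MultiLayerNetwork transferredModel = new TransferLearning.Builder(oldModel)
    .fineTuneConfiguration(fineTuneConf)
    .build();

// GraphBuilder: produces a ComputationGraph
ComputationGraph transferredGraph = new TransferLearning.GraphBuilder(pretrainedNet)
    .fineTuneConfiguration(fineTuneConf)
    .build();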

We may also want to make changes only to certain layers rather than global changes across all layers. The main motive is to apply further optimization only to the layers we have identified as needing it. That raises another question: how do we know the details of a stored model? In order to specify the layers that are to be kept unchanged, the transfer learning API requires layer attributes such as the layer name or layer number.

We can get these using the getLayerWiseConfigurations() method, as shown here:

oldModel.getLayerWiseConfigurations().toJson()

Once we execute the preceding code, you should see the network configuration displayed as follows:

The complete network configuration JSON is available as a Gist at https://gist.github.com/rahul-raj/ee71f64706fa47b6518020071711070b.

Neural network configurations such as the learning rate, the weights used in neurons, optimization algorithms used, layer-specific configurations, and so on can be verified from the displayed JSON content.

The following are some possible configurations from the DL4J transfer learning API that support model modifications (a combined sketch follows this list). We need the layer details (name/ID) in order to invoke these methods:

  • setFeatureExtractor(): To freeze the changes on specific layers
  • addLayer(): To add one or more layers to the model
  • nInReplace()/nOutReplace(): Modifies the architecture of the specified layer by changing its nIn or nOut values
  • removeLayersFromOutput(): Removes the last n layers from the model (from the point where an output layer must be added back). Note that the last layer in the imported model is a dense layer, because the DL4J transfer learning API doesn't enforce a training configuration on the imported model; so, we need to add an output layer back using the addLayer() method.
  • setInputPreProcessor(): Adds the specified preprocessor to the specified layer
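
Here is a minimal sketch that chains several of these modifications on a MultiLayerNetwork. The layer indices and sizes (2, 3, 256, 2) are hypothetical and only illustrate how the calls fit together; fineTuneConf is the fine-tuning configuration shown earlier:

MultiLayerNetwork modifiedModel = new TransferLearning.Builder(oldModel)
    .fineTuneConfiguration(fineTuneConf)
    .setFeatureExtractor(2)                        // freeze layers 0 to 2 (hypothetical index)
    .nOutReplace(3, 256, WeightInit.XAVIER)        // change nOut of layer 3 (hypothetical size)
    .removeLayersFromOutput(1)                     // remove the existing output layer
    .addLayer(new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
        .nIn(256).nOut(2)                          // hypothetical layer sizes
        .activation(Activation.SOFTMAX)
        .build())                                  // add a new output layer
    .build();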

In step 5, we saw another way to apply transfer learning in DL4J: using TransferLearningHelper. We discussed two ways in which it can be implemented. When you create TransferLearningHelper from the transfer learning builder, you also need to specify FineTuneConfiguration. The values configured in FineTuneConfiguration will override the existing configuration for all non-frozen layers.

There's a reason why TransferLearningHelper stands out from the regular way of handling transfer learning. Transfer learning models usually have frozen layers whose parameter values stay constant across training sessions. Which layers to freeze depends on the observations made about the existing model's performance. We have also mentioned the setFeatureExtractor() method, which is used to freeze specific layers. Frozen layers are skipped during parameter updates; however, the model instance still holds both the frozen and unfrozen parts, so we still use the entire model (including both the frozen and unfrozen parts) for computations during training.

Using TransferLearningHelper, we can reduce the overall training time by creating a model instance of just the unfrozen part. The featurized dataset (the output activations of the frozen layers) is saved to disk, and we use the model instance that refers to the unfrozen part for training. If all we have to train is just one epoch, then setFeatureExtractor() and the transfer learning helper API will have almost the same performance. Let's say we have 100 layers with 99 frozen layers and we are doing N epochs of training. If we use setFeatureExtractor(), then we will end up doing a forward pass through those 99 frozen layers N times, which takes additional time and memory.

In order to save training time, we create the model instance after saving the activation results of the frozen layers using the transfer learning helper API. This process is also known as featurization. The motive is to skip computations for frozen layers and train on unfrozen layers.
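
A minimal sketch of featurization, assuming trainIter is a DataSetIterator over the original training data and transferLearningHelper is the helper instance, could look like this:

while (trainIter.hasNext()) {
    DataSet batch = trainIter.next();
    // Pass the batch through the frozen layers once and capture the resulting activations
    DataSet featurizedBatch = transferLearningHelper.featurize(batch);
    // featurizedBatch can now be saved to disk or fed directly to fitFeaturized()
}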

As a prerequisite, frozen layers need to be defined using the transfer learning builder or explicitly mentioned in the transfer learning helper.

TransferLearningHelper was created in step 3, as shown here:

TransferLearningHelper tHelper = new TransferLearningHelper(oldModel, 2); // for a MultiLayerNetwork, the second argument is the index of the last layer to freeze

In the preceding case, we explicitly specified freezing all of the layers up to (and including) layer index 2 in the layer structure.

In step 6, we discussed saving the dataset after featurization. Once featurized, the data is saved to disk so that we can fetch it later and train on top of it. Training/evaluation will be easier if we split the data into train/test sets before saving it. The dataset can be saved to disk using the save() method, as follows:

currentFeaturized.save(new File(fileFolder, fileName));

saveToDisk() is a custom helper method for saving the dataset for training or testing. Its implementation is straightforward: it creates two different directories (train/test) and decides on the range of files to be used for training/testing. We'll leave that implementation to you. You can refer to our example in the GitHub repository (SaveFeaturizedDataExample.java): https://github.com/PacktPublishing/Java-Deep-Learning-Cookbook/blob/master/11_Applying%20Transfer%20Learning%20to%20network%20models/sourceCode/cookbookapp/src/main/java/SaveFeaturizedDataExample.java.
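
As an illustration only (not the exact implementation from the repository), such a helper might take the following shape; the directory paths and file naming are placeholders:

public static void saveToDisk(DataSet currentFeaturized, int iterCount, boolean isTrain) {
    // Hypothetical sketch: pick the train or test directory and an indexed file name
    File directory = isTrain ? new File("featurized/train") : new File("featurized/test");
    directory.mkdirs();
    currentFeaturized.save(new File(directory, "churn-" + iterCount + ".bin"));
}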

In steps 7 and 8, we discussed training our neural network on top of the featurized data. Our customer retention model follows the MultiLayerNetwork architecture. This training instance alters the network configuration only for the unfrozen layers; hence, we need to evaluate only the unfrozen layers. In step 5, we evaluated just the unfrozen part of the model on the featurized test data, as shown here:

transferLearningHelper.unfrozenMLN().evaluate(existingTestData);

If your network has the ComputationGraph structure, then you can use the unfrozenGraph() method instead of unfrozenMLN() to achieve the same result. 
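
Putting it together, a rough sketch of the training and evaluation loop over saved featurized batches might look like the following; numEpochs, trainFiles, and existingTestData are assumed to be defined elsewhere:

for (int epoch = 0; epoch < numEpochs; epoch++) {
    for (File featurizedFile : trainFiles) {                    // files written earlier by saveToDisk()
        DataSet featurizedBatch = new DataSet();
        featurizedBatch.load(featurizedFile);                   // reload the featurized batch
        transferLearningHelper.fitFeaturized(featurizedBatch);  // trains only the unfrozen layers
    }
}
// Evaluate the unfrozen part on the featurized test data
Evaluation evaluation = transferLearningHelper.unfrozenMLN().evaluate(existingTestData);
System.out.println(evaluation.stats());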
