The intuition behind TL

Let's build up the intuition behind TL with the following teacher-student analogy. A teacher has many years of experience in the modules they teach. The students, on the other hand, get a compact overview of the topic from the lectures that this teacher gives. So you can say that the teacher transfers their knowledge to the students in a concise and compact way.

The same teacher-student analogy can be applied to our case of transferring knowledge in deep learning, or in neural networks in general. Our model learns some representations from the data, and these are encoded in the weights of the network. These learned representations/features (weights) can be transferred to another task that is different but similar. Transferring the learned weights to another task reduces the need for huge datasets for deep learning architectures to converge, and it also reduces the time needed to adapt the model to the new dataset compared to training the model from scratch.

Deep learning is widely used nowadays, but most people use TL while training deep learning architectures; few train them from scratch, because it's rare to have a dataset of sufficient size for deep learning to converge. So it's very common to take a model pre-trained on a large dataset such as ImageNet, which has about 1.2 million images, and apply it to your new task. We can use the weights of that pre-trained model as a feature extractor, or we can initialize our architecture with them and then fine-tune them for the new task. There are three major scenarios for using TL:

  • Use a convolutional network as a fixed feature extractor: In this scenario, you take a convolutional model pre-trained on a large dataset such as ImageNet and adapt it to your problem. For instance, a convolutional model pre-trained on ImageNet will have a fully connected layer that outputs scores for the 1,000 categories that ImageNet has. So you need to remove this layer, because you are no longer interested in the classes of ImageNet. Then, you treat all the other layers as a feature extractor. Once you have extracted the features using the pre-trained model, you can feed them to any linear classifier, such as a softmax classifier or even a linear SVM (see the first sketch after this list).

  • Fine-tune the convolutional neural network: The second scenario builds on the first, but with the extra effort of fine-tuning the pre-trained weights on your new task using backpropagation. Usually, people keep most of the layers fixed and only fine-tune the top end of the network; trying to fine-tune the whole network or even most of the layers may result in overfitting. So, you might be interested in fine-tuning only the layers that are concerned with the semantic-level features of the images. The intuition behind leaving the earlier layers fixed is that they contain generic, low-level features that are common across most imaging tasks, such as corners, edges, and so on. Fine-tuning the higher-level, top-end layers of the network is useful if you're introducing new classes that are not present in the original dataset that the model was pre-trained on (see the second sketch after this list).

Figure 10.1: Fine-tuning the pre-trained CNN for a new task

  • Pre-trained models: The third widely used scenario is to download checkpoints that people have made available on the internet. You may go for this scenario if you don't have enough computational power to train the model from scratch, so you just initialize the model with the released checkpoints and then do a little fine-tuning (see the last sketch after this list).
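
The following is a minimal sketch of the first scenario: using the convolutional part of a pre-trained model as a fixed feature extractor and feeding the extracted features to a linear SVM. It is not the book's own code; it assumes tf.keras with the VGG16 ImageNet weights and scikit-learn's LinearSVC, and the images and labels are random placeholders standing in for your own dataset.

```python
# Scenario 1 (sketch): a pre-trained CNN as a fixed feature extractor.
import numpy as np
from tensorflow.keras.applications import VGG16
from tensorflow.keras.applications.vgg16 import preprocess_input
from sklearn.svm import LinearSVC

# Load only the convolutional base; include_top=False drops the fully
# connected layer with the 1,000 ImageNet class scores, as described above.
conv_base = VGG16(weights="imagenet", include_top=False, pooling="avg",
                  input_shape=(224, 224, 3))
conv_base.trainable = False  # fixed feature extractor: no weight updates

def extract_features(images):
    """Run the frozen convolutional base and return one feature vector per image."""
    return conv_base.predict(preprocess_input(images))

# Hypothetical data for illustration: 100 RGB images with binary labels.
images = np.random.rand(100, 224, 224, 3).astype("float32") * 255.0
labels = np.random.randint(0, 2, size=100)

features = extract_features(images)  # shape (100, 512) for VGG16 + average pooling
classifier = LinearSVC()             # any linear classifier works here
classifier.fit(features, labels)
```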
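
The second scenario, fine-tuning, can be sketched along the same lines. Again assuming tf.keras and VGG16, the example below freezes everything except the last convolutional block, replaces the original 1,000-way ImageNet head with a new classifier head for a hypothetical 10-class task, and trains with a small learning rate so the pre-trained weights are not destroyed.

```python
# Scenario 2 (sketch): fine-tune only the top end of a pre-trained network.
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

NUM_CLASSES = 10  # hypothetical number of classes in the new task

conv_base = VGG16(weights="imagenet", include_top=False,
                  input_shape=(224, 224, 3))

# Freeze everything except the last convolutional block (block5 in VGG16);
# the earlier layers hold generic, low-level features such as edges and corners.
for layer in conv_base.layers:
    layer.trainable = layer.name.startswith("block5")

# Replace the original ImageNet head with a new classifier head.
model = models.Sequential([
    conv_base,
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

# A small learning rate helps avoid overwriting the pre-trained weights.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# model.fit(train_images, train_labels, epochs=5)  # supply your own data here
```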
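
Finally, the third scenario boils down to restoring released weights into the matching architecture and fine-tuning briefly. In the sketch below, both the small placeholder architecture and the checkpoint file name are hypothetical; in practice they would come from whoever published the checkpoint.

```python
# Scenario 3 (sketch): initialize from a downloaded checkpoint, then fine-tune.
import tensorflow as tf
from tensorflow.keras import layers, models

# Rebuild the same architecture the checkpoint was trained with
# (a small placeholder network here, purely for illustration).
model = models.Sequential([
    layers.Conv2D(32, 3, activation="relu", input_shape=(224, 224, 3)),
    layers.GlobalAveragePooling2D(),
    layers.Dense(10, activation="softmax"),
])

model.load_weights("pretrained_checkpoint.h5")  # hypothetical released checkpoint
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(new_images, new_labels, epochs=2)    # a little fine-tuning on new data
```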
