Pretrained CNN model as a feature extractor

Let's leverage Keras, load up the VGG-16 model, and freeze the convolution blocks so that we can use it as just an image feature extractor:

from keras.applications import vgg16
from keras.models import Model
import keras

# input_shape was defined earlier as (150, 150, 3) for our resized images
vgg = vgg16.VGG16(include_top=False, weights='imagenet',
                  input_shape=input_shape)

# Take the output of the last convolution block and flatten it
output = vgg.layers[-1].output
output = keras.layers.Flatten()(output)
vgg_model = Model(vgg.input, output)

# Freeze all the layers so their weights are not updated during training
vgg_model.trainable = False
for layer in vgg_model.layers:
    layer.trainable = False

vgg_model.summary()
 


_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_1 (InputLayer)         (None, 150, 150, 3)       0
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 150, 150, 64)      1792
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 150, 150, 64)      36928
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 75, 75, 64)        0
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 75, 75, 128)       73856
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 75, 75, 128)       147584
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 37, 37, 128)       0
_________________________________________________________________
block3_conv1 (Conv2D)        (None, 37, 37, 256)       295168
_________________________________________________________________
block3_conv2 (Conv2D)        (None, 37, 37, 256)       590080
_________________________________________________________________
block3_conv3 (Conv2D)        (None, 37, 37, 256)       590080
_________________________________________________________________
block3_pool (MaxPooling2D)   (None, 18, 18, 256)       0
_________________________________________________________________
block4_conv1 (Conv2D)        (None, 18, 18, 512)       1180160
_________________________________________________________________
block4_conv2 (Conv2D)        (None, 18, 18, 512)       2359808
_________________________________________________________________
block4_conv3 (Conv2D)        (None, 18, 18, 512)       2359808
_________________________________________________________________
block4_pool (MaxPooling2D)   (None, 9, 9, 512)         0
_________________________________________________________________
block5_conv1 (Conv2D)        (None, 9, 9, 512)         2359808
_________________________________________________________________
block5_conv2 (Conv2D)        (None, 9, 9, 512)         2359808
_________________________________________________________________
block5_conv3 (Conv2D)        (None, 9, 9, 512)         2359808
_________________________________________________________________
block5_pool (MaxPooling2D)   (None, 4, 4, 512)         0
_________________________________________________________________
flatten_1 (Flatten)          (None, 8192)              0
=================================================================
Total params: 14,714,688
Trainable params: 0
Non-trainable params: 14,714,688
_________________________________________________________________

The model summary shows each block and the layers it contains, which match the architecture diagrams we depicted earlier. You can see that we have removed the final classifier portion of the VGG-16 model, since we will be building our own classifier and leveraging VGG purely as a feature extractor.
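As a quick sanity check (a small verification snippet not in the original walkthrough), we can confirm that the network now ends at the flattened block5_pool output rather than at the original dense classifier:

# The final layer should be the Flatten layer we appended,
# producing 4 x 4 x 512 = 8,192-dimensional feature vectors
print(vgg_model.layers[-1].name)   # flatten_1
print(vgg_model.output_shape)      # (None, 8192)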

To verify that the layers of the VGG-16 model are frozen, we can use the following code:

import pandas as pd
pd.set_option('max_colwidth', -1)

# Build a table of (layer object, layer name, trainable flag) for inspection
layers = [(layer, layer.name, layer.trainable) for layer in vgg_model.layers]
pd.DataFrame(layers, columns=['Layer Type', 'Layer Name', 'Layer Trainable'])

The preceding code generates a dataframe listing each layer's type, name, and trainable status; the Layer Trainable column shows False for every layer. We can also confirm that the model no longer has any trainable weights:

print("Trainable layers:", vgg_model.trainable_weights) 
Trainable layers: []

It is quite clear from the preceding output that all the layers of the VGG-16 model are frozen, which is good because we don't want their weights to change during model training. The last activation feature map in the VGG-16 model (output from block5_pool) gives us the bottleneck features, which can then be flattened and fed to a fully connected deep neural network classifier. The following snippet shows what the bottleneck features look like for a sample image from our training data:

import matplotlib.pyplot as plt

# train_imgs_scaled holds our preprocessed training images (defined earlier)
bottleneck_feature_example = vgg.predict(train_imgs_scaled[0:1])
print(bottleneck_feature_example.shape)
plt.imshow(bottleneck_feature_example[0][:, :, 0])

(1, 4, 4, 512)

The printed shape confirms that each image yields a 4 x 4 x 512 bottleneck feature map, and the imshow call visualizes the first of its 512 channels as an activation map.

We flatten the bottleneck features in the vgg_model object to make them ready to be fed to our fully connected classifier. One way to save time during model training is to use this model to extract all the features from our training and validation datasets once, and then feed those features as inputs to our classifier. Let's extract the bottleneck features from our training and validation sets now:

def get_bottleneck_features(model, input_imgs):
    # Run a single forward pass through the frozen feature extractor
    features = model.predict(input_imgs, verbose=0)
    return features

train_features_vgg = get_bottleneck_features(vgg_model, train_imgs_scaled)
validation_features_vgg = get_bottleneck_features(vgg_model, validation_imgs_scaled)

print('Train Bottleneck Features:', train_features_vgg.shape,
      ' Validation Bottleneck Features:', validation_features_vgg.shape)

Train Bottleneck Features: (3000, 8192)  Validation Bottleneck Features: (1000, 8192)

The preceding output tells us that we have successfully extracted flattened bottleneck features of dimension 1 x 8,192 (the 4 x 4 x 512 feature maps flattened) for our 3,000 training images and our 1,000 validation images. Let's now build the architecture of our deep neural network classifier, which will take these features as input:

from keras.layers import Dense, Dropout, InputLayer
from keras.models import Sequential
from keras import optimizers

# The classifier's input dimension is the size of the flattened
# bottleneck feature vector (8,192)
input_shape = vgg_model.output_shape[1]

model = Sequential()
model.add(InputLayer(input_shape=(input_shape,)))
model.add(Dense(512, activation='relu', input_dim=input_shape))
model.add(Dropout(0.3))
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.3))
model.add(Dense(1, activation='sigmoid'))

model.compile(loss='binary_crossentropy',
              optimizer=optimizers.RMSprop(lr=1e-4),
              metrics=['accuracy'])

model.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_2 (InputLayer)         (None, 8192)              0
_________________________________________________________________
dense_1 (Dense)              (None, 512)               4194816
_________________________________________________________________
dropout_1 (Dropout)          (None, 512)               0
_________________________________________________________________
dense_2 (Dense)              (None, 512)               262656
_________________________________________________________________
dropout_2 (Dropout)          (None, 512)               0
_________________________________________________________________
dense_3 (Dense)              (None, 1)                 513
=================================================================
Total params: 4,457,985
Trainable params: 4,457,985
Non-trainable params: 0
_________________________________________________________________

As we mentioned previously, bottleneck feature vectors of size 8,192 serve as the input to our classification model. We use the same dense-layer architecture here as in our previous models. Let's train this model now:

# batch_size and epochs were defined earlier in the chapter (epochs = 30 here)
history = model.fit(x=train_features_vgg, y=train_labels_enc,
                    validation_data=(validation_features_vgg,
                                     validation_labels_enc),
                    batch_size=batch_size, epochs=epochs, verbose=1)

Train on 3000 samples, validate on 1000 samples
Epoch 1/30
3000/3000 - 1s 373us/step - loss: 0.4325 - acc: 0.7897 - val_loss: 0.2958 - val_acc: 0.8730
Epoch 2/30
3000/3000 - 1s 286us/step - loss: 0.2857 - acc: 0.8783 - val_loss: 0.3294 - val_acc: 0.8530
...
...
Epoch 29/30
3000/3000 - 1s 287us/step - loss: 0.0121 - acc: 0.9943 - val_loss: 0.7760 - val_acc: 0.8930
Epoch 30/30
3000/3000 - 1s 287us/step - loss: 0.0102 - acc: 0.9987 - val_loss: 0.8344 - val_acc: 0.8720

We get a model with a validation accuracy of close to 88%, almost a 5-6% improvement over our basic CNN model with image augmentation, which is excellent. The model does seem to be overfitting, though; we can check this using the accuracy and loss plots depicted in the following diagram:
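The following is a minimal matplotlib sketch that generates these accuracy and loss plots from the history object (it assumes the 'acc'/'val_acc' metric keys used by this version of Keras, as seen in the training log above):

import matplotlib.pyplot as plt

# Per-epoch metrics recorded by model.fit
epochs_range = range(1, len(history.history['acc']) + 1)

f, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))
ax1.plot(epochs_range, history.history['acc'], label='Train Accuracy')
ax1.plot(epochs_range, history.history['val_acc'], label='Validation Accuracy')
ax1.set_title('Accuracy')
ax1.set_xlabel('Epoch')
ax1.legend()

ax2.plot(epochs_range, history.history['loss'], label='Train Loss')
ax2.plot(epochs_range, history.history['val_loss'], label='Validation Loss')
ax2.set_title('Loss')
ax2.set_xlabel('Epoch')
ax2.legend()
plt.show()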

There is a decent gap between the training and validation accuracy after the fifth epoch, which makes it clear that the model is overfitting on the training data beyond that point. Overall, though, this seems to be the best model so far: by leveraging the VGG-16 model as a feature extractor, we get close to 90% validation accuracy without even using an image augmentation strategy. But we haven't tapped into the full potential of transfer learning yet. Let's try applying our image augmentation strategy to this model. Before that, we save this model to disk using the following code:

model.save('cats_dogs_tlearn_basic_cnn.h5')
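The saved file contains the model's architecture, weights, and optimizer state, so the classifier can be restored later with Keras's load_model utility:

from keras.models import load_model

# Restore the saved classifier exactly as it was at save time
model = load_model('cats_dogs_tlearn_basic_cnn.h5')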
