Visualizing feature maps

Visualizing a CNN model involves looking at the intermediate feature maps output by the various convolution and pooling layers in the network for a given input. This gives a view into how the input is processed by the network, and how various image features are extracted hierarchically. All feature maps have three dimensions: width, height, and depth (channels). We will try to visualize them for the InceptionV3 model.
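
The code throughout this section assumes that the pre-trained InceptionV3 model has already been loaded and is available as model; if it is not, a minimal sketch for loading it with Keras' bundled ImageNet weights could look like this:

from keras.applications.inception_v3 import InceptionV3

# Load InceptionV3 pre-trained on ImageNet; the fully connected top is
# not needed for inspecting intermediate feature maps
model = InceptionV3(weights='imagenet', include_top=False)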

Let's take the following input photo of a Labrador dog and try to visualize various feature maps. As the InceptionV3 model is very deep, we will visualize just a few of its layers:

First, let's create a model that takes the input image and outputs all the internal activation layers. The activation layers in InceptionV3 are named activation_i. So, we can filter the activation layers out of the loaded Inception model as shown in the following code:

activation_layers = [layer.output for layer in model.layers
                     if layer.name.startswith("activation_")]

layer_names = [layer.name for layer in model.layers
               if layer.name.startswith("activation_")]

Now, let's create a model that takes the input image and outputs all aforementioned activation layer features as a list, as demonstrated in the following code:

from keras.models import Model
activation_model = Model(inputs=model.input, outputs=activation_layers)

Now, to get the output activations, we can use the predict function. We have to use the same preprocessing function defined earlier to preprocess the image before feeding it to the Inception network:

img = preprocess_image(base_image_path)
activations = activation_model.predict(img)
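
Here, preprocess_image is the helper defined earlier in the chapter. If it is not available in your session, a minimal stand-alone sketch along the following lines (using Keras' built-in InceptionV3 preprocessing, and assuming base_image_path points to the dog photo) should work:

import numpy as np
from keras.preprocessing import image
from keras.applications.inception_v3 import preprocess_input

def preprocess_image(image_path):
    # Load the image, turn it into a batch of one array, and apply
    # the preprocessing expected by InceptionV3
    img = image.load_img(image_path)
    img = image.img_to_array(img)
    img = np.expand_dims(img, axis=0)
    return preprocess_input(img)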

We can now plot these activations. All the filter/feature maps in one activation layer can be plotted in a grid. So, based on the number of filters in a layer, we will define an image grid as a NumPy array, as shown in the following code (some parts of the following code are borrowed from https://blog.keras.io/how-convolutional-neural-networks-see-the-world.html):

import numpy as np
import matplotlib.pyplot as plt

images_per_row = 8
idx = 1  # activation layer index

layer_activation = activations[idx]
# This is the number of features in the feature map
n_features = layer_activation.shape[-1]
# The feature map has shape (1, size1, size2, n_features)
r = layer_activation.shape[1]
c = layer_activation.shape[2]

# We will tile the activation channels in this matrix
n_cols = n_features // images_per_row
display_grid = np.zeros((r * n_cols, images_per_row * c))
print(display_grid.shape)

Now, we will loop through all the feature maps in the activation layer and place the scaled output into the grid, as shown in the following code:

# We'll tile each filter into this big horizontal grid
for col in range(n_cols):
    for row in range(images_per_row):
        channel_image = layer_activation[0, :, :,
                                         col * images_per_row + row]
        # Post-process the feature to make it visually palatable
        channel_image -= channel_image.mean()
        channel_image /= channel_image.std()
        channel_image *= 64
        channel_image += 128
        channel_image = np.clip(channel_image, 0, 255).astype('uint8')
        display_grid[col * r : (col + 1) * r,
                     row * c : (row + 1) * c] = channel_image
# Display the grid
scale = 1. / r
plt.figure(figsize=(scale * display_grid.shape[1],
scale * display_grid.shape[0]))
plt.title(layer_names[idx]+" #filters="+str(n_features))
plt.grid(False)
plt.imshow(display_grid, aspect='auto', cmap='viridis')
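
To produce such grids for several layers at once, the snippet above can be wrapped into a small helper and called in a loop. The plot_activations function below is simply our own wrapper around the same logic (the layer indices chosen at the end are arbitrary examples):

def plot_activations(idx, images_per_row=8):
    # Same grid-plotting logic as above, packaged as a reusable function
    layer_activation = activations[idx]
    n_features = layer_activation.shape[-1]
    r, c = layer_activation.shape[1], layer_activation.shape[2]
    n_cols = n_features // images_per_row
    display_grid = np.zeros((r * n_cols, images_per_row * c))
    for col in range(n_cols):
        for row in range(images_per_row):
            # Copy the channel so the stored activations stay untouched
            channel_image = layer_activation[0, :, :,
                                             col * images_per_row + row].copy()
            channel_image -= channel_image.mean()
            channel_image /= (channel_image.std() + 1e-5)
            channel_image *= 64
            channel_image += 128
            channel_image = np.clip(channel_image, 0, 255).astype('uint8')
            display_grid[col * r : (col + 1) * r,
                         row * c : (row + 1) * c] = channel_image
    scale = 1. / r
    plt.figure(figsize=(scale * display_grid.shape[1],
                        scale * display_grid.shape[0]))
    plt.title(layer_names[idx] + " #filters=" + str(n_features))
    plt.grid(False)
    plt.imshow(display_grid, aspect='auto', cmap='viridis')

# Plot a few activation layers from different depths of the network
for idx in [0, 5, 10]:
    plot_activations(idx)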

Following are the outputs from various layers:

The first two activation layers act as a collection of various edge detectors. These activations retain almost all of the information present in the initial picture.

Let's look at the following screenshot, which shows a layer from the middle of the network. Here, the network starts recognizing higher-level features such as the nose, eyes, tongue, and mouth:

As we go deeper, the feature maps become less visually interpretable. Activations of the deeper layers carry minimal information about the specific input image being seen, and more information about its target class, in this case, a dog.

An alternative way to visualize the filters learned by InceptionV3 is to display the visual pattern for which each filter outputs its maximum activation value. This can be done with gradient ascent in the input space: we find an input image that maximizes the activity of interest (the activation of a chosen filter) by performing an optimization over the image pixels. The resulting input image is one that the chosen filter is maximally responsive to.

Every activation layer has many feature maps. The following code demonstrates how we can extract a single feature map from the last activation layer. The mean of this feature map's activations is the loss that we want to maximize:

from keras import backend as K

layer_name = 'activation_94'
filter_index = 0
layer_output = model.get_layer(layer_name).output
loss = K.mean(layer_output[:, :, :, filter_index])

To calculate the gradient of the input image with respect to this loss, we can use the Keras backend gradients function as follows:

grads = K.gradients(loss, model.input)[0]
# We add 1e-5 before dividing to avoid accidentally dividing by zero
grads /= (K.sqrt(K.mean(K.square(grads))) + 1e-5)

So, given an activation layer and a starting input image (which can be random noise), we can apply gradient ascent using the gradient computation above to obtain the pattern the feature map represents. The following generate_pattern function does exactly this. The output pattern is normalized so that we have feasible RGB values in the image matrix; this is done using the deprocess_image method. The following code is self-explanatory and has inline comments explaining each line:

def generate_pattern(layer_name, filter_index, size=150):
    # Build a loss function that maximizes the activation
    # of the nth filter of the layer considered
    layer_output = model.get_layer(layer_name).output
    loss = K.mean(layer_output[:, :, :, filter_index])
    # Compute the gradient of the input picture wrt this loss
    grads = K.gradients(loss, model.input)[0]
    # Normalization trick: we normalize the gradient
    grads /= (K.sqrt(K.mean(K.square(grads))) + 1e-5)
    # This function returns the loss and grads given the input picture
    iterate = K.function([model.input], [loss, grads])
    # We start from a gray image with some noise
    input_img_data = np.random.random((1, size, size, 3)) * 20 + 128.
    # Run gradient ascent for 40 steps
    step = 1.
    for i in range(40):
        loss_value, grads_value = iterate([input_img_data])
        input_img_data += grads_value * step
    img = input_img_data[0]
    return deprocess_image(img)


def deprocess_image(x):
    # Normalize tensor: center on 0, ensure std is 0.1
    x -= x.mean()
    x /= (x.std() + 1e-5)
    x *= 0.1
    # Clip to [0, 1]
    x += 0.5
    x = np.clip(x, 0, 1)
    # Convert to an RGB array
    x *= 255
    x = np.clip(x, 0, 255).astype('uint8')
    return x
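
With these two functions in place, we can, for example, render the pattern that maximally activates filter 0 of the activation_94 layer used above:

# Run gradient ascent for one filter and display the resulting pattern
pattern = generate_pattern('activation_94', 0, size=150)

plt.figure(figsize=(4, 4))
plt.title('activation_94 filter #0')
plt.grid(False)
plt.imshow(pattern)
plt.show()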

The following screenshots show visualizations of filters from some of the layers. The filters of the first layers exhibit various types of dot patterns:
