The Inception module (or block of layers) aims to cover a large area of the image while keeping a fine resolution, so that important local information is captured as well. In addition to creating deeper networks, the Inception block introduces the idea of parallel convolutions: convolutions of different kernel sizes are performed in parallel on the output of the previous layer.
A naive view of the Inception layer can be seen here:
Basically, the idea of the Inception block is to use all available kernel sizes and operations to cover as much information as possible, and to let backpropagation decide what to use based on your data. The only problem with the design in the preceding diagram is its computational cost, so in practice the graph looks a bit different.
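Each branch produces a feature map with the same spatial size (thanks to "same" padding), so the branch outputs can simply be concatenated along the channel axis. The following is a minimal NumPy sketch of this wiring; the 28x28 spatial size and the per-branch depth of 96 are illustrative assumptions, not values from the diagram:

```python
import numpy as np

# Stand-ins for the outputs of four parallel branches. Spatial size
# (28x28) and branch depths (96 each) are assumed for illustration.
h, w = 28, 28
b1 = np.zeros((h, w, 96))   # e.g. pooling branch
b2 = np.zeros((h, w, 96))   # 1x1 branch
b3 = np.zeros((h, w, 96))   # 3x3 branch
b4 = np.zeros((h, w, 96))   # 5x5 branch

# "same" padding keeps h and w identical across branches, so the
# outputs can be stacked along the channel axis:
out = np.concatenate([b1, b2, b3, b4], axis=-1)
print(out.shape)            # (28, 28, 384)
```

Backpropagation then learns how much weight to give each branch's channels in the layers that follow.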
Consider the 5x5 branch we saw previously, and let's check its computational cost:
Now, consider the following change: we add a 1x1 CONV to reduce the 5x5 CONV input depth from 192 to 16:
If you observe, the computation is now roughly 10 times cheaper. The 1x1 layer squeezes the massive input depth (a bottleneck) before sending it to the 5x5 CONV layer.
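To make the saving concrete, here is a rough multiply count for both variants. The 28x28 spatial size and the 32 output filters are assumptions (the classic GoogLeNet teaching example); the 192-to-16 depth reduction is the one described in the text:

```python
# Multiply counts for the 5x5 branch, with and without a 1x1 bottleneck.
# 28x28 output size and 32 output filters are assumed for illustration.
H, W = 28, 28
C_in, C_out = 192, 32
C_bottleneck = 16

# Direct 5x5 convolution on the full 192-channel input:
naive = H * W * C_out * (5 * 5 * C_in)

# 1x1 conv down to 16 channels, then the 5x5 conv on those 16 channels:
bottleneck = (H * W * C_bottleneck * (1 * 1 * C_in)
              + H * W * C_out * (5 * 5 * C_bottleneck))

print(naive)               # 120422400  (~120M multiplies)
print(bottleneck)          # 12443648   (~12.4M multiplies)
print(naive / bottleneck)  # roughly 9.7x cheaper
```

The exact ratio depends on the assumed sizes, but the order-of-magnitude saving holds for any reasonably large input depth.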
Considering this bottleneck change, the true Inception layer is a little more complex:
Also, in some implementations, you may notice batch normalization or dropout being used inside the Inception block.
GoogLeNet is essentially many Inception blocks in cascade; in this piece of code, we show how to create an Inception block:
# Reference: https://github.com/khanrc/mnist/blob/master/inception.py
import tensorflow as tf

def inception_block_a(x, name='inception_a'):
    # num of channels: 384 = 96*4
    with tf.variable_scope(name):
        # Pooling part
        b1 = tf.layers.average_pooling2d(x, [3, 3], 1, padding='SAME')
        b1 = tf.layers.conv2d(inputs=b1, filters=96, kernel_size=[1, 1],
                              padding="same", activation=tf.nn.relu)
        # 1x1 part
        b2 = tf.layers.conv2d(inputs=x, filters=96, kernel_size=[1, 1],
                              padding="same", activation=tf.nn.relu)
        # 3x3 part
        b3 = tf.layers.conv2d(inputs=x, filters=64, kernel_size=[1, 1],
                              padding="same", activation=tf.nn.relu)
        b3 = tf.layers.conv2d(inputs=b3, filters=96, kernel_size=[3, 3],
                              padding="same", activation=tf.nn.relu)
        # 5x5 part
        b4 = tf.layers.conv2d(inputs=x, filters=64, kernel_size=[1, 1],
                              padding="same", activation=tf.nn.relu)
        # Two 3x3 convolutions in cascade cover the same receptive field
        # as one 5x5 but with fewer parameters:
        # b4 = tf.layers.conv2d(inputs=b4, filters=96, kernel_size=[5, 5],
        #                       padding="same", activation=tf.nn.relu)
        b4 = tf.layers.conv2d(inputs=b4, filters=96, kernel_size=[3, 3],
                              padding="same", activation=tf.nn.relu)
        b4 = tf.layers.conv2d(inputs=b4, filters=96, kernel_size=[3, 3],
                              padding="same", activation=tf.nn.relu)
        # Concatenate all branches along the channel dimension
        concat = tf.concat([b1, b2, b3, b4], axis=-1)
        return concat
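The comment in the b4 branch (two 3x3 convolutions instead of one 5x5) can be checked with a quick weight count, using the depths from that branch (64 in, 96 out) and ignoring biases:

```python
# Weight counts for one 5x5 conv versus two stacked 3x3 convs,
# using the b4 branch depths from the block above (biases ignored).
c_in, c_out = 64, 96

params_5x5 = 5 * 5 * c_in * c_out                              # 153600
params_two_3x3 = (3 * 3 * c_in * c_out                         # first 3x3
                  + 3 * 3 * c_out * c_out)                     # second 3x3

print(params_5x5)      # 153600
print(params_two_3x3)  # 138240
```

Both options see a 5x5 neighborhood of the input, but the stacked version is cheaper and adds an extra nonlinearity between the two convolutions.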