GoogLeNet (inception network)

The ILSVRC 2014 winner was a convolutional network from Google called GoogLeNet. It achieved a top-5 error rate of 6.67%, which was very close to human-level performance. The runner-up was VGGNet, the network from Karen Simonyan and Andrew Zisserman. GoogLeNet introduced a new CNN architectural component called the inception layer. The intuition behind the inception layer is to use larger convolutions while still keeping a fine resolution for the smaller details in the images.

So, we can convolve the input in parallel with different kernel sizes, ranging from 1 x 1 up to larger ones such as 5 x 5, and concatenate the outputs to produce the input of the next layer:
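As a rough illustration of this idea, the following is a minimal sketch using a Keras-style functional API (the function name and filter counts are placeholders for illustration, not the exact GoogLeNet settings). It applies several kernel sizes in parallel to the same input and concatenates the results along the channel axis:

```python
from tensorflow.keras import layers

def naive_inception(x, filters_1x1, filters_3x3, filters_5x5):
    # Each branch convolves the same input; 'same' padding keeps the
    # spatial dimensions identical so the outputs can be concatenated.
    branch_1x1 = layers.Conv2D(filters_1x1, (1, 1), padding='same', activation='relu')(x)
    branch_3x3 = layers.Conv2D(filters_3x3, (3, 3), padding='same', activation='relu')(x)
    branch_5x5 = layers.Conv2D(filters_5x5, (5, 5), padding='same', activation='relu')(x)
    branch_pool = layers.MaxPooling2D((3, 3), strides=(1, 1), padding='same')(x)
    # Stack all branch outputs along the channel (depth) axis.
    return layers.concatenate([branch_1x1, branch_3x3, branch_5x5, branch_pool], axis=-1)
```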

Clearly, convolving in parallel like this explodes the number of parameters. To control this, a dimensionality-reduction trick is used. Note that a 1 x 1 convolution does not change the spatial dimensions of the image, but it can reduce the number of feature maps. So, by placing 1 x 1 filters before the larger convolutions, we reduce the depth (number of channels) of the volume they operate on, as shown in the following diagram:
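A sketch of the resulting module, again in Keras-style code and with illustrative argument names: each of the 3 x 3 and 5 x 5 branches is preceded by a 1 x 1 convolution that shrinks the number of input channels, and the pooling branch is followed by a 1 x 1 projection. The filter counts in the commented usage line are those reported for the first module, inception (3a), in the GoogLeNet paper:

```python
def inception_module(x, f_1x1, f_3x3_reduce, f_3x3, f_5x5_reduce, f_5x5, f_pool_proj):
    # Plain 1 x 1 branch.
    branch_1x1 = layers.Conv2D(f_1x1, (1, 1), padding='same', activation='relu')(x)
    # 1 x 1 reduction, then a 3 x 3 convolution on the thinner volume.
    branch_3x3 = layers.Conv2D(f_3x3_reduce, (1, 1), padding='same', activation='relu')(x)
    branch_3x3 = layers.Conv2D(f_3x3, (3, 3), padding='same', activation='relu')(branch_3x3)
    # 1 x 1 reduction, then a 5 x 5 convolution on the thinner volume.
    branch_5x5 = layers.Conv2D(f_5x5_reduce, (1, 1), padding='same', activation='relu')(x)
    branch_5x5 = layers.Conv2D(f_5x5, (5, 5), padding='same', activation='relu')(branch_5x5)
    # Max pooling followed by a 1 x 1 projection to limit its channel count.
    branch_pool = layers.MaxPooling2D((3, 3), strides=(1, 1), padding='same')(x)
    branch_pool = layers.Conv2D(f_pool_proj, (1, 1), padding='same', activation='relu')(branch_pool)
    return layers.concatenate([branch_1x1, branch_3x3, branch_5x5, branch_pool], axis=-1)

# Filter counts of inception (3a) from the GoogLeNet paper:
# x = inception_module(x, 64, 96, 128, 16, 32, 32)
```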

The following diagram describes the full GoogLeNet architecture:
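For orientation only, here is a heavily abridged sketch of how such modules are stacked into a GoogLeNet-like model, assuming the inception_module helper above. This is not the full network: the real architecture stacks nine inception modules and adds two auxiliary classifiers during training.

```python
from tensorflow.keras import layers, models

def build_mini_googlenet(input_shape=(224, 224, 3), num_classes=1000):
    # Abridged sketch: stem convolutions, two inception modules,
    # global average pooling, and a softmax classifier.
    inputs = layers.Input(shape=input_shape)
    x = layers.Conv2D(64, (7, 7), strides=2, padding='same', activation='relu')(inputs)
    x = layers.MaxPooling2D((3, 3), strides=2, padding='same')(x)
    x = layers.Conv2D(192, (3, 3), padding='same', activation='relu')(x)
    x = layers.MaxPooling2D((3, 3), strides=2, padding='same')(x)
    x = inception_module(x, 64, 96, 128, 16, 32, 32)    # inception (3a)
    x = inception_module(x, 128, 128, 192, 32, 96, 64)  # inception (3b)
    x = layers.GlobalAveragePooling2D()(x)
    outputs = layers.Dense(num_classes, activation='softmax')(x)
    return models.Model(inputs, outputs)
```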
