AlexNet

In 2012, AlexNet significantly outperformed all prior competitors and won the ILSVRC by reducing the top-5 error to 15.3%, compared to 26.2% for the runner-up. This work popularized the application of CNNs in computer vision. AlexNet has an architecture very similar to that of LeNet, but it is deeper and has more filters per layer. AlexNet also introduced the use of stacked convolutions, rather than always alternating convolution and pooling layers. A stack of small convolutions is preferable to a single convolution layer with a large receptive field, as the stack introduces more non-linearities while using fewer parameters.

Suppose we stack three 3 x 3 convolution layers on top of each other (with a non-linearity between them). Each neuron in the first convolution layer has a 3 x 3 view of the input volume. A neuron in the second convolution layer has a 3 x 3 view of the first convolution layer, and hence a 5 x 5 view of the input volume. Similarly, a neuron in the third convolution layer has a 3 x 3 view of the second convolution layer, and hence a 7 x 7 view of the input volume. The stack therefore covers the same 7 x 7 receptive field, but while a single 7 x 7 convolution requires 7 x 7 = 49 weights per input-output channel pair, the three stacked 3 x 3 convolutions require only 3 x (3 x 3) = 27.
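The receptive-field growth and parameter comparison above can be sketched in a few lines of Python. This is an illustrative calculation only (stride 1, no padding considerations, biases ignored); the helper name `receptive_field` is our own, not part of any library:

```python
def receptive_field(num_layers, kernel=3):
    """Receptive field on the input after stacking `num_layers`
    stride-1 convolutions with the given square kernel size."""
    rf = 1
    for _ in range(num_layers):
        rf += kernel - 1  # each stride-1 layer widens the view by (kernel - 1)
    return rf

# Stacking 3 x 3 convolutions: views of 3, 5, and 7 pixels of the input.
for n in (1, 2, 3):
    print(f"{n} layer(s): {receptive_field(n)} x {receptive_field(n)} view")

# Weights per input-output channel pair (biases ignored):
stacked = 3 * (3 * 3)  # three 3 x 3 layers -> 27 weights
single = 7 * 7         # one 7 x 7 layer   -> 49 weights
print(f"stacked: {stacked} weights, single 7 x 7: {single} weights")
```

Running this confirms that three stacked 3 x 3 layers match the 7 x 7 receptive field at roughly half the parameter cost, while also inserting two extra non-linearities.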
