VGG-16

The VGG-16 architecture was developed in 2014 by K. Simonyan and A. Zisserman at the University of Oxford. Compared with AlexNet, it not only has more parameters, but it is also more uniform and simpler to reason about:

Instead of using filters of different sizes, as AlexNet does, it uses the same filter configuration throughout: every convolution is a same-padded 3 x 3 convolution, and every pooling layer is a 2 x 2 max pooling with a stride of 2. So whenever we say convolution, we mean a same-padded 3 x 3 convolution, and whenever we say pool, we mean a 2 x 2 max pooling with a stride of 2. The pooling halves the first two dimensions, while the same-padded convolution leaves the first two dimensions untouched and changes only the number of channels.
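These two shape rules can be written down directly. The following is a minimal sketch (the function names are illustrative, not from any library):

```python
# Shape rules for the two layer types used throughout VGG-16.
def conv_same(shape, n_filters):
    """Same-padded 3x3 convolution: spatial dims unchanged, channels replaced."""
    h, w, _ = shape
    return (h, w, n_filters)

def max_pool(shape):
    """2x2 max pooling with stride 2: spatial dims halved, channels unchanged."""
    h, w, c = shape
    return (h // 2, w // 2, c)

print(conv_same((224, 224, 3), 64))  # (224, 224, 64)
print(max_pool((224, 224, 64)))      # (112, 112, 64)
```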

Here is how VGG-16 works:

  1. It takes an input of dimension 224 x 224 x 3 and applies a convolution layer with 64 filters, twice. Because the convolution is same-padded, the first two dimensions stay the same and the number of channels becomes 64, giving 224 x 224 x 64.
  2. We apply a max pooling that divides this by 2, which gives us 112 x 112 x 64. 
  3. We apply a convolution of 128, twice, which leaves the first two dimensions untouched and changes the third, giving 112 x 112 x 128.
  4. We then apply a pool having a stride of two, giving us the output dimension 56 x 56 x 128.
  5. This convolution-and-pool pattern repeats until the first two dimensions reach 14: three convolutions of 256 followed by a pool give 28 x 28 x 256, and three convolutions of 512 followed by a pool give 14 x 14 x 512. We then apply three more convolutions of 512, which leave the dimensions at 14 x 14 x 512.
  6. The final step applies one more pool, which halves the first two dimensions and leaves the number of channels untouched, giving an output of 7 x 7 x 512.
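The six steps above can be checked by tracing shapes through the full convolutional stack. This is a sketch, not a trainable model; it only applies the two shape rules in the order VGG-16 uses them:

```python
# The VGG-16 convolutional stack: "conv" is a same-padded 3x3 convolution
# with the given number of filters; "pool" is a 2x2 max pool with stride 2.
VGG16_STACK = [
    ("conv", 64), ("conv", 64), ("pool", None),
    ("conv", 128), ("conv", 128), ("pool", None),
    ("conv", 256), ("conv", 256), ("conv", 256), ("pool", None),
    ("conv", 512), ("conv", 512), ("conv", 512), ("pool", None),
    ("conv", 512), ("conv", 512), ("conv", 512), ("pool", None),
]

def trace(shape, stack):
    """Apply each layer's shape rule in turn and return the final shape."""
    for kind, n_filters in stack:
        h, w, c = shape
        shape = (h, w, n_filters) if kind == "conv" else (h // 2, w // 2, c)
    return shape

print(trace((224, 224, 3), VGG16_STACK))  # (7, 7, 512)
```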

Observe how similar this strategy is to AlexNet's: we start with 224 x 224 and end up with 7 x 7, while the number of channels increases dramatically from 3 to 512.

In VGG-16, this output is flattened and connected to two fully-connected layers, each with 4,096 neurons. AlexNet also ends in fully-connected layers: three in total, two of 4,096 neurons followed by the 1,000-way output layer.

In a manner similar to AlexNet, we try to predict 1,000 output classes, and this architecture has about 138,000,000 parameters, more than twice the roughly 60,000,000 of AlexNet. VGG-16 not only has better accuracy, but is also quite simple to extend.
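The 138-million figure can be verified by summing weights and biases over the 13 convolution layers (with 16 weighted layers in total, hence the name VGG-16) and the three fully-connected layers. A minimal sketch:

```python
# Parameter count for VGG-16: 3x3 convolutions plus three FC layers.
def conv_params(c_in, c_out):
    return 3 * 3 * c_in * c_out + c_out  # 3x3 kernels + one bias per filter

def fc_params(n_in, n_out):
    return n_in * n_out + n_out          # weight matrix + biases

conv_channels = [3, 64, 64,       # block 1
                 128, 128,        # block 2
                 256, 256, 256,   # block 3
                 512, 512, 512,   # block 4
                 512, 512, 512]   # block 5
total = sum(conv_params(a, b) for a, b in zip(conv_channels, conv_channels[1:]))
total += fc_params(7 * 7 * 512, 4096)  # flattened 7x7x512 -> FC 4,096
total += fc_params(4096, 4096)         # FC 4,096 -> FC 4,096
total += fc_params(4096, 1000)         # FC 4,096 -> 1,000 classes
print(total)  # 138357544, i.e. about 138 million
```

Note that the two fully-connected layers alone account for roughly 120 million of the parameters, which is why most of VGG-16's size comes from its head rather than its convolutional stack.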
