It's also possible to simplify the 3x3 convolution with a mechanism called a bottleneck. As before, the bottleneck stands in for a normal 3x3 convolution layer, but with fewer parameters and more non-linearities.
The bottleneck works by replacing a 3x3 convolution layer that has C filters with the following three layers, applied in sequence:
- A 1x1 convolution with C/2 filters
- A 3x3 convolution with C/2 filters
- A 1x1 convolution with C filters
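
To see that the three layers above can stand in for a single 3x3 layer, it can help to trace the output shapes. The sketch below is only illustrative: it assumes stride 1, a padding of 1 for the 3x3 layer so the spatial size is preserved, and a hypothetical 32x32 input with 64 channels. The helper names `conv_output_shape` and `bottleneck_shapes` are made up for this example.

```python
def conv_output_shape(shape, kernel, filters, padding):
    """Return the (height, width, channels) shape after a stride-1 convolution."""
    h, w, _ = shape
    out_h = h + 2 * padding - kernel + 1
    out_w = w + 2 * padding - kernel + 1
    return (out_h, out_w, filters)


def bottleneck_shapes(shape, c):
    """Trace shapes through 1x1 (C/2) -> 3x3 (C/2) -> 1x1 (C)."""
    half = c // 2
    s1 = conv_output_shape(shape, kernel=1, filters=half, padding=0)
    s2 = conv_output_shape(s1, kernel=3, filters=half, padding=1)  # assumed padding of 1
    s3 = conv_output_shape(s2, kernel=1, filters=c, padding=0)
    return [s1, s2, s3]


# Assumed input: 32x32 feature map with 64 channels, and C = 64 filters.
input_shape = (32, 32, 64)
shapes = bottleneck_shapes(input_shape, c=64)
plain = conv_output_shape(input_shape, kernel=3, filters=64, padding=1)

print(shapes)  # shapes after each of the three bottleneck layers
print(plain)   # shape after a single 3x3 convolution with C filters
```

Under these assumptions the last bottleneck layer produces the same output shape as the single 3x3 layer; only the intermediate layers work with C/2 channels.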
An example in action is given here:
Using this example, we can count the parameters to show how much the bottleneck saves. We get the following:
This is fewer parameters than we would get if we used a single 3x3 convolution layer:
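
As a rough check of this comparison, the sketch below counts weights for both designs. It is not the example from the text: it assumes the input already has C channels, ignores bias terms, and picks C = 256 purely for illustration. The function names are hypothetical.

```python
def conv_params(kernel, c_in, c_out):
    """Number of weights in a kernel x kernel convolution (biases ignored)."""
    return kernel * kernel * c_in * c_out


def plain_params(c):
    """A single 3x3 convolution mapping C channels to C filters."""
    return conv_params(3, c, c)


def bottleneck_params(c):
    """1x1 (C -> C/2), then 3x3 (C/2 -> C/2), then 1x1 (C/2 -> C)."""
    half = c // 2
    return (
        conv_params(1, c, half)
        + conv_params(3, half, half)
        + conv_params(1, half, c)
    )


C = 256  # assumed channel count, chosen only for illustration
plain = plain_params(C)
bottleneck = bottleneck_params(C)

print(plain)               # weights in the single 3x3 layer
print(bottleneck)          # weights in the three bottleneck layers
print(bottleneck / plain)  # fraction of the original weights that remain
```

For even C, the same arithmetic gives 9C² weights for the plain layer and C²/2 + 9C²/4 + C²/2 = 13C²/4 for the bottleneck, so under these assumptions the bottleneck keeps roughly 36% of the weights.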
Some network architectures, such as residual networks (which we will see later), use the bottleneck technique for the same reasons: to reduce the number of parameters and add more non-linearities.