Why would you perform a 1x1 convolution whose input volume has a certain number of channels and whose output volume has the same number of channels?

Andrew explains at 5:30 in this video that by keeping the number of channels, you add another layer that provides nonlinearity, which lets the network learn a more complex function.

But you would still get nonlinearity if you instead took, say, a 32x32x64 volume as input and used a “same” convolution with 64 3x3 filters, outputting a 32x32x64 volume. Right?

Correct. Nonlinearity comes from using a non-linear activation function, not from the shapes of the volumes.
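To make this concrete, here is a minimal NumPy sketch (shapes and random weights are just illustrative assumptions): a 1x1 convolution over an HxWxC volume is the same as applying one shared linear map across the channel axis at every pixel, and the nonlinearity only appears when you apply an activation like ReLU afterwards.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical input volume: height 32, width 32, 64 channels.
x = rng.standard_normal((32, 32, 64))

# 64 filters of shape 1x1x64, flattened to a (C_in, C_out) matrix.
w = rng.standard_normal((64, 64)) * 0.1
b = np.zeros(64)

# A 1x1 convolution is just a matrix multiply over the channel axis,
# applied independently at each spatial position.
z = x @ w + b

# The nonlinearity comes from the activation function, not the shape.
a = np.maximum(z, 0.0)  # ReLU

print(z.shape)  # same spatial size and same channel count as the input
```

The same reasoning applies to the 3x3 “same” convolution in the question: as long as a non-linear activation follows it, the layer adds nonlinearity regardless of whether the channel count changes.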