1 x 1 convolutions keep the nh and nw values while changing nc. We can achieve the same with a convolution layer that uses “same” padding and the required number of filters. What difference does the activation make in a 1 x 1 convolution compared with a “same” padding conv layer? What does adding the non-linearity do?
The main purpose of a 1x1 convolution is to reduce the computational requirement.
Large convolutions such as 3x3, 5x5 or bigger need far more computation. For a fixed input size and fixed channel counts, the cost is roughly proportional to the filter “height x width”, so a 1x1 convolution needs about 1/9 of the multiplications of a 3x3 filter and 1/25 of a 5x5 filter with “same” padding. (A 1x1 convolution is cheap enough to run comfortably even on a CPU.)
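To make the scaling concrete, here is a rough sketch that counts multiplications for a stride-1, “same” padding convolution. The function name and the 28 x 28 x 64 shape are just assumptions for illustration, and biases are ignored.

```python
def conv_multiplications(height, width, c_in, c_out, k):
    """Multiplications for a k x k convolution (stride 1, "same" padding).

    Each of the height * width * c_out output values needs
    k * k * c_in multiplications. Biases are ignored.
    """
    return height * width * c_out * k * k * c_in


# Hypothetical feature map: 28 x 28 with 64 channels in and 64 channels out.
base = conv_multiplications(28, 28, 64, 64, 1)
for k in (1, 3, 5):
    cost = conv_multiplications(28, 28, 64, 64, k)
    print(f"{k}x{k}: {cost:,} multiplications, {cost // base}x the 1x1 cost")
```

With everything else held fixed, the only thing that changes is the k * k factor, which is where the 9x and 25x come from.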
But its real advantage is in how it is used.
It is mostly used to reduce the number of channels “before” a large convolution, which reduces the computation that the large convolution needs. Then, “after” the large convolution, another 1x1 convolution can restore the channel count to its original size at small computational cost.
This is called a “bottleneck”, and it is used by several convolutional networks.
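As a rough illustration of the saving, the sketch below compares a direct 3x3 convolution with a bottleneck that squeezes the channels first. The shapes (28 x 28, 256 channels squeezed to 64) and the helper name are assumptions chosen for the example, not numbers from any particular network.

```python
def conv_multiplications(height, width, c_in, c_out, k):
    """Multiplications for a k x k convolution (stride 1, "same" padding)."""
    return height * width * c_out * k * k * c_in


H, W = 28, 28               # hypothetical feature map size
C, C_SMALL = 256, 64        # original and reduced channel counts (assumed)

# Direct: one 3x3 convolution that keeps all 256 channels.
direct = conv_multiplications(H, W, C, C, 3)

# Bottleneck: 1x1 to shrink, 3x3 on the small tensor, 1x1 to restore.
bottleneck = (
    conv_multiplications(H, W, C, C_SMALL, 1)
    + conv_multiplications(H, W, C_SMALL, C_SMALL, 3)
    + conv_multiplications(H, W, C_SMALL, C, 1)
)

print(f"direct 3x3:  {direct:,}")
print(f"bottleneck:  {bottleneck:,}")
print(f"saving:      {direct / bottleneck:.1f}x fewer multiplications")
```

The two 1x1 layers are cheap, and the expensive 3x3 layer now runs on a much thinner tensor, so the total drops even though there are three layers instead of one.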
The “activation” is a kind of “additional bonus”. A 1x1 convolution on its own is just a linear mix of the channels at each pixel. Of course, we can attach an activation function to the 1x1 convolution to add non-linearity, which also increases the capability of the network.
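Here is a small pure-Python sketch of what the non-linearity buys you. Without an activation, two stacked 1x1 convolutions collapse into a single 1x1 convolution, so the extra layer adds nothing. With a ReLU in between, they no longer collapse. The helper names, weights and the tiny 1 x 2 input are all made up for illustration.

```python
def conv1x1(x, w):
    """1x1 convolution: x is H x W x C_in (nested lists), w is C_in x C_out.

    Every pixel's channel vector is multiplied by the same weight matrix.
    """
    c_out = len(w[0])
    return [[[sum(px[i] * w[i][o] for i in range(len(px))) for o in range(c_out)]
             for px in row] for row in x]


def relu(x):
    """Element-wise ReLU on an H x W x C nested list."""
    return [[[max(0.0, v) for v in px] for px in row] for row in x]


def matmul(a, b):
    """Plain matrix product, used to merge two 1x1 weight matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


x = [[[1.0, 2.0], [3.0, -1.0]]]      # 1 x 2 image with 2 channels
w1 = [[1.0, -1.0], [-1.0, 1.0]]      # first 1x1 conv: 2 -> 2 channels
w2 = [[1.0], [1.0]]                  # second 1x1 conv: 2 -> 1 channel

# No activation: two 1x1 convs equal one 1x1 conv with weights w1 @ w2.
linear_stack = conv1x1(conv1x1(x, w1), w2)
merged = conv1x1(x, matmul(w1, w2))

# ReLU in between: the result can no longer be written as one linear 1x1 conv.
nonlinear_stack = conv1x1(relu(conv1x1(x, w1)), w2)

print("linear stack:    ", linear_stack)
print("merged single:   ", merged)
print("with ReLU inside:", nonlinear_stack)
```

In this toy case the linear stack is identically zero for every input, while the ReLU version computes the absolute difference of the two channels, which no single linear 1x1 convolution can do.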
Hope this helps.
This clears it up. Thank you so much!