Week 2 ResNet programming exercise: the use of one-by-one convolution

realnoob · August 29, 2022, 5:36am

In the programming exercise for Week 2 (ResNet), part 3.2, The Convolutional Block, it states:

“For example, to reduce the activation dimensions’s height and width by a factor of 2, you can use a 1x1 convolution with a stride of 2.”

As far as I know, 1x1 convolutions only reduce the number of channels in the activation layers, right? It is the pooling layers that reduce the height and width of the input. Is this an error in the programming exercise?

alvaroramajo · August 29, 2022, 8:46am

Hi, @realnoob !

Yes, 1x1 convolutions are typically used to reduce the number of channels, but any convolution with a stride greater than 1 also reduces the height and width of the output. In this case, a stride of 2 outputs a feature map with half the height and width (with a stride of 3, a third), up to rounding when the input size is not an exact multiple of the stride.
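To make the stride effect concrete, here is a minimal NumPy sketch (not the assignment's Keras code). The helper `conv1x1` and the shapes (56 x 56 x 256 input, 128 filters) are made up for illustration, and it assumes no bias and no padding. Because a 1x1 kernel only looks at one pixel, a strided 1x1 convolution amounts to sampling every `stride`-th pixel and then mixing the channels:

```python
import numpy as np

def conv1x1(x, w, stride=1):
    """1x1 convolution of an (H, W, C_in) array with weights of shape (C_in, C_out)."""
    # A 1x1 kernel covers a single pixel, so striding just samples every `stride`-th pixel.
    sampled = x[::stride, ::stride, :]
    # At each sampled pixel, the channels are mixed by a matrix product.
    return sampled @ w

rng = np.random.default_rng(0)
x = rng.standard_normal((56, 56, 256))  # hypothetical input activations
w = rng.standard_normal((256, 128))     # hypothetical 1x1 filters: 256 -> 128 channels

out_s1 = conv1x1(x, w, stride=1)
out_s2 = conv1x1(x, w, stride=2)
print(out_s1.shape)  # (56, 56, 128): only the channel count changes
print(out_s2.shape)  # (28, 28, 128): height and width are halved as well
```

So the channel reduction comes from the number of filters, while the height and width reduction comes purely from the stride.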

realnoob · August 29, 2022, 10:38am

Ahh, now I get it. Thank you so much, I forgot about that.

paulinpaloalto · August 29, 2022, 8:28pm

This is an interesting point. Notice that a 1 x 1 convolution with stride = 2 means that you are literally discarding (completely ignoring) most of the inputs: it only reads every other row and every other column, so three quarters of the positions are never looked at. They could also have used a pooling layer with a stride of 2 and gotten the same reduction in height and width without ignoring any of the inputs.

This question has been asked multiple times before, but I don’t know why they made that design choice. The only argument I can think of is that the 1 x 1 convolution would cost a bit less compute to achieve that size reduction. But the idea of simply discarding inputs seems a bit counterintuitive.

Well, maybe you could consider a Max Pooling layer as ignoring inputs too: with the usual 2 x 2 window it does drop three out of every four values, but it looks at the values to decide which one to keep. That’s not really equivalent to ignoring them completely.

There is one other difference when using pooling layers: they operate “channelwise”, so the number of channels is preserved. With 1 x 1 convolutions, you can also reduce the number of channels at the same time.
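Here is a small NumPy sketch of that difference. It is only an illustration under assumed shapes (an 8 x 8 x 4 input, 2 output channels, a 2 x 2 max pool), and the helper names `strided_1x1` and `max_pool_2x2` are hypothetical, not from the assignment. It changes only the pixels that a stride-2 1 x 1 convolution skips and checks which output reacts:

```python
import numpy as np

def strided_1x1(x, w, stride=2):
    # 1x1 convolution with a stride: sample pixels, then mix channels.
    return x[::stride, ::stride, :] @ w

def max_pool_2x2(x):
    # 2x2 max pooling with stride 2 on an (H, W, C) array with even H and W.
    h, w_, c = x.shape
    return x.reshape(h // 2, 2, w_ // 2, 2, c).max(axis=(1, 3))

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 4))
w = rng.standard_normal((4, 2))

# Mark the positions that the strided convolution never reads.
skipped = np.ones((8, 8), dtype=bool)
skipped[::2, ::2] = False

# Perturb only those skipped positions.
x_mod = x.copy()
x_mod[skipped] += 10.0

frac_skipped = skipped.mean()
conv_same = np.allclose(strided_1x1(x, w), strided_1x1(x_mod, w))
pool_same = np.allclose(max_pool_2x2(x), max_pool_2x2(x_mod))

print(frac_skipped)                # 0.75
print(conv_same, pool_same)        # True False
print(strided_1x1(x, w).shape)     # (4, 4, 2): channels reduced too
print(max_pool_2x2(x).shape)       # (4, 4, 4): channels preserved
```

The convolution output is unchanged because it never sees the perturbed pixels, while the pooling output does change.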

If anyone has the energy, it would be worth reading the original Residual Net papers to see if they comment on this design choice.
