Course 4 Week 2: Residual Networks - kernel size = 1, stride = 2

From the original paper, we see that the authors use a stride of 2 at the beginning of each stage (the first block in each group of blocks) to downsample the feature maps (you can see the "/2" in the image). They also use bottleneck blocks for ResNet-50 and deeper:

> Next we describe our deeper nets for ImageNet. Because of concerns on the training time that we can afford, we modify the building block as a bottleneck design.
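To make that concrete, here is a minimal Keras sketch of one bottleneck block that downsamples by 2. This is not the course's exact helper function; the layer order (conv → batch norm → ReLU) and the example filter sizes are just the common choices:

```python
import tensorflow as tf
from tensorflow.keras import layers

def bottleneck_block(x, filters, stride=2):
    """Bottleneck residual block; stride=2 halves the spatial size (the '/2')."""
    f1, f2, f3 = filters
    shortcut = x

    # 1x1 conv reduces the channel count; putting stride 2 here does the downsampling
    out = layers.Conv2D(f1, 1, strides=stride)(x)
    out = layers.BatchNormalization()(out)
    out = layers.ReLU()(out)

    # 3x3 conv operates on the cheaper, reduced representation
    out = layers.Conv2D(f2, 3, padding="same")(out)
    out = layers.BatchNormalization()(out)
    out = layers.ReLU()(out)

    # 1x1 conv restores the channel dimension
    out = layers.Conv2D(f3, 1)(out)
    out = layers.BatchNormalization()(out)

    # projection shortcut: a 1x1 conv with the same stride so the shapes match
    shortcut = layers.Conv2D(f3, 1, strides=stride)(shortcut)
    shortcut = layers.BatchNormalization()(shortcut)

    out = layers.Add()([out, shortcut])
    return layers.ReLU()(out)

# e.g. a 56x56x256 feature map becomes 28x28x512 after one downsampling block
inputs = tf.keras.Input(shape=(56, 56, 256))
outputs = bottleneck_block(inputs, (128, 128, 512), stride=2)
```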

I recommend you read the paper and pay attention to small details.

The title of their paper is "Deep Residual Learning for Image Recognition". Very deep networks are computationally expensive to train, and the authors found that downsampling (together with the residual connections) makes it computationally feasible to train such deep networks, which was the aim of their paper.

Prof Andrew Ng has a video about this tradeoff dilemma: given a computational budget, we can change the network architecture in different ways.

Now, should you use a stride of 2 on the 1x1 convolutions, or on the 3x3 convolutions as they show in the picture?

Different libraries have tried both ways: the original implementation puts the stride on the first 1x1 convolution, while some later variants (for example torchvision's "ResNet v1.5") move it to the 3x3 convolution.
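To make the two placements concrete, here is a minimal sketch of just the convolution stack (batch norm and ReLU omitted for brevity; the function names are mine, for illustration):

```python
from tensorflow.keras import layers

def stride_on_1x1(x, f1, f2, f3):
    # original placement: stride 2 on the first 1x1 convolution
    x = layers.Conv2D(f1, 1, strides=2)(x)
    x = layers.Conv2D(f2, 3, padding="same")(x)
    return layers.Conv2D(f3, 1)(x)

def stride_on_3x3(x, f1, f2, f3):
    # alternative placement: stride 2 on the 3x3 convolution instead
    x = layers.Conv2D(f1, 1)(x)
    x = layers.Conv2D(f2, 3, strides=2, padding="same")(x)
    return layers.Conv2D(f3, 1)(x)
```

Striding the 1x1 discards three out of every four spatial positions before the 3x3 convolution ever sees them; striding the 3x3 lets it look at the full-resolution input first, at the cost of a bit more computation. Both produce the same output shape.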

I hope my comments help clear things up a little :smiley:

A bonus answer: why don't we change the stride-2 1x1 convolutions in the shortcut to something like max pooling, to preserve more information? Because we want the shortcut to stay a linear function, and max pooling is non-linear. We want it linear so that gradients can flow through the shortcut without being interrupted by a non-linearity; that is the whole purpose of residual connections.

2nd bonus comment: We could use average pooling instead of max pooling, because average pooling is a linear operation.
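Here is a small sketch of the two shortcut options (Keras-style, illustrative names only; the average-pooling version is the bonus idea above, not what the original paper uses):

```python
from tensorflow.keras import layers

def conv_shortcut(x, out_channels):
    # what the paper uses: a strided 1x1 convolution (a linear map)
    return layers.Conv2D(out_channels, 1, strides=2)(x)

def avgpool_shortcut(x, out_channels):
    # the bonus idea: average pooling (also linear), then a 1x1 conv to match channels
    x = layers.AveragePooling2D(pool_size=2, strides=2)(x)
    return layers.Conv2D(out_channels, 1)(x)
```

Some later ResNet variants do in fact use an average-pooling shortcut along these lines.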
