Stage 2 of ResNet-50 contains:
- The convolutional block uses three sets of filters of size [64,64,256], “f” is 3, and “s” is 1.
- The 2 identity blocks use three sets of filters of size [64,64,256], and “f” is 3.
My question is: why are we using a convolutional block with stride = 1 instead of an identity block? Isn’t a convolutional block with stride = 1 the same as an identity block, except with a CONV2D layer followed by BatchNorm in the shortcut path? With s = 1 the shortcut and the main path will have the same spatial size, right? Then what is the use of resizing to the same dimensions as before with the CONV2D layer?
Hey @usman.n, thanks for your post.
Okay, let’s break your question into points.
First, yes, you’re correct: with a stride of 1 the convolutional block preserves the spatial dimensions, so it is similar to an identity block. Similar, but not exactly the same. The difference lies in the shortcut path. In an identity block the input is added back unchanged, while in a convolutional block the input first passes through a CONV2D layer followed by BatchNorm. That gives the shortcut its own learnable weights, so the block can learn a projection of its input instead of being forced to reuse it as is.
So a convolutional block with a stride of 1 has more parameters than an identity block: the extra ones are the weights of the shortcut CONV2D layer and its BatchNorm. This added capacity can help the network learn richer features. Identity blocks are helpful for maintaining information flow and alleviating the vanishing gradient problem, but their shortcut adds no capacity and cannot change the shape of the input.
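To put a rough number on the extra parameters mentioned above, here is a minimal sketch in plain Python. It assumes the input to stage 2 has 64 channels (as in the standard ResNet-50 layout), that the shortcut CONV2D uses a bias, and that BatchNorm keeps four values per channel; the helper names are made up for illustration and are not from the assignment.

```python
def conv2d_params(in_channels, out_channels, f, use_bias=True):
    """Parameter count of a CONV2D layer with an f x f kernel."""
    weights = f * f * in_channels * out_channels
    return weights + (out_channels if use_bias else 0)


def batchnorm_params(channels):
    """BatchNorm keeps gamma and beta (trainable) plus a moving mean
    and moving variance (not trainable) for every channel."""
    return {"trainable": 2 * channels, "non_trainable": 2 * channels}


in_channels = 64    # assumption: channels entering stage 2
out_channels = 256  # last entry of the filter list [64, 64, 256]

# Shortcut path of the convolutional block: 1x1 CONV2D + BatchNorm.
shortcut_conv = conv2d_params(in_channels, out_channels, f=1)
shortcut_bn = batchnorm_params(out_channels)
extra_trainable = shortcut_conv + shortcut_bn["trainable"]

print("shortcut CONV2D parameters:", shortcut_conv)
print("shortcut BatchNorm parameters:", shortcut_bn)
print("extra trainable parameters vs. identity block:", extra_trainable)
```

An identity block has none of these, since its shortcut is just the input tensor itself.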
So that is one reason to use the convolutional block here instead of the identity block.
Now coming to the next part of your question, which is about the use of “resizing”.
The resizing operation you mentioned refers to the 1x1 convolutional layer (a CONV2D layer with a 1x1 filter) applied to the shortcut path (skip connection). Its job is to make the shape of the shortcut match the shape of the main path’s output, so that the two can be added.
Even when the stride is 1, the number of channels coming out of the main path can differ from the number of channels in the input. That is the case in stage 2: the main path ends with 256 filters, while in the standard ResNet-50 layout the input arriving from the earlier layers has only 64 channels. With s = 1 the height and width already agree, but the channel counts do not, so a plain identity shortcut cannot be added element-wise to the main path. The 1x1 convolutional layer in the shortcut path adjusts the number of channels so that they match. The 2 identity blocks that follow can use a plain shortcut because their input already has 256 channels.
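The shape bookkeeping behind this can be sketched in a few lines of plain Python. This is only shape arithmetic, not a real network. It assumes the usual bottleneck layout for the main path (1x1 CONV2D with stride s, then f x f CONV2D with "same" padding, then 1x1 CONV2D) and an illustrative stage 2 input of 56x56x64; the function names are hypothetical, and the exact height and width depend on the image size you feed in.

```python
import math


def conv2d_shape(shape, filters, f, s, padding):
    """Output shape (height, width, channels) of a CONV2D layer."""
    h, w, _ = shape
    if padding == "same":
        out_h, out_w = math.ceil(h / s), math.ceil(w / s)
    else:  # "valid" padding
        out_h, out_w = (h - f) // s + 1, (w - f) // s + 1
    return (out_h, out_w, filters)


def main_path_shape(shape, f, filters, s):
    """Shape after the three CONV2D layers of the main path."""
    f1, f2, f3 = filters
    x = conv2d_shape(shape, f1, 1, s, "valid")  # 1x1 conv, stride s
    x = conv2d_shape(x, f2, f, 1, "same")       # f x f conv, stride 1
    x = conv2d_shape(x, f3, 1, 1, "valid")      # 1x1 conv, stride 1
    return x


def shortcut_shape(shape, filters, s, use_conv):
    """Shape of the shortcut: unchanged input, or 1x1 CONV2D projection."""
    if not use_conv:
        return shape
    return conv2d_shape(shape, filters[2], 1, s, "valid")


stage2_input = (56, 56, 64)  # assumption: illustrative input shape
filters = [64, 64, 256]

main = main_path_shape(stage2_input, f=3, filters=filters, s=1)
identity_shortcut = shortcut_shape(stage2_input, filters, s=1, use_conv=False)
conv_shortcut = shortcut_shape(stage2_input, filters, s=1, use_conv=True)

print("main path:        ", main)
print("identity shortcut:", identity_shortcut, "addable:", identity_shortcut == main)
print("conv shortcut:    ", conv_shortcut, "addable:", conv_shortcut == main)

# The following identity blocks receive the 256-channel output,
# so their unchanged shortcut already matches the main path.
next_main = main_path_shape(main, f=3, filters=filters, s=1)
print("next identity block addable:", next_main == main)
```

With s = 1 the height and width never change in this sketch; the only mismatch the shortcut CONV2D has to fix is the channel count.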
Hope it’s clear now, and feel free to ask if you need more clarification.