Why use a 1x1 Conv2d with stride 2 in a ResNet block?

Hi, I am looking for the reason why a 1x1 convolution with stride 2 is the first component of the main path in the ResNet assignment.

This literally skips every other row and every other column of the previous layer's output, so only a quarter of those activations are ever read. I would instead apply the stride of 2 in the second component of the main path, because there at least we use an f x f filter.
Alternatively, the previous block could end with a max pool of stride 2, because then at least we would be dropping the values that are less important.

Why even bother computing these numbers if they are going to be dropped?
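
To put a number on how much gets dropped, here is a small sketch in plain Python. It assumes a square n x n feature map and 'valid' padding, and the helper names (positions_read, fraction_read) are made up for illustration; they are not from the assignment. It counts which input positions a strided filter actually touches.

```python
def positions_read(n, k, stride):
    """Indices along one spatial axis touched by a k-wide filter ('valid' padding)."""
    out = (n - k) // stride + 1          # number of output positions on this axis
    touched = set()
    for o in range(out):
        start = o * stride               # left edge of the o-th window
        touched.update(range(start, start + k))
    return touched


def fraction_read(n, k, stride):
    """Fraction of an n x n feature map read by a k x k filter with the given stride."""
    per_axis = len(positions_read(n, k, stride))
    return (per_axis * per_axis) / (n * n)


frac_1x1 = fraction_read(8, 1, 2)   # 1x1 filter, stride 2
frac_3x3 = fraction_read(8, 3, 2)   # 3x3 filter, stride 2

print(frac_1x1)  # 0.25
print(frac_3x3)  # 0.765625
```

On an 8 x 8 map the 1x1 filter with stride 2 reads one position in four, while a 3x3 filter with the same stride reads most of the map (and would read all of it with 'same' padding).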

It’s an excellent point that has been brought up before, but none of the previous discussions found an explanation or justification for doing this. If the goal is to reduce the size of the output at a given layer, a pooling layer would also achieve that with less loss of information, although you’d then need to follow it with a 1 x 1 Conv layer with stride of 1 to really get the same effect. Of course, that would be more computationally expensive. But exactly as you say, it seems strange to literally ignore half the inputs along each spatial dimension at various layers.
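
For a rough sense of how much more expensive the pooling alternative would be, here is a back-of-the-envelope sketch in plain Python. The shapes (56 x 56 x 256 in, 128 filters out) are hypothetical rather than taken from the assignment, the helper names are invented for illustration, and counting one max-pool comparison as comparable to one multiply-accumulate is a simplifying assumption.

```python
def conv1x1_macs(h, w, c_in, c_out, stride):
    """Multiply-accumulates for a 1x1 convolution over an h x w x c_in input."""
    out_h = (h - 1) // stride + 1
    out_w = (w - 1) // stride + 1
    return out_h * out_w * c_in * c_out


def maxpool_comparisons(h, w, c, pool=2, stride=2):
    """Comparisons for a pool x pool max pool ('valid' padding) over h x w x c."""
    out_h = (h - pool) // stride + 1
    out_w = (w - pool) // stride + 1
    # Finding the max of pool*pool values takes pool*pool - 1 comparisons.
    return out_h * out_w * c * (pool * pool - 1)


h, w, c_in, c_out = 56, 56, 256, 128    # hypothetical layer shapes

# Option A: 1x1 convolution with stride 2 (what the block does).
strided = conv1x1_macs(h, w, c_in, c_out, stride=2)

# Option B: 2x2 max pool with stride 2, then 1x1 convolution with stride 1.
pool_cost = maxpool_comparisons(h, w, c_in)
pooled_conv = conv1x1_macs(h // 2, w // 2, c_in, c_out, stride=1)

print(strided)                 # 25690112
print(pool_cost + pooled_conv)
print(pool_cost / strided)     # relative overhead of adding the pool
```

Under these assumptions the convolution itself costs the same either way, since both versions produce a 28 x 28 x 128 output from 256 input channels. The extra cost is only the pooling step, which comes to a few percent here.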

I have not taken the trouble to go read any of the papers on Residual Nets. The hope would be that they might comment on this aspect, but there is no guarantee. If anyone has the time and energy to pursue that, please let us know what you find!