I’m puzzled about how to interpret 1x1 convolutions with stride > 1. So far in the course the stride has always been <= the filter dimension, whereas this approach seems to just carelessly discard entire rows and columns of pixels. Is it common to use, and how does it compare to, say, max pooling with overlapping stride (s <= f)?
Thanks in advance for any insights.
We usually use a 1x1 convolutional layer to reduce n_C but not n_H or n_W.
We usually use pooling layers to reduce n_H or n_W.
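To make the division of labor concrete, here is a minimal NumPy sketch (the shapes and random weights are invented for illustration): a 1x1 convolution is just a per-pixel linear map across channels, so it changes n_C but not n_H or n_W, while a pooling layer does the opposite.

```python
import numpy as np

# Toy activation volume: n_H = n_W = 4, n_C = 8 (batch dimension omitted).
x = np.random.rand(4, 4, 8)

# A 1x1 convolution mixes channels at each pixel independently:
# weights of shape (n_C_in, n_C_out) reduce n_C from 8 to 3,
# leaving n_H and n_W untouched.
w = np.random.rand(8, 3)
conv1x1 = x @ w          # shape (4, 4, 3)

# A 2x2 max pool with stride 2 halves n_H and n_W,
# leaving n_C untouched.
pooled = x.reshape(2, 2, 2, 2, 8).max(axis=(1, 3))  # shape (2, 2, 8)

print(conv1x1.shape)  # (4, 4, 3)
print(pooled.shape)   # (2, 2, 8)
```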
If we use 1x1 convolutions with stride>1:
we lose the advantage of pooling layers, which take the max or average of the values of nearby pixels. We keep only one pixel per window and might miss, for example, that all of its neighboring pixels were important; some previously detected feature is then simply lost. It would be better to reduce dimensionality in n_H or n_W in a way that also takes into account all of the information we had beforehand.
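You can verify the "discarded pixels" intuition directly. In this small NumPy sketch (shapes and weights invented for illustration), a 1x1 convolution with stride 2 gives exactly the same result as throwing away every other row and column first and then applying the 1x1 convolution:

```python
import numpy as np

x = np.random.rand(4, 4, 8)   # toy input: n_H = n_W = 4, n_C = 8
w = np.random.rand(8, 3)      # 1x1 conv weights: 8 channels in, 3 out

# "1x1 conv with stride 2" computed naively: slide a 1x1 window
# over the input in steps of 2.
out = np.empty((2, 2, 3))
for i in range(2):
    for j in range(2):
        out[i, j] = x[2 * i, 2 * j] @ w

# Identical to subsampling first, then applying the 1x1 conv:
# the odd rows and columns are simply never looked at.
assert np.allclose(out, x[::2, ::2] @ w)
```

So unlike pooling, nothing from the discarded positions influences the output.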
That was my intuition as well, but when I then saw a variable stride used in 1x1 convolutions without any explanation, I was a little puzzled. Thank you for your reply, Jonas.
Ah, I didn’t see where you found the 1x1 stride 2 convolutions in the course material, but another student had a similar question where my answer might be of interest to you as well:
Thank you, that’s a great read and gets to the bottom of the question.
Great, well done all!