Week 3 Convolutional Implementation of Sliding Windows

After the max pooling the output is 5x5x16. Is it understood that the stride is 2, or did I miss something?
Shouldn't the output size be \lfloor \frac{n - f + 2p}{s} \rfloor + 1?

If f = 5, p = 0 and s = 1, then we have:

\lfloor \displaystyle \frac {14 - 5 + 2*0} {1} \rfloor + 1 = 9 + 1 = 10

right?

Hi Paul, I’m sorry for editing the question. I realized I had unconsciously added a stride of 2 while calculating the first output. Could you help me understand the output of the max pooling layer too?

Yes, there is no rule that the stride has to be the same at every layer. So it’s 1 for the first conv layer, but at the max pooling layer it’s f = 2 and s = 2, which is one of the standard pooling choices.

\lfloor \displaystyle \frac {10 - 2 + 2*0}{2} \rfloor + 1 = 4 + 1 = 5
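In case it helps to check the arithmetic, here is a minimal Python sketch of the same formula applied to both layers. The function name `conv_output_size` is just made up for illustration, and it assumes square inputs and filters.

```python
def conv_output_size(n: int, f: int, p: int = 0, s: int = 1) -> int:
    """Spatial output size of a conv or pooling layer: floor((n - f + 2p) / s) + 1."""
    return (n - f + 2 * p) // s + 1


# First conv layer from the thread: 14x14 input, f = 5, p = 0, s = 1
conv_out = conv_output_size(n=14, f=5, p=0, s=1)

# Max pooling layer: f = 2, s = 2, applied to the conv output
pool_out = conv_output_size(n=conv_out, f=2, p=0, s=2)

print(conv_out, pool_out)  # 10 5
```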

The one key difference to note with pooling layers is that the operation is always done “channelwise”, meaning “per channel”, so the number of output channels stays the same as the number of input channels. That is not how conv layers work: there, the number of output channels equals the number of filters.
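To make the “per channel” point concrete, here is a rough pure-Python sketch of max pooling on a 10x10x16 volume. It is only an illustration, not how a framework implements it. The helper names are hypothetical, and it stores the data channels-first (a list of 16 channels, each a 10x10 list of rows), which is an assumption made to keep the code short.

```python
def max_pool_2d(channel, f=2, s=2):
    """Max-pool a single 2-D channel given as a list of rows."""
    out_h = (len(channel) - f) // s + 1
    out_w = (len(channel[0]) - f) // s + 1
    return [
        [
            max(channel[i * s + di][j * s + dj] for di in range(f) for dj in range(f))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def max_pool(volume, f=2, s=2):
    """Pool every channel independently, so the channel count is unchanged."""
    return [max_pool_2d(channel, f, s) for channel in volume]


# Dummy data: 16 channels of 10x10 values (channels-first layout, assumed here)
volume = [[[c * 100 + i * 10 + j for j in range(10)] for i in range(10)] for c in range(16)]

pooled = max_pool(volume)
print(len(pooled), len(pooled[0]), len(pooled[0][0]))  # 16 5 5
```

Each channel is pooled without looking at any other channel, which is why 16 channels go in and 16 come out.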

Thanks a lot for the quick and precise answers, Paul!