"Convolutional Implementation of Sliding Windows" Video


In the Week 3 video “Convolutional Implementation of Sliding Windows”, I understood the concept well, but two things are not clear.

1 ) What do the 4 numbers in the third dimension of the 1x1x4 output correspond to?

2 ) How does changing FC layers to CONV layers help performance?

Thanks for any answer in advance…

  1. Here’s how to read the slide with respect to dimensions (height, width, num channels):
    a. When 16 filters, each of dimension 5x5 (and spanning all 3 input channels), convolve the 14x14x3 input, the output is 10x10x16.
    b. Max pooling over 2x2 patches of the previous step's output yields 5x5x16.
    c. Following the same line of explanation, when the 1x1x400 input is processed by four 1x1 filters, the output is 1x1x4, i.e. each filter produces a 1x1 output and there are 4 of them.
  2. Please see this link to understand how convolutions help with parameter sharing and, more importantly, sparsity of connections.
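
The shape arithmetic in steps a to c can be checked with a small sketch. The helper names (`conv_shape`, `pool_shape`, `network_shape`) are made up for illustration. The two middle layers (a 5x5 convolution with 400 filters, then a 1x1 convolution with 400 filters) are not spelled out in the steps above and are assumed here from the lecture slide; no padding and stride 1 are also assumptions.

```python
def conv_shape(shape, kernel, n_filters, stride=1):
    """Output (height, width, channels) of a convolution with no padding."""
    h, w, _ = shape
    return ((h - kernel) // stride + 1, (w - kernel) // stride + 1, n_filters)


def pool_shape(shape, size):
    """Output shape of non-overlapping max pooling with a size x size window."""
    h, w, c = shape
    return (h // size, w // size, c)


def network_shape(input_shape):
    """Trace the tensor shape through the fully convolutional network."""
    x = conv_shape(input_shape, kernel=5, n_filters=16)  # step a
    x = pool_shape(x, size=2)                            # step b
    x = conv_shape(x, kernel=5, n_filters=400)  # assumed: first FC layer as a 5x5 conv
    x = conv_shape(x, kernel=1, n_filters=400)  # assumed: second FC layer as a 1x1 conv
    x = conv_shape(x, kernel=1, n_filters=4)    # step c
    return x


print(network_shape((14, 14, 3)))  # (1, 1, 4)
print(network_shape((16, 16, 3)))  # (2, 2, 4)
```

With a 16x16x3 input the same layers give a 2x2x4 output, i.e. one 4-number prediction per 14x14 window position, all computed in a single forward pass instead of running the network once per window.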

I am okay with the second answer, thanks again.

On the other hand, the first answer was not actually what I was asking. I understood how the shape came about in the first place. I was curious about the contents of that 1x1x4 tensor: are they all probabilities, or is the first one a binary value and the others some other probabilities, etc.?

Apply softmax on the last dimension of the 1x1x4 output to interpret it as probabilities. Each of the 4 units would then correspond to the probability of a certain class of object (e.g. bus, human, etc.) being detected within the input image of dimension 14x14x3.
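
As a rough illustration of that last step, here is a minimal softmax over 4 raw scores. The class names and the raw values are hypothetical, chosen only to show how the 4 numbers turn into probabilities that sum to 1.

```python
import math


def softmax(values):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical labels and raw network outputs for one 14x14x3 window.
class_names = ["bus", "human", "car", "background"]
raw_output = [2.0, 0.5, -1.0, 0.1]

probs = softmax(raw_output)
best = class_names[probs.index(max(probs))]

for name, p in zip(class_names, probs):
    print(f"{name}: {p:.3f}")
print("predicted class:", best)
```

The largest raw score always maps to the largest probability, so the predicted class for the window is simply the unit with the highest value.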
