I have a question for the CNN course W1 lecture “One Layer of a Convolutional Network”. Link to the classroom item is: https://www.coursera.org/learn/convolutional-neural-networks/lecture/nsiuW/one-layer-of-a-convolutional-network.
My question is: why is the dimension of the weights (W) the same as that of the filter? I thought the dimension of W would be the same as that of the output, i.e. ((n+2p-f)/s + 1) x ((n+2p-f)/s + 1) x nC.
Can anyone give me an explanation or a hint?
Keep in mind that the weights are the same size as the filter because each weight corresponds to one entry of that filter (here 3 x 3 in the first example).
What changes is the size of the output, because that depends on the padding and stride (i.e. whether we pad the edges, and how many steps we move the filter between samples).
Depending on the size of the input and the choices of padding and stride, you can end up with various output sizes from the same 3 x 3 filter, but the number of weights you are learning stays the same.
The same holds whether your filter is 3 x 3, 5 x 5, etc.: padding and stride have no effect on the filter itself.
*Also keep in mind, in this video we are only looking at a single layer of the convolutional network.
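To make that concrete, here is a minimal sketch (not the course code) showing that the number of parameters depends only on the filter shape and the number of filters, never on padding or stride. The numbers (3 x 3 filters, an RGB input, 10 filters) are just an example:

```python
import numpy as np

f, n_c_in, n_c_out = 3, 3, 10            # 3 x 3 filters, 3 input channels, 10 filters
W = np.random.randn(f, f, n_c_in, n_c_out)  # one f x f x nC_in filter per output channel
b = np.random.randn(n_c_out)                # one bias per filter

# Total parameters: (3*3*3)*10 weights + 10 biases = 280,
# regardless of what padding or stride you later convolve with.
params = W.size + b.size
print(params)
```

Notice that the input height and width never appear in the parameter count, which is exactly why a conv layer has the same number of weights no matter how large the image is.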
Exactly. The weights are the filters. It’s just that there’s one set for every output channel and each one matches the number of channels in the input. So the dimensions of W are:
f x f x nC_{in} x nC_{out}
Once you know the dimensions of the input, then you can compute the dimensions of the output if you also know the stride and padding values. Here's the formula:

n_out = floor((n + 2p - f) / s) + 1

applied to both the height and the width, and the output has nC_{out} channels.
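As a quick sanity check, that formula is a one-liner (integer division gives the floor for non-negative values; the function name is just my own):

```python
def conv_output_size(n, f, p, s):
    # floor((n + 2p - f) / s) + 1
    return (n + 2 * p - f) // s + 1

# 6 x 6 input, 3 x 3 filter, no padding, stride 1 -> 4 x 4 output
print(conv_output_size(6, 3, 0, 1))   # 4
# same input and filter, padding 1, stride 2 -> 3 x 3 output
print(conv_output_size(6, 3, 1, 2))   # 3
```

With "same" padding (p = (f-1)/2 for odd f) and stride 1, this returns n again, which matches what the lecture says about preserving the input size.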