Confusion in understanding the filter size and the number of filters

Week 1, https://www.coursera.org/learn/convolutional-neural-networks/lecture/A9lXL/simple-convolutional-network-example timestamp 3:12

The first layer, which takes in the input image (39 * 39 * 3), has 10 filters of size 3. Does that mean 10 filters of 3 * 3 * 3? That would make sense, since the input has three channels.

But in the following layer, we have 20 filters of size 5.

Now, here is my question: for the second layer, do we have 20 filters of 5 * 5 * 10, or 20 filters of 5 * 5 applied to each of the 10 channels from the previous layer?

I hope my words can describe my confusion here. Any help is appreciated.

The depth of each filter at a given layer must match the number of channels of that layer's input. So in the second case, where the input has 10 channels and you want a filter size of f = 5, each filter will have shape:

5 x 5 x 10
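To make that concrete, here is a minimal pure-Python sketch of what one 5 x 5 x 10 filter does at a single position: it multiplies elementwise against a 5 x 5 x 10 patch of the input and sums over height, width, and all 10 channels, producing one number. The helper names `make_volume` and `apply_filter_at` are hypothetical, and the all-ones values are only there to make the count easy to check.

```python
def make_volume(f, n_c, value):
    """Build an f x f x n_c nested list filled with `value` (illustrative helper)."""
    return [[[value for _ in range(n_c)] for _ in range(f)] for _ in range(f)]


def apply_filter_at(patch, filt):
    """Elementwise multiply, then sum over height, width, AND channels."""
    total = 0.0
    for row_p, row_f in zip(patch, filt):
        for cell_p, cell_f in zip(row_p, row_f):
            for x, w in zip(cell_p, cell_f):
                total += x * w
    return total


# One 5 x 5 x 10 input patch and one 5 x 5 x 10 filter, both filled with ones.
patch = make_volume(5, 10, 1.0)
filt = make_volume(5, 10, 1.0)

# A single filter yields a single scalar per position, not one value per channel.
response = apply_filter_at(patch, filt)
print(response)  # 250.0, i.e. 5 * 5 * 10 products summed
```

So each filter contributes exactly one output channel, no matter how many channels the input has.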

If you then have 20 such filters in that layer, the weight tensor W will be a 4D tensor with shape:

5 x 5 x 10 x 20

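The general form is f x f x nC_{in} x nC_{out}, where nC_{in} is the number of channels in the layer's input and nC_{out} is the number of filters in the layer (which is also the number of channels in its output).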
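As a quick check of that formula against the two layers in the question, here is a small sketch in plain Python. The function names are hypothetical, and the parameter count assumes one bias per filter, which is the usual convention but is not stated above.

```python
def conv_weight_shape(f, n_c_in, n_c_out):
    """Shape of W for a conv layer: f x f x nC_in x nC_out."""
    return (f, f, n_c_in, n_c_out)


def conv_param_count(f, n_c_in, n_c_out, use_bias=True):
    """Number of trainable values; assumes one bias per filter when use_bias is True."""
    weights = f * f * n_c_in * n_c_out
    biases = n_c_out if use_bias else 0
    return weights + biases


# Layer 1: 39 x 39 x 3 input, 10 filters of size 3.
print(conv_weight_shape(3, 3, 10))   # (3, 3, 3, 10)
print(conv_param_count(3, 3, 10))    # 280  = 3*3*3*10 + 10

# Layer 2: input has 10 channels, 20 filters of size 5.
print(conv_weight_shape(5, 10, 20))  # (5, 5, 10, 20)
print(conv_param_count(5, 10, 20))   # 5020 = 5*5*10*20 + 20
```

Note that neither the weight shape nor the parameter count depends on the height and width of the input, only on f and the two channel counts.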
