I need a small clarification. Please help.
As we understand from the video “Convolutions over Volume”, if the input image is in color, it has 3 channels (R, G, B), and the filter also has 3 channels (R, G, B).
However, in the video “Simple Convolutional Network Example”, the input image is assumed to be 39x39x3 (where 3 means the 3 channels R, G, B), yet the filter is described only as 3x3.
I could not tell whether the filter's 3 channels (R, G, B) are implied or actually omitted.
The following excerpt from the video's narration does not mention the filter's RGB channels:
"Let’s say this image is 39 x 39 x 3. This choice just makes some of the numbers work out a bit better. And so, nH in layer 0 will be equal to nW; height and width are equal to 39, and the number of channels in layer 0 is equal to 3.
Let’s say the first layer uses a set of 3 by 3 filters to detect features, so f = 3 or really f1 = 3,
because we’re using a 3 by 3 process. And let’s say we’re using a stride of 1, and no padding.
So using a same convolution, and let’s say you have 10 filters. Then the activations in this next layer of the neural network will be 37 x 37 x 10, and this 10 comes from the fact that you use 10 filters."
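
For reference, the shape arithmetic in the excerpt can be sketched as below. This is only a sketch, and it assumes the reading from “Convolutions over Volume”: each "3 by 3" filter implicitly spans all 3 input channels, so its full shape would be 3x3x3. The function name `conv_output_shape` and its parameters are hypothetical, chosen for illustration and not taken from the course.

```python
def conv_output_shape(n_h, n_w, n_c, f, num_filters, stride=1, pad=0):
    """Return (filter_shape, output_shape) for one convolutional layer.

    Assumption: every filter covers all n_c input channels, so a filter
    described as "f by f" actually has shape f x f x n_c.
    """
    # Standard output-size formula: floor((n + 2p - f) / s) + 1
    out_h = (n_h + 2 * pad - f) // stride + 1
    out_w = (n_w + 2 * pad - f) // stride + 1
    filter_shape = (f, f, n_c)
    # One output channel per filter, regardless of the input depth
    output_shape = (out_h, out_w, num_filters)
    return filter_shape, output_shape


# Numbers from the excerpt: 39 x 39 x 3 input, ten 3 x 3 filters,
# stride 1, no padding.
filter_shape, output_shape = conv_output_shape(39, 39, 3, f=3, num_filters=10)
print(filter_shape)   # (3, 3, 3)
print(output_shape)   # (37, 37, 10)
```

Under this assumption the output is 37 x 37 x 10, which matches the numbers in the excerpt. The output depth comes from the number of filters, not from the filter's own channel count, which may be why the narration mentions only "3 by 3".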