I have a question about 3D convolution. It was recommended here that the filter should have the same number of channels as the input, and I would like to understand in more detail why that is.
When convolving each pixel vertically and horizontally in 2D, we were able to detect vertical and horizontal edges in the small patch of the image covered by the filter. Similarly, in the 3D case, wouldn't it be possible to find something meaningful by detecting features in only one of the R, G, or B channels? This has turned out to be a somewhat abstract question, but I hope my intention comes across.
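To make my question concrete, here is a small sketch of what I mean (the shapes, the Sobel filter, and the channels-first layout are just my own illustration, not anything from the recommendation I mentioned). A filter that spans all 3 input channels can still put all of its weights on a single channel, which seems equivalent to "detecting only R":

```python
import numpy as np

# Fake RGB image, channels-first layout: (channels, height, width).
rng = np.random.default_rng(0)
image = rng.random((3, 8, 8))

# A 3x3 vertical-edge (Sobel-like) kernel.
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)

# Filter shape (3, 3, 3) matches the input's 3 channels, but only the
# R channel (index 0) carries nonzero weights.
filt = np.zeros((3, 3, 3))
filt[0] = sobel_x

def conv_valid(img, f):
    """Naive 'valid' convolution: slide f over img and sum elementwise
    products across all channels, producing a single 2D feature map."""
    c, kh, kw = f.shape
    h, w = img.shape[1] - kh + 1, img.shape[2] - kw + 1
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(img[:, i:i+kh, j:j+kw] * f)
    return out

out_full = conv_valid(image, filt)
# Convolving only the R channel with only the R slice of the filter
# gives the same result, since the G and B weights are all zero.
out_r = conv_valid(image[:1], filt[:1])
print(np.allclose(out_full, out_r))  # True
```

So a full-channel filter can already express a single-channel detector as a special case; my question is whether restricting the filter to one channel from the start would also be meaningful.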