Why Align Channels in Convolution?

I have a question about 3D convolution. It was recommended here that the number of channels in the filter be set to match the number of channels in the input. I would like to know a little more about why.

When convolving over each pixel horizontally and vertically, we were able to detect vertical and horizontal edges in the small filtered region of the image. Similarly, in 3D, wouldn't it be possible to find some meaning by detecting features in only one of the R, G, B channels? This is a somewhat abstract question, but I hope my intention comes through.

By definition, convolutional filters are applied in 3 dimensions: height, width, and depth. The examples Prof Ng gives in Week 1 concentrate on the h and w dimensions, but if you watch carefully he does talk about how the channels work.
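To make that concrete, here is a minimal NumPy sketch (not the course's code) of one filter applied at each position of an input. The key point is that the filter spans height, width, AND all input channels, and the products are summed over all three dimensions, so one filter produces one number per position:

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.standard_normal((6, 6, 3))   # h=6, w=6, c=3 (e.g. RGB)
filt = rng.standard_normal((3, 3, 3))    # f=3, f=3, and c must match 3

def conv_single_step(patch, filt):
    # Element-wise multiply, then sum over height, width, AND channels,
    # so each filter yields a single scalar per position.
    return np.sum(patch * filt)

# Slide the filter over valid positions (stride 1, no padding):
out_size = 6 - 3 + 1                     # = 4
out = np.zeros((out_size, out_size))
for i in range(out_size):
    for j in range(out_size):
        out[i, j] = conv_single_step(image[i:i+3, j:j+3, :], filt)

print(out.shape)  # (4, 4) — one filter gives one output channel
```

Note this also answers the "only one of RGB" question: nothing stops a learned filter from having weights near zero in two of its three channel slices, which effectively makes it respond to just one color channel. The architecture doesn't need a special case for that; training can discover it.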

Note that at each convolution layer, the number of output channels is a "hyperparameter" you choose: it equals the number of filters in that layer. But each individual filter must match the h, w, and c dimensions of the input.
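Here is a hedged sketch of that relationship, extending the single-filter loop above: stacking `n_filters` filters (the hyperparameter) gives an output volume whose channel count equals the number of filters, while each filter's depth is forced to match the input's channel count:

```python
import numpy as np

rng = np.random.default_rng(1)
h, w, c = 6, 6, 3                # input: height, width, channels
f, n_filters = 3, 8              # n_filters is the layer hyperparameter
image = rng.standard_normal((h, w, c))
# Each of the 8 filters is (f, f, c): its depth MUST match c = 3.
filters = rng.standard_normal((n_filters, f, f, c))

out_size = h - f + 1             # stride 1, no padding
out = np.zeros((out_size, out_size, n_filters))
for k in range(n_filters):
    for i in range(out_size):
        for j in range(out_size):
            out[i, j, k] = np.sum(image[i:i+f, j:j+f, :] * filters[k])

print(out.shape)  # (4, 4, 8): output channels = number of filters
```

So the input's channel count is fixed by the previous layer, but you are free to pick how many filters (and hence output channels) the current layer produces.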