In the 3D Convolution video, Andrew mentions that the number of channels must be the same for the input and for the kernel being used in the convolution.
My question is: how could we perform convolution on an image that is greyscale, since we only have a single "colour channel" for the pixel intensity?
Would we match the depth dimension with the third dimension of the kernel?
It’s up to you to choose a kernel whose channel dimension matches the number of channels in the input.
There’s no restriction on the type of image you can feed to a conv layer. For instance, you can use a conv layer on a dataset with a single colour channel as well. You’ll understand this better when doing the assignment.
Yes, if your inputs are greyscale images, then the input dimensions would be h x w x 1. It is perfectly fine to use convolutions with a number of input channels other than 3. Of course that means the filters in the first conv layer will be f x f x 1 to match the inputs. But there is no reason why the number of output channels must stay 1: how many output channels each conv layer has is a design choice (hyperparameter) that you, as the system designer, need to make.
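To make the shapes concrete, here is a minimal NumPy sketch (not the course's implementation; the image size, filter size, and 8 output channels are arbitrary choices for illustration). The filter depth matches the single input channel, while the number of output channels is free:

```python
import numpy as np

def conv2d(image, filters):
    """'Valid' cross-correlation, as used in conv layers.

    image:   (h, w, c_in)
    filters: (f, f, c_in, c_out)
    returns: (h - f + 1, w - f + 1, c_out)
    """
    h, w, c_in = image.shape
    f, _, c_in_f, c_out = filters.shape
    assert c_in == c_in_f, "filter depth must match input channels"
    out_h, out_w = h - f + 1, w - f + 1
    out = np.zeros((out_h, out_w, c_out))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + f, j:j + f, :]  # (f, f, c_in)
            for k in range(c_out):
                out[i, j, k] = np.sum(patch * filters[:, :, :, k])
    return out

grey = np.random.rand(28, 28, 1)      # greyscale: one input channel
filters = np.random.rand(3, 3, 1, 8)  # 8 output channels: a design choice
print(conv2d(grey, filters).shape)    # (26, 26, 8)
```

Note that even with a 1-channel input, the output has 8 channels, one per filter, which then become the input channels of the next layer.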
One other high level point worth mentioning is that a given network can’t handle multiple types of images (some greyscale and some RGB, for example), unless you supply a “preprocessing layer” that converts all the inputs into a common format, e.g. by converting the greyscale images into RGB images. Shades of grey are colors, so they can also be represented in RGB or CMYK or other multichannel image formats if necessary.
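For example, one simple such preprocessing step is to replicate the single grey channel three times to get an RGB-shaped input (a sketch, assuming NumPy arrays in h x w x c layout):

```python
import numpy as np

def grey_to_rgb(image):
    """Replicate the single grey channel across R, G, and B."""
    assert image.shape[-1] == 1, "expected a greyscale image (h, w, 1)"
    return np.repeat(image, 3, axis=-1)

grey = np.random.rand(28, 28, 1)
rgb = grey_to_rgb(grey)
print(rgb.shape)  # (28, 28, 3)
```

After this conversion, greyscale and RGB images can be mixed in the same dataset and fed to the same network.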