At the start they say a 6x6x3 image, where 3 is the color channel, but now they say 37x37x40, where 40 is the number of filters. Can anybody elaborate on this?


Can you provide more information:

  • Which week of the course?
  • What specific topic are you asking about?

This is from the Week 1 lectures.

Prof Ng does discuss all of this in the lectures. The inputs to a ConvNet are typically images, meaning they are h x w x c arrays, where h and w are the pixel dimensions and c is the number of color channels, which is typically either 1 (for grayscale images) or 3 (for RGB color images). So the shape of the filters in the first Conv layer is f x f x 3 if the inputs are RGB images.

The number of output channels, on the other hand, is determined by the number of filters you have, and that number is a “hyperparameter”, meaning a choice you have to make as the system designer. If you choose 40 as the number of filters, then you get 40 output channels.

The typical pattern as you go through the multiple layers of a ConvNet is that the h and w dimensions shrink while the channel dimension grows, which is what Prof Ng was showing in that example. You can think of it as distilling the geometric information into the detection of features. The level of integration or sophistication of the “features” grows as you move through the layers of the network. E.g. the early layers detect things like edges, curves, or other basic geometric features. The later layers then put those together to recognize things like a cat’s ear or eye or tail, and eventually a complete cat or dog or kangaroo or whatever the labels are on your data.
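Here is a minimal sketch of that point in Keras. The 40 filters come from the thread; the 39x39x3 input size, 3x3 filter size, stride 1, and “valid” padding are assumptions chosen purely so the spatial output works out to 37x37:

```python
import tensorflow as tf

# One RGB image, shape (batch, height, width, channels) = (1, 39, 39, 3)
x = tf.random.normal((1, 39, 39, 3))

# 40 filters of size 3x3; each filter automatically spans all 3 input
# channels, so each individual filter is really a 3x3x3 volume.
conv = tf.keras.layers.Conv2D(filters=40, kernel_size=3,
                              strides=1, padding="valid")

y = conv(x)
print(conv.kernel.shape)  # (3, 3, 3, 40): f x f x c_in x num_filters
print(y.shape)            # (1, 37, 37, 40): one output channel per filter
```

With “valid” padding and stride 1 the spatial size is n - f + 1 = 39 - 3 + 1 = 37, and the last dimension of the output is simply the number of filters you chose. That is why the 40 is a design choice (a hyperparameter) rather than a property of the input image, whereas the 3 in 6x6x3 is fixed by the image itself.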