What is a hidden unit in CNN?

The term is used in the “What are deep ConvNets learning?” lecture, where it is mentioned that a hidden unit in the first layer can see only a small image patch. Can anyone explain why this is the case? The filter size is the largest in the first layer, so it seems to cover a lot of the image area while convolving.

I think I understand why this is the case. In the first layer, even though the filter is larger, each output of that convolution depends only on the pixels the filter covers, which is a small patch of the original image. As we go deeper, each layer takes its input from the previous layer instead of from the image. Take the second layer: each of its units depends on a filter-sized patch of the first layer's output. But each of those first-layer units is in turn computed from its own patch of the original image. So a second-layer hidden unit is influenced by more pixels of the original image than a first-layer unit is, and hence it “sees” a larger area.
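A minimal Python sketch of this reasoning, using the standard receptive-field recurrence `rf_new = rf_prev + (kernel - 1) * jump_prev` and `jump_new = jump_prev * stride`, where `jump` is the spacing, in input pixels, between neighboring units of a layer. The layer sizes and the function name `receptive_fields` are assumptions chosen for illustration, not the architecture from the lecture. Pooling is included because it widens the receptive field the same way a convolution does.

```python
# Sketch: receptive field of one hidden unit, layer by layer.
# The layer list is hypothetical and only meant to illustrate the idea.

def receptive_fields(layers):
    """Return the receptive-field side length (in input pixels) of one unit
    after each layer. `layers` is a list of (kernel_size, stride) pairs."""
    rf = 1    # before any layer, a "unit" is a single input pixel
    jump = 1  # spacing, in input pixels, between neighboring units of the current layer
    sizes = []
    for kernel, stride in layers:
        # A new unit spans `kernel` units of the previous layer. Those units are
        # `jump` pixels apart and each already sees `rf` pixels of the image.
        rf += (kernel - 1) * jump
        jump *= stride
        sizes.append(rf)
    return sizes


# Hypothetical stack: 7x7 conv (stride 2), 3x3 max pool (stride 2), 5x5 conv (stride 2)
layers = [(7, 2), (3, 2), (5, 2)]
for (k, s), rf in zip(layers, receptive_fields(layers)):
    print(f"{k}x{k} window, stride {s}: one unit sees a {rf}x{rf} image patch")
```

With these assumed sizes the result is 7, 11 and 27: the 7x7 filter is the largest, yet a first-layer unit sees only a 7x7 patch, while a unit after the smaller 5x5 filter sees 27x27, because each of its inputs already summarizes a patch of the image.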

Please correct me if this is wrong.

Hi, @Dhruv_Deshmukh!

Exactly, your understanding is correct.

Say the CNN does binary classification. Its final output is a single number: 1 or 0. In effect, that output ‘sees’ the entire image, because all of the information has been distilled into that one value. The first layers detect only colors or edges, the next ones basic shapes, but stack enough layers and they become a ‘cat’ detector.
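A toy check of the same idea, as a plain-Python sketch under assumed settings: one channel, 3x3 all-ones kernels, stride 1, no padding, no nonlinearity. The helper names `conv3x3_valid` and `pixels_seen` are hypothetical. It turns on one input pixel at a time and counts how many of them change the center unit of the last layer. On a 9x9 image, four such layers shrink the output to a single unit that depends on all 81 pixels, much like the final output of a classifier depends on the whole image.

```python
# Toy experiment: which input pixels can influence one output unit?
# Assumptions (for illustration only): single channel, 3x3 kernels of all ones,
# stride 1, no padding, no nonlinearity. With only positive weights, any pixel
# inside the receptive field produces a nonzero value at the output unit.

def conv3x3_valid(img):
    """'Valid' 3x3 convolution with an all-ones kernel; each side shrinks by 2."""
    h, w = len(img), len(img[0])
    return [
        [sum(img[i + di][j + dj] for di in range(3) for dj in range(3))
         for j in range(w - 2)]
        for i in range(h - 2)
    ]


def pixels_seen(size, depth):
    """Count input pixels of a size x size image that affect the center unit
    after `depth` stacked conv layers (requires size - 2 * depth >= 1)."""
    seen = 0
    for r in range(size):
        for c in range(size):
            img = [[0] * size for _ in range(size)]
            img[r][c] = 1  # turn on a single input pixel
            out = img
            for _ in range(depth):
                out = conv3x3_valid(out)
            if out[len(out) // 2][len(out[0]) // 2] != 0:
                seen += 1
    return seen


for depth in range(1, 5):
    print(f"after {depth} layer(s): center unit depends on "
          f"{pixels_seen(9, depth)} of 81 pixels")
```

The loop prints 9, 25, 49 and 81: each extra layer grows the region by one pixel on every side until a single unit covers the entire input. Real CNNs get there faster with strides, pooling and fully connected layers.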