Doubt in 'Visualizing deep layers'

I have 2 questions regarding this topic:

  1. In the lecture, we are shown some pictures from certain hidden units of the network.
    What does the professor mean when he says ‘hidden unit’ in a CNN?

  2. In the image above, where does the output of those nine 3x3 sub-images come from? Is it the output of the window that arises when we place a filter on top of the image and then apply ReLU?


The hidden units in an NN are the units in all of the layers that are not the input or output layer.

Don’t pay too much attention to the sub-images in that figure. He’s just giving an intuitive sense of how you might imagine an NN working.

In practice, we don’t know in advance exactly what the hidden layers are going to represent, as it depends on what weight values are learned to minimize the cost on the training set.

Okay, so let me guess where each sub-image in the figure above comes from. We pass the sub-image through the network and note the activations as it propagates through the layers; the layer that gives the largest activation can be thought of as the layer that is good at detecting the features present in that sub-image.
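As a rough illustration of that guess, here is a minimal numpy sketch (my own toy code, not from the lecture or the paper): it slides a single 3x3 filter over a batch of images, applies ReLU, and keeps the nine patches with the largest activation for that one hidden unit. The filter and data here are made up for demonstration.

```python
# Hypothetical sketch: find the image patches that most strongly
# activate one convolutional "hidden unit" (one filter + ReLU).
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def top_patches(images, kernel, k=9):
    """Scan every 3x3 window of each image, apply the filter + ReLU,
    and return the k patches with the largest activation."""
    scored = []
    for img in images:
        h, w = img.shape
        for i in range(h - 2):
            for j in range(w - 2):
                patch = img[i:i + 3, j:j + 3]
                act = relu(np.sum(patch * kernel))  # this unit's output
                scored.append((act, patch))
    scored.sort(key=lambda t: t[0], reverse=True)
    return [p for _, p in scored[:k]]

# Toy data: five random 8x8 grayscale images and a vertical-edge filter.
images = rng.standard_normal((5, 8, 8))
kernel = np.array([[1., 0., -1.],
                   [1., 0., -1.],
                   [1., 0., -1.]])
patches = top_patches(images, kernel, k=9)
print(len(patches))  # nine patches, like the 3x3 grid in the figure
```

The idea is that the nine highest-scoring patches, laid out in a 3x3 grid, would resemble one of the sub-image grids in the figure: they show what this particular unit responds to most strongly.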

Therefore, layer 2 above is good at detecting the features present in all those sub-images.

Please correct me if I’m wrong.

We don’t actually look at the outputs from the hidden layers at all.
What matters is minimizing the cost. We don’t really care what the hidden layers are doing.

If we are not looking at the outputs of the hidden layers (which I know we don’t do in practice, but I’m unsure about this specific example), how can we know that a layer ‘x’ is good at detecting features ‘y’?
Can you please link me to an implementation/understanding of the paper, if you have any?

We don’t know, and we really don’t care. We don’t have any influence over what any unit in any layer is going to detect.