In "What are deep ConvNets learning?" (Week 4), why is it that the visualizations of the later layers' filters are clearly recognizable? According to my understanding, the deeper the layer, the more complicated the features the model learns, and the more complex its feature maps become (unrecognizable to the naked eye). And when you apply a filter to such a feature map, it learns those complex things as well.
Welcome to the community. When you say that the later layers will have more complicated feature maps than the earlier layers, what exactly do you have in mind? If we compare a neuron learning to detect a horizontal edge with a neuron learning to detect a horse, then naturally we would say that the latter neuron has a richer, more complicated understanding than the former.
The neurons in the later part of the network see a larger region of the input image, and they receive inputs from many earlier neurons that have already learned the simpler things. That is why they are able to detect more complex features, for instance horses, tires, cats, etc., and this is what the references mentioned in the lecture video demonstrate.
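To make the "larger region of the input image" point concrete, here is a small sketch of the standard receptive-field recurrence for a stack of conv/pool layers. The specific layer configurations below are made-up examples, not the network from the lecture; the recurrence itself is the usual one for convolutional stacks.

```python
def receptive_field(layers):
    """Return the width (in input pixels) of the region one neuron
    in the last layer can see.

    layers: list of (kernel_size, stride) pairs, one per conv/pool layer.
    """
    rf, jump = 1, 1  # start from a single output "pixel"
    for k, s in layers:
        rf += (k - 1) * jump  # each layer widens the field by (k-1) * current step
        jump *= s             # stride multiplies the step between neuron centers
    return rf

# Three 3x3 convs with stride 1: the field grows 1 -> 3 -> 5 -> 7
print(receptive_field([(3, 1), (3, 1), (3, 1)]))  # 7

# Interleave stride-2 pooling and it grows much faster: 18 input pixels wide
print(receptive_field([(3, 1), (2, 2), (3, 1), (2, 2), (3, 1)]))  # 18
```

So even with tiny 3x3 filters, a neuron a few layers deep is effectively looking at a patch big enough to contain a tire or an ear, which is why it can learn to respond to whole object parts rather than just edges.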
If by "more complicated" you mean "abstract to humans", then that may be true in some cases and not in others. Let's say that you are training a deep ConvNet to detect cars in the input images. It is completely possible for you to find that some of the later neurons are detecting something that seems to be absolute rubbish to you, because the model might see something in it that you are not able to. At the same time, you may find that some neurons detect things that make complete sense to you, such as tires, number plates, etc., since the existence of these things provides a certain level of certainty that the object in the image is a car, and hence the neurons have learnt to detect them. I hope this helps.
Elemento has done a great job explaining the concepts in detail here. One thing to note (which Elemento did mention, but I wanted to emphasize) is that the point of what Prof Ng is showing in this lecture is not really about things that are “not visible to humans”, right? The interesting thing being shown here is justifying the overall intuitive explanation that deeper layers of the network are “integrating” the simpler detections done by earlier layers (edges or curves or colors) and putting them together into the ability to recognize a more complex object like a cat’s ear or a cat’s tail and (eventually by the final layers) a complete cat. The work he shows “instruments” neurons in some of the deep layers of the network so that we can see what type of input “activates” that neuron the most strongly, meaning that is the thing that it has most strongly learned to detect.
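The "instrumenting" idea can be sketched very simply: slide a neuron's filter over an image, record the activation at every position, and keep the patch that fires the neuron most strongly. This toy NumPy version uses a made-up image and a hand-written vertical-edge filter purely for illustration; the work referenced in the lecture does this search over a whole dataset of real images and for neurons deep in the network.

```python
import numpy as np

def max_activating_patch(image, filt):
    """Return the image patch (and its score) that most strongly
    activates a single neuron defined by the filter `filt`."""
    k = filt.shape[0]
    best_val, best_pos = -np.inf, None
    for i in range(image.shape[0] - k + 1):
        for j in range(image.shape[1] - k + 1):
            act = np.sum(image[i:i+k, j:j+k] * filt)  # one neuron's pre-activation
            if act > best_val:
                best_val, best_pos = act, (i, j)
    i, j = best_pos
    return image[i:i+k, j:j+k], best_val

# Toy image: dark left half, bright right half -> one strong vertical edge
img = np.zeros((8, 8))
img[:, 4:] = 1.0
vertical_edge = np.array([[-1., 0., 1.],
                          [-1., 0., 1.],
                          [-1., 0., 1.]])

patch, score = max_activating_patch(img, vertical_edge)
print(score)  # the winning patch straddles the dark-to-bright boundary
```

Looking at the winning patch tells you what this neuron "cares about" (here, a dark-to-bright vertical transition); doing the same for neurons in deeper layers is exactly how those cat-ear and tire visualizations are produced.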
I think the best idea would be to rewatch the lectures now that you’ve heard what Elemento has said. I think it will make a lot more sense the second time through with those ideas in mind.