Understanding ZFNet

In the ZFNet video, in the part about the AlexNet layer-1 problem:

  • Why are we saying that AlexNet was capturing only the darker and lighter shades of the frequency, and no mid-frequency? How can you tell that from the visualization? At least to me, it is not clear from the visualization.

Attached image

For images, frequency denotes the rate at which intensity changes from pixel to pixel.
If the intensity changes very rapidly in a particular portion of the image (within a few pixels), then it’s a region of high frequency. On the other hand, if the intensity changes very slowly (remains fairly the same over a large number of pixels), then it’s a region of low frequency.
In the above image, only these two types of regions dominate. Hence the filters are capturing mostly high-frequency and low-frequency components/features.
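
To make the idea of "frequency" in a patch more concrete, here is a minimal sketch, assuming NumPy is available. It uses two synthetic 32x32 patches (not the actual AlexNet filters): one whose intensity changes slowly and one whose intensity changes rapidly. The helper `dominant_frequency` and the patch size are hypothetical names and values chosen only for illustration.

```python
import numpy as np

def dominant_frequency(patch):
    """Return the strongest spatial frequency of a 2D patch, in cycles per pixel."""
    patch = patch - patch.mean()               # remove the constant-brightness (DC) term
    spectrum = np.abs(np.fft.fft2(patch))      # magnitude of the 2D Fourier transform
    fy = np.fft.fftfreq(patch.shape[0])        # vertical frequencies, cycles per pixel
    fx = np.fft.fftfreq(patch.shape[1])        # horizontal frequencies, cycles per pixel
    iy, ix = np.unravel_index(np.argmax(spectrum), spectrum.shape)
    return float(np.hypot(fy[iy], fx[ix]))     # radial frequency of the strongest component

size = 32                                      # illustrative patch size (an assumption)
x = np.arange(size)

# Intensity varies only along the horizontal axis; every row is identical.
low_patch = np.tile(np.cos(2 * np.pi * 1 * x / size), (size, 1))    # 1 cycle across the patch
high_patch = np.tile(np.cos(2 * np.pi * 12 * x / size), (size, 1))  # 12 cycles across the patch

low_f = dominant_frequency(low_patch)
high_f = dominant_frequency(high_patch)
print(low_f, high_f)
```

The slowly varying patch gives a small value (1 cycle over 32 pixels) and the rapidly varying patch gives a much larger one (12 cycles over 32 pixels). A filter that looks like a smooth blob or gradient would fall near the first case, a filter with fine stripes near the second, and a "mid-frequency" filter would sit in between.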