Understanding "what conv nets are really learning"

Hi. I’d like some clarification regarding what is shown in the “what conv nets are really learning” video. Specifically:

  1. Do the 9 “maximum” activations relate to the whole training dataset? In other words, does each of the 9 presented patches belong to a different input image?
  2. When showing the outputs of a neuron in deeper layers, how come they look like portions of the input image? After all, each “deep” neuron receives data from all of the original image pixels, so I would not expect a “sensible” visualization of such a neuron’s output.

Hello @gilad.danini,

Check out the video from 0:29, or the following part of the transcript:

Here’s what you can do. Let’s start with a hidden unit in layer 1. And suppose you scan through your training set and find out what are the images or what are the image patches that maximize that unit’s activation. So in other words, pass your training set through your neural network, and figure out what is the image that maximizes that particular unit’s activation.

In short, those are image patches taken from the training set, not some learnt representation of the images.
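If it helps, here is a minimal sketch of the idea in that transcript, not the course’s own code: pass each training image through a network, record one chosen unit’s activation, and keep the images that activate it most strongly. The model (VGG16), the tapped layer index, the channel, and the `FakeData` stand-in dataset are all assumptions for illustration; swap in your own network and training set.

```python
# Minimal sketch (assumptions: PyTorch + torchvision; VGG16 as the network;
# the "unit" is one channel of an early conv layer, scored by its strongest
# spatial response; FakeData stands in for a real training set).
import torch
import torchvision.models as models
import torchvision.transforms as T
from torchvision.datasets import FakeData

# Pretrained weights require torchvision >= 0.13; use weights=None to skip the download.
model = models.vgg16(weights="IMAGENET1K_V1").features.eval()
layer_idx = 2   # which conv layer to tap (assumption)
channel = 5     # which unit (feature map / channel) to inspect (assumption)

# Capture the tapped layer's output with a forward hook.
activations = {}
def hook(module, inp, out):
    activations["value"] = out.detach()
model[layer_idx].register_forward_hook(hook)

dataset = FakeData(size=64, image_size=(3, 224, 224), transform=T.ToTensor())

scores = []
with torch.no_grad():
    for i in range(len(dataset)):
        img, _ = dataset[i]
        model(img.unsqueeze(0))
        fmap = activations["value"][0, channel]   # H x W response map of this unit
        scores.append((fmap.max().item(), i))     # strongest response in this image

# Indices of the 9 training images that most strongly activate the chosen unit.
top9 = sorted(scores, reverse=True)[:9]
print([idx for _, idx in top9])
```

To turn those top images into the small tiles shown in the video, you would additionally map the spatial position of the maximum back to the unit’s receptive field in the input and crop that region. For a layer-1 unit that receptive field is only a few pixels wide, while for deeper layers it covers a larger portion of the image, which is why deeper-layer visualizations look like bigger chunks of the original photos.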

Cheers,
Raymond