Week 4: Neural Style Transfer

Can someone please explain what exactly a hidden unit in a conv layer is?


As shown in the screenshot, in the 1st layer, are there 96 hidden units or (110 x 110 x 96) hidden units?
Also, Andrew Ng says "to find the nine image patches" here… what's that supposed to mean?

The hidden layers are all the internal layers of the network other than the output layer. In a ConvNet, it's the same as in a Feed Forward net: a "hidden unit" is just one of the output neurons in a given hidden layer. It's just that in a ConvNet the way those are represented is a bit more complicated, in that they form a 3D tensor instead of just a vector.

The point of the cool research that Prof Ng is explaining in this lecture is that the authors had the idea of "instrumenting" various hidden units (output neurons) in internal layers of a ConvNet and then tracking which inputs triggered the maximum output for a given neuron after the network had been trained. So they add logic to look at the outputs of some subset of the neurons and track the output values as they feed batches of inputs through the network in "inference" or "prediction" mode (not training). That shows you what a given neuron has learned to detect through the training process: the patterns it "recognizes" most strongly, meaning the ones that trigger the highest output values.

That's very cool because it gives you a way to visualize what the network has actually learned and gives you a bit more insight into how and why it works. Of course you can move the points of observation around and further improve your insight. I have not actually read the paper that he is talking about here, so it might be worth a look if you want a more concrete picture of how they arrived at the final results, e.g. how they picked the particular neurons to instrument.
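To make that a bit more concrete, here's a rough sketch of what that kind of "instrumenting" might look like in Keras. This is not the code from the paper, just an illustration: the pretrained model, the layer name and the unit coordinates are arbitrary choices of mine, and in practice you'd feed real, properly preprocessed images rather than random ones.

```python
import numpy as np
import tensorflow as tf

# A pretrained ConvNet to probe (any trained model would do).
base = tf.keras.applications.VGG16(weights="imagenet", include_top=False)

# "Instrument" one internal layer by building a model that outputs its activations.
probe = tf.keras.Model(inputs=base.input,
                       outputs=base.get_layer("block3_conv3").output)

# Stand-in batch of images, just so the sketch runs; use real, preprocessed data instead.
images = np.random.rand(64, 224, 224, 3).astype("float32")

acts = probe.predict(images, verbose=0)   # shape (64, 56, 56, 256) for 224x224 inputs

# Pick one hidden unit: spatial position (i, j) in channel c of that layer.
i, j, c = 10, 10, 5
unit_outputs = acts[:, i, j, c]           # this unit's output for every image in the batch

# The nine inputs that trigger the largest outputs for this unit.
top9 = np.argsort(unit_outputs)[-9:][::-1]
print("Indices of the nine maximally activating images:", top9)
```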


Thanks for the reply, Paulo… so if I am understanding this correctly, there are actually 110 x 110 x 96 hidden units in the first conv layer, right?

Yes, that's correct. But presumably they are sampling from the deeper hidden layers for the results that they show. Prof Ng discusses this in the lecture, I'm sure, although it's been a while since I watched it.
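Just to put a number on it (treating every individual activation in that 110 x 110 x 96 output volume as one hidden unit):

```python
# Each activation in the layer-1 output volume counts as one hidden unit.
hidden_units_layer_1 = 110 * 110 * 96
print(hidden_units_layer_1)   # 1161600
```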


Thanks for clarifying all this, Paulin… you're awesome!!

I wanted to follow up on this topic as I had a similar issue understanding the term "hidden unit" in this context, and actually ended up asking ChatGPT for clarification 🙂

I've tried to capture my understanding below and would appreciate any confirmation or corrections; I realise some of the following may be incorrect.

In my understanding, each layer of a CNN has a set of filters, each of which is of size f x f x c and has learned weights.

The layer 1 volume shape (110 x 110 x 96) corresponds to the activations of layer 1, after the convolution has been applied, and each of the 96 channels corresponds to the activation of one trained filter. I don’t think we know the size of the filters that were trained, but for the purposes of this example let’s assume 8x8. The change in size from 224 to 110 is a result of a stride of 2, I think.
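For what it's worth, the standard output-size formula lets us sanity check those numbers. The filter size, stride and padding below are only assumptions that happen to reproduce 224 → 110; the lecture slide doesn't state them explicitly, so treat them as illustrative:

```python
def conv_output_size(n, f, s, p):
    """Spatial output size for input n, filter f, stride s, padding p."""
    return (n + 2 * p - f) // s + 1

# One illustrative combination consistent with 224 -> 110: a 7x7 filter, stride 2, padding 1.
print(conv_output_size(n=224, f=7, s=2, p=1))   # 110
```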

So layer one has 96 filters of 8x8x3, each with its own trained weights.

Applying these filters to an input image will result in the 96 channels of the output volume, one channel per filter, where each channel contains the features that have been detected in the input image for a single filter.

If the above is correct, then I think that "hidden unit" refers to the weights of one filter, which is essentially a single neuron that is activated by particular arrangements of pixels across the three colour channels.
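One way to see why a single filter behaves like a single neuron: its output at any one location is just a dot product of its weights with an f x f x 3 window of the input (plus a bias), exactly like a fully connected neuron restricted to that patch. The sizes below just reuse the 8x8x3 example from above, with random numbers standing in for real data:

```python
import numpy as np

f = 8
patch = np.random.rand(f, f, 3)      # one 8x8x3 window of the input image
weights = np.random.rand(f, f, 3)    # the trained weights of one filter
bias = 0.1

# A single number: this "neuron's" response to this particular arrangement of pixels.
activation = np.sum(weights * patch) + bias
print(activation)
```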

What's unclear to me is how that intuition translates to the later layers in terms of the visualisation process shown. I suspect that there is some clever "unwinding" going on to recover the image patches that are shown for the later layers, because the filters for those layers will be detecting complex objects based on features detected in earlier layers rather than arrangements of pixels.
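As a rough sketch of the simplest kind of "unwinding": you can at least recover where and how big the relevant input patch is by walking the receptive-field arithmetic back through the layers. (The paper itself uses a deconvnet to project activations all the way back to pixel space, which is more involved; the filter/stride values below are illustrative, not the lecture's architecture.)

```python
def receptive_field(layers):
    """layers: list of (filter_size, stride) pairs, shallowest first.
    Returns (patch_size_in_input_pixels, effective_stride_in_input_pixels)."""
    rf, jump = 1, 1
    for f, s in layers:
        rf += (f - 1) * jump   # each filter widens the patch by (f - 1) input-strides
        jump *= s              # effective stride in the input grows multiplicatively
    return rf, jump

# Example: a unit three conv layers deep "sees" a 23x23 patch of the input image.
print(receptive_field([(7, 2), (5, 2), (3, 1)]))   # (23, 4)
```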

I hope that a) makes sense and b) helps. Also, as mentioned above, I'd appreciate corrections/clarifications from the course tutors 🙂

Thanks for such an awesome course BTW!

Hi @James_Siddle,

I personally like to think of one filter (f x f x input_number_of_channels) as one neuron, because if I have 23 neurons in a Dense layer, then the layer will produce 23 features; similarly, if I have 23 filters in a Conv layer, then the layer will produce 23 feature maps.

So, Dense → Conv; neurons → filters; features → feature maps.

I know it is not the norm for people to actually call a filter a neuron, but I like to think of it that way.
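A quick way to see that mapping in actual shapes (untrained layers, only the output shapes matter; the 23 is just the number from my example above):

```python
import tensorflow as tf

x_vec = tf.zeros((1, 100))                          # a batch of one 100-d vector
x_img = tf.zeros((1, 64, 64, 3))                    # a batch of one 64x64 RGB image

dense = tf.keras.layers.Dense(23)                   # 23 neurons -> 23 features
conv = tf.keras.layers.Conv2D(23, kernel_size=3)    # 23 filters -> 23 feature maps

print(dense(x_vec).shape)   # (1, 23)
print(conv(x_img).shape)    # (1, 62, 62, 23)
```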

Yes, and our mentor @hackyon has shared a paper in this post, which I think you may want to have a look at.

Thanks for your amazing analysis.

Cheers,
Raymond