# A doubt in visualizing deep networks

Dear Mentor,

Could you please guide me on this issue?

“if you pick one hidden unit and find the nine input images that maximizes that unit’s activation, you might find nine image patches like this” (quoted from the lecture)

Given 2 input images, how should I find the one image, and the image patch within it, that maximizes the unit’s activation? Is there a formula?

Thank you.

I don’t think they described how this was implemented in the lecture. Are you just curious as to how this is done?

There might be a formula, but I think you can implement this pretty easily in code. You can simply loop over all the samples you are given, forward propagate each one until you reach the hidden unit, and keep track of the indices of the samples with the largest activations for that hidden unit.
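The loop described above could be sketched roughly as follows. This is only an illustration: `hidden_unit_activation` is a hypothetical stand-in for "forward propagate until you reach the hidden unit", and the samples and weights are made-up random data, not anything from the lecture.

```python
import numpy as np

def hidden_unit_activation(x, w, b):
    """Hypothetical forward pass up to one hidden unit: ReLU(w . x + b)."""
    return max(0.0, float(np.dot(w, x) + b))

def top_k_samples(samples, w, b, k):
    """Return the indices (and activations) of the k most activating samples."""
    activations = np.array([hidden_unit_activation(x, w, b) for x in samples])
    # argsort is ascending, so reverse it and keep the first k indices.
    order = np.argsort(activations)[::-1][:k]
    return order, activations[order]

rng = np.random.default_rng(0)
samples = rng.normal(size=(100, 16))  # 100 made-up flattened "images"
w = rng.normal(size=16)               # made-up weights of the hidden unit
b = 0.0

top_idx, top_act = top_k_samples(samples, w, b, k=9)
print(top_idx)
print(top_act)
```

In a real network the stand-in function would be replaced by a forward pass through all layers up to the unit of interest; the bookkeeping of the top indices stays the same.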

Dear Mr Hackyon,

Could you please tell me whether my understanding is correct?

In this case, the largest activation for this hidden unit is activation a because

sum of activation a = 0+0+0+0+30+30+30+30+30+30+30+30+0+0+0+0 = 240

compared to

sum of activation b = 0+0+0+0-30-30-30-30-30-30-30-30+0+0+0+0 = -240

Thank you.
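The two sums above can be reproduced with a short sketch. The figures are not part of this thread, so the inputs here are assumptions: a 6x6 image that is bright (10) on the left and dark (0) on the right for a, its mirror image for b, and a 3x3 vertical-edge filter with columns 1, 0, -1. `conv2d_valid` is a hypothetical helper name.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and sum the products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Assumed inputs (the original figures are not available here).
image_a = np.hstack([np.full((6, 3), 10.0), np.zeros((6, 3))])  # bright-left, dark-right
image_b = np.hstack([np.zeros((6, 3)), np.full((6, 3), 10.0)])  # dark-left, bright-right
vertical_filter = np.array([[1.0, 0.0, -1.0]] * 3)

a = conv2d_valid(image_a, vertical_filter)
b = conv2d_valid(image_b, vertical_filter)
print(a)                 # every row is [0, 30, 30, 0]
print(a.sum(), b.sum())  # 240.0 -240.0
```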

I am confused. What is the value in summing the activations?

Dear Mr Tom Mosher,

*The calculation above is just a rough intuition (I don’t know whether it is valid) for comparing 2 activations.

Could you please guide me on how to tell whether one activation is larger than another?

Thank you.

I’ll return to my question: why do you need to compute this?

Hi @JJaassoonn,

You computed the outputs a and b.

What will the outputs be if you replace the above filter with the horizontal one defined below? Let’s call the new outputs c and d.

With a, b, c, and d in hand,

• which two of them are maximally activated?
• which two of them are least activated?

Reason your answers not just with the numbers, but also what those filters do - you know the meaning of the filters, and you know what the inputs visually look like. We can’t ignore the meaning.

Cheers,
Raymond

After that, can you explain why there is the negative sign? Please take the visual of the input and the meaning of the filter into account for your explanation. What is the difference between the two inputs that could contribute to the negative sign even though they are processed by the same filter? How can you change the filter so that a becomes -240 and b becomes +240 (switching of signs)?

Previously, you measured the activation of a unit with respect to an input by the above formula.

My last question is, could you propose another measurement that can correctly inform, out of a, b, c, and d, which two should be maximally activated and which two minimally?

Dear Mr Tom Mosher,

I want to visualize the 9 image patches that maximize the hidden unit’s activation (one image patch is extracted from each image, so this means choosing the 9 images in the dataset that maximize the hidden unit’s activation), but I don’t know how to decide when one activation is more “highly activated” than another.

Thank you.

Dear Mr Raymond,

I have plotted a, b, c, and d (each of which represents a convolution output) as follows:

In my opinion,

a is maximally activated, because the hidden unit (a bright-left-dark-right filter) detected an edge.

b is the second most activated, because the same hidden unit (a bright-left-dark-right filter) detected an edge with the reversed transition.

c and d are least activated because nothing is detected.
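One possible way to put numbers on this judgement is to take the largest absolute value in each feature map. The sketch below is only an illustration under assumptions: the input images and both filters are guessed (the figures are not in this thread), `conv2d_valid` is a hypothetical helper, and the maximum absolute value is just one candidate measure.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and sum the products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Assumed inputs and filters (the original figures are not available here).
image_a = np.hstack([np.full((6, 3), 10.0), np.zeros((6, 3))])  # bright-left, dark-right
image_b = np.hstack([np.zeros((6, 3)), np.full((6, 3), 10.0)])  # dark-left, bright-right
vertical_filter = np.array([[1.0, 0.0, -1.0]] * 3)
horizontal_filter = vertical_filter.T  # rows of 1, 0, -1

outputs = {
    "a": conv2d_valid(image_a, vertical_filter),
    "b": conv2d_valid(image_b, vertical_filter),
    "c": conv2d_valid(image_a, horizontal_filter),
    "d": conv2d_valid(image_b, horizontal_filter),
}

# Strength of response: the largest magnitude anywhere in the feature map.
strength = {name: float(np.abs(fm).max()) for name, fm in outputs.items()}
print(strength)  # {'a': 30.0, 'b': 30.0, 'c': 0.0, 'd': 0.0}
```

Under these assumptions a and b respond equally strongly (with opposite signs), while c and d do not respond at all, because the inputs contain no horizontal edge.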

The negative signs appear in activation b because the input image has a dark-to-bright transition. The filter looks for an edge with bright pixels on its left and dark pixels on its right, but the edge it finds has dark pixels on its left and bright pixels on its right.

The direction of the transition differs between input image a and input image b, which is why activation b has a negative sign even though both inputs are processed by the same filter.

Activation b becomes positive when the filter is flipped horizontally, as follows:
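The original figure is not reproduced in this thread. As a rough sketch of the idea, assuming the same guessed 6x6 inputs and a 3x3 vertical-edge filter with columns 1, 0, -1, flipping the filter left to right swaps the signs of the two outputs (`conv2d_valid` is a hypothetical helper name):

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and sum the products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Assumed inputs (the original figures are not available here).
image_a = np.hstack([np.full((6, 3), 10.0), np.zeros((6, 3))])  # bright-left, dark-right
image_b = np.hstack([np.zeros((6, 3)), np.full((6, 3), 10.0)])  # dark-left, bright-right

vertical_filter = np.array([[1.0, 0.0, -1.0]] * 3)  # bright-left, dark-right detector
flipped_filter = np.fliplr(vertical_filter)         # columns become -1, 0, 1

a_flipped = conv2d_valid(image_a, flipped_filter)
b_flipped = conv2d_valid(image_b, flipped_filter)
print(a_flipped.sum(), b_flipped.sum())  # -240.0 240.0
```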

That is my doubt: I have no idea how to calculate which ones are maximally activated and which are minimally activated.

I can only judge it by rough intuition.

Please guide me if there is any mistake.

In addition, I have another doubt.

The yellow-highlighted part in the activation represents the edge extracted from the input image. May I know what the grey-highlighted part in the activation represents? I thought it represented the background of the edge in the input image, but it seems not.

Thank you

In your example, I think that you are right in that both the 30 and -30 values should be considered “highly activated” for that particular ConvNet layer. With that said, this is a contrived example. In practice, the “ReLU” function is usually used for activation, and so the learned filters would output large positive values rather than negative values.
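To make the ReLU point concrete, here is a small sketch. The two feature maps are typed in from the numbers in this thread (0 and 30 for a, 0 and -30 for b); their exact layout is an assumption, since the figures are not shown here.

```python
import numpy as np

def relu(x):
    """ReLU keeps positive values and replaces negative values with zero."""
    return np.maximum(x, 0.0)

# Assumed layout of the two feature maps from the example.
a = np.array([[0.0, 30.0, 30.0, 0.0]] * 4)
b = np.array([[0.0, -30.0, -30.0, 0.0]] * 4)

print(relu(a).max())  # 30.0
print(relu(b).max())  # 0.0
```

After ReLU, only a still shows a strong response; the -30 values in b are clipped to zero, so a separate filter would be needed to respond to the reversed edge.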

However, I don’t think it’s correct to sum up all the values. It’s more correct to pick the max value (or top N max values) in that ConvNet output, and then figure out which input values (or “patch”) contributed to that max value. There are many max values (30) in your case, but I think the max values are less likely to be duplicated in practice.

Specifically, in the given example with 30 as max values, the following 2 inputs/patches would result in a “highly activated” value:

```
[ [ 10 10  0 ]
  [ 10 10  0 ]
  [ 10 10  0 ] ]

[ [ 10  0  0 ]
  [ 10  0  0 ]
  [ 10  0  0 ] ]
```
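As a rough sketch of how those patches can be located in code: find every position of the maximum value in the feature map, then slice the matching 3x3 window out of the input. The input image and filter are assumptions (the figures are not in this thread), and `conv2d_valid` and `patch_at` are hypothetical helper names.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and sum the products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def patch_at(image, i, j, size=3):
    """Input window that produced feature-map entry (i, j) with stride 1, no padding."""
    return image[i:i + size, j:j + size]

# Assumed input and filter (the original figures are not available here).
image = np.hstack([np.full((6, 3), 10.0), np.zeros((6, 3))])
vertical_filter = np.array([[1.0, 0.0, -1.0]] * 3)

feature_map = conv2d_valid(image, vertical_filter)
max_value = feature_map.max()
positions = np.argwhere(feature_map == max_value)  # all (row, col) pairs holding the max

for i, j in positions[:2]:  # the first two positions are enough to see both patches
    print((int(i), int(j)), max_value)
    print(patch_at(image, i, j))
```

With stride 1 and no padding, feature-map entry (i, j) depends only on the input window that starts at row i and column j, which is why a simple slice recovers the patch.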

If you have multiple ConvNet layers, you can keep doing this operation backwards to figure out which patch in the original images resulted in the highly activated values (30 in your example).
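For the index arithmetic only, tracing one output position back through several layers could look like the sketch below. It assumes layers without padding, described by a (kernel size, stride) pair each, and works along one axis at a time; `input_range` is a hypothetical helper name.

```python
def input_range(position, layers):
    """Map one output index back to the inclusive range of input indices it depends on.

    layers is a list of (kernel_size, stride) pairs, first layer first, no padding.
    Apply it once for rows and once for columns to get a 2-D patch.
    """
    lo = hi = position
    # Walk from the last layer back to the input.
    for kernel_size, stride in reversed(layers):
        lo = lo * stride
        hi = hi * stride + kernel_size - 1
    return lo, hi

# Two stacked 3x3 convolutions with stride 1: output index 1 sees input indices 1..5.
print(input_range(1, [(3, 1), (3, 1)]))          # (1, 5)

# Conv 3x3, then 2x2 pooling with stride 2, then conv 3x3.
print(input_range(0, [(3, 1), (2, 2), (3, 1)]))  # (0, 7)
```

The patch in the original image grows with every layer, so a highly activated value deep in the network corresponds to a larger region of the input than one in the first layer.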

I found the paper that talks about how this is done in practice. There’s also an online book that I think does a pretty good job explaining this.

I gave the papers a quick read, and the basic idea seems to be to use a “DeConvNet” that reverses the operation of a ConvNet (just do the operation, but backwards), while also keeping track of the mapping of outputs/inputs that results in the highest activation value.
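One way the "keeping track of the mapping" part might look for a max-pooling layer is sketched below. This is my own illustration, not code from the references: the pooling step records where each maximum came from, and the reverse step puts each value back at that recorded location. The function names and the example feature map are assumptions.

```python
import numpy as np

def max_pool_with_locations(x, size=2):
    """Non-overlapping max pooling that also records where each max came from."""
    h, w = x.shape[0] // size, x.shape[1] // size
    pooled = np.zeros((h, w))
    locations = np.zeros((h, w, 2), dtype=int)
    for i in range(h):
        for j in range(w):
            window = x[i * size:(i + 1) * size, j * size:(j + 1) * size]
            r, c = np.unravel_index(np.argmax(window), window.shape)
            pooled[i, j] = window[r, c]
            locations[i, j] = (i * size + r, j * size + c)
    return pooled, locations

def unpool(pooled, locations, shape):
    """Place each pooled value back at its recorded location; everything else is zero."""
    out = np.zeros(shape)
    for i in range(pooled.shape[0]):
        for j in range(pooled.shape[1]):
            r, c = locations[i, j]
            out[r, c] = pooled[i, j]
    return out

# Assumed feature map, using the 0 and 30 values from the example in this thread.
feature_map = np.array([[0.0, 30.0, 30.0, 0.0]] * 4)
pooled, locations = max_pool_with_locations(feature_map)
restored = unpool(pooled, locations, feature_map.shape)
print(pooled)
print(restored)
```

When several entries in a window tie for the maximum, `np.argmax` keeps the first one, so the restored map is sparser than the original.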

Dear Mr Hackyon,

Thank you so much for your guidance and the recommended study materials.

Hi @JJaassoonn,

Thank you for the detailed analysis!!

Since our mentor @hackyon has shared those great references, I will try to align the following discussion of one of your questions with them.

The output on the right hand side is called a feature map, and each number in it represents the detection of a feature, a vertical edge in this case. Therefore, I think the zeros, or the grey area, simply mean that no vertical edge (the feature the filter is meant to detect) was detected there.

The method in those references reverses the convolved features back into pixels. If we think about this process with the zeros you asked about, it is a good guess that those zeros, when reversed back to pixels, will not express any features in the result; only the 30s will.
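A very simplified way to see this is to project the feature map back with a bare transposed convolution, where each feature-map value stamps a scaled copy of the filter onto the pixel grid. This is only a sketch of the intuition, not the full method from the references; the feature map layout, the filter, and the name `project_back` are assumptions.

```python
import numpy as np

def project_back(feature_map, kernel):
    """Transposed convolution: every entry adds (value * kernel) at its window position."""
    kh, kw = kernel.shape
    fh, fw = feature_map.shape
    out = np.zeros((fh + kh - 1, fw + kw - 1))
    for i in range(fh):
        for j in range(fw):
            # A zero entry adds nothing here; only non-zero entries leave a trace.
            out[i:i + kh, j:j + kw] += feature_map[i, j] * kernel
    return out

# Assumed feature map and filter, using the numbers from the example in this thread.
feature_map = np.array([[0.0, 30.0, 30.0, 0.0]] * 4)
vertical_filter = np.array([[1.0, 0.0, -1.0]] * 3)

pixels = project_back(feature_map, vertical_filter)
print(pixels)
```

The outermost columns of the result stay at zero because only the zero entries of the feature map could have written there, while the 30s produce a bright-to-dark pattern around the center.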

In fact, if we look just at your feature map on the right hand side, we can tell there is a vertical edge in the center, although we should probably be careful not to make comments like this when examining a middle layer’s feature map in a multilayer ConvNet.

Cheers,
Raymond

Dear Mr Raymond,

Thank you so much for your tutorial. I found it very useful, as it always makes me think outside the box.

I am glad to hear that.

Cheers,
Raymond