Why do Gram matrices give a good sense of style?

Intuitively, Style is the correlation between features, and as Andrew explains, we may observe orange lines etc. What I don’t understand is the following:

i) Why are we looking for a correlation between channels? Shouldn’t we be looking for similarity in features inside a channel, that is, a similarity in pixel patterns in one channel? I can understand how that is difficult, but why are we comparing channels?


Here, in the grid at the top of the image, he talks about orange and orange lines, but aren't these correlations inside one channel and not across multiple channels?

ii) Assuming I understand, once someone kind explains it, why channel correlation gives a sense of style, my second question is: why does taking the product of two corresponding channel activations and then summing those products give a metric for their similarity?
Whether the activations are vastly different or similar, won't they still sum up to very large numbers for both the Content matrix and the generated matrix?

Hi Jaskerat,

Each channel in a layer captures a particular feature. So to find correlations between features, values in different channels have to be compared.
See, e.g., this post.

As to your second question: if the activations of two channels are similar, that is, large at the same positions, the sum of their products will be high. If they are different, the products will more or less cancel each other out or stay close to zero. Normalization then helps to limit the absolute values.
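To make the second point concrete, here is a minimal NumPy sketch. The activation maps and the helper name `channel_similarity` are made up for illustration, and the values are assumed to be non-negative (as they would be after a ReLU), so "cancelling out" shows up as products that are simply zero.

```python
import numpy as np

def channel_similarity(a, b):
    """Sum of elementwise products of two flattened channel activation maps."""
    return float(np.sum(a * b))

# Hypothetical 4x4 activation maps, flattened to 16 values each.
stripes = np.array([1.0, 0.0, 1.0, 0.0] * 4)       # active at even positions
same_places = 0.9 * stripes                        # active at the same positions
other_places = np.array([0.0, 1.0, 0.0, 1.0] * 4)  # active only where stripes is off

co_active = channel_similarity(stripes, same_places)       # large: activations overlap
not_co_active = channel_similarity(stripes, other_places)  # zero: they never overlap

print(co_active, not_co_active)
```

So the sum is only large when both channels fire at the same positions; two channels that are each strongly active, but in different places, contribute little.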

AAAAAH, Okay it just clicked. Thanks for the link to the post. So basically, each filter extracts a particular feature. And each filter creates a channel. So each channel is actually an extracted feature set. Is that correct?

Each filter serves to extract a feature. It does so by means of a computation using parameter values that are learned during training. The resulting values are stacked per feature in the channels. So the values per channel indicate the presence of a particular feature. One channel may indicate the color orange, another channel may indicate circles. If a style image has orange circles, this will lead to a high correlation value between these channels. If a style image has blue circles, it will not. In that case, a high correlation value will exist between the channel that indicates circles and another channel that may indicate the color blue.
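The orange/blue/circle example can be sketched in NumPy as follows. This is only an illustration: the three channels, their meanings, and the helper `gram_matrix` are hypothetical, and the activations are hand-made rather than taken from a real network. It assumes activations of shape (n_H, n_W, n_C), unrolled so that each row holds one channel.

```python
import numpy as np

def gram_matrix(activations):
    """Map activations of shape (n_H, n_W, n_C) to an (n_C, n_C) Gram matrix."""
    n_H, n_W, n_C = activations.shape
    # Unroll so that each row holds all spatial positions of one channel.
    unrolled = activations.reshape(n_H * n_W, n_C).T  # shape (n_C, n_H * n_W)
    # Entry (i, j) is the sum over positions of channel i times channel j.
    return unrolled @ unrolled.T

# Hypothetical channel meanings, chosen only for this example.
ORANGE, BLUE, CIRCLE = 0, 1, 2

acts = np.zeros((4, 4, 3))
acts[:2, :, ORANGE] = 1.0  # orange detected in the top half
acts[2:, :, BLUE] = 1.0    # blue detected in the bottom half
acts[:2, :, CIRCLE] = 1.0  # circles detected only where orange is

G = gram_matrix(acts)
print(G[ORANGE, CIRCLE], G[BLUE, CIRCLE])
```

Here `G[ORANGE, CIRCLE]` is large because the two channels are active at the same positions, while `G[BLUE, CIRCLE]` is zero. With blue circles instead, the roles would swap.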


Yes, it all makes sense now. This was bothering me, and I get it now, it feels so good, haha. Thanks!