Why do Gram matrices give a good sense of style?

Hi Jaskerat,

Each channel in a layer extracts a particular feature. So, to find correlations between features, the activations of different channels have to be compared.
See, e.g., this post.

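As to your second question: each entry of the Gram matrix is the dot product of two channels' activations over all spatial positions. If the two channels tend to be active at the same positions, the products add up to a high value. If they are active at different positions, the products are small (or, for signed activations, of mixed sign, so they more or less cancel each other out) and the entry stays close to zero. Normalization then helps to limit the absolute values.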
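
To make this concrete, here is a minimal sketch in plain Python. The function name `gram_matrix`, the three toy channels, and the choice to normalize by the number of spatial positions are all my own assumptions for illustration; actual style-transfer implementations differ in the normalization constant and work on real layer activations.

```python
def gram_matrix(channels):
    """Return G where G[i][j] is the normalized dot product of channels i and j.

    `channels` is a list of equal-length lists: one flattened activation
    map (height * width values) per channel.
    """
    n_positions = len(channels[0])
    return [
        [sum(a * b for a, b in zip(ci, cj)) / n_positions for cj in channels]
        for ci in channels
    ]


# Hypothetical activations for three channels over four spatial positions.
stripes_a = [1.0, 0.0, 1.0, 0.0]  # active at positions 0 and 2
stripes_b = [0.9, 0.1, 0.8, 0.0]  # active at nearly the same positions
blobs = [0.0, 1.0, 0.0, 1.0]      # active where the stripe channels are silent

G = gram_matrix([stripes_a, stripes_b, blobs])
print("similar channels:   ", G[0][1])
print("dissimilar channels:", G[0][2])
```

The entry for the two stripe channels comes out clearly larger than the entry for a stripe channel paired with the blob channel, which is zero here because the two are never active at the same position. Dividing by the number of positions only rescales the values; it does not change which pairs of channels are correlated.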