Style function Gram matrix: correlation vs. prevalence of textures and patterns

In the last assignment of the CNN course, the style transfer part says the following:

"
𝐺(π‘”π‘Ÿπ‘Žπ‘š)𝑖𝑗 : correlation
The result is a matrix of dimension (𝑛𝐢,𝑛𝐢), where 𝑛𝐢 is the number of filters (channels). The value 𝐺(π‘”π‘Ÿπ‘Žπ‘š)𝑖,𝑗 measures how similar the activations of filter 𝑖 are to the activations of filter 𝑗

𝐺(π‘”π‘Ÿπ‘Žπ‘š),𝑖𝑖 : prevalence of patterns or textures.
The diagonal elements 𝐺(π‘”π‘Ÿπ‘Žπ‘š)𝑖𝑖 measure how β€œactive” a filter 𝑖 is.
For example, suppose filter 𝑖 is detecting vertical textures in the image. Then 𝐺(π‘”π‘Ÿπ‘Žπ‘š)𝑖𝑖 measures how common vertical textures are in the image as a whole.

If 𝐺(π‘”π‘Ÿπ‘Žπ‘š)𝑖𝑖 is large, this means that the image has a lot of vertical texture.
By capturing the prevalence of different types of features (𝐺(π‘”π‘Ÿπ‘Žπ‘š)𝑖𝑖), as well as how much different features occur together (𝐺(π‘”π‘Ÿπ‘Žπ‘š)𝑖𝑗), the Style matrix πΊπ‘”π‘Ÿπ‘Žπ‘š measures the style of an image.*
"
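To make the indices concrete, here is a minimal NumPy sketch. It assumes one layer's activations come as an array of shape (n_H, n_W, n_C), channels last; the function name `gram_matrix` and the random toy activations are hypothetical, for illustration only. Each filter's feature map is unrolled into one row of a matrix A of shape (n_C, n_H·n_W), and the Gram matrix is A·Aᵀ, so row i and column j both index filters (channels), not pixels. Entry [i, j] is the dot product of the activations of filter i and filter j, and the diagonal entry [i, i] is the sum of squared activations of filter i.

```python
import numpy as np

def gram_matrix(activations):
    """Gram matrix of one layer's activations (hypothetical helper).

    activations: array of shape (n_H, n_W, n_C), channels last (assumed layout).
    Returns an (n_C, n_C) array whose entry [i, j] is the dot product between
    the unrolled activations of filter i and filter j.
    """
    n_H, n_W, n_C = activations.shape
    # Unroll: one row per filter, one column per spatial position.
    A = activations.reshape(n_H * n_W, n_C).T  # shape (n_C, n_H * n_W)
    return A @ A.T

rng = np.random.default_rng(0)
a = rng.random((4, 5, 3))  # toy activations: 4x5 spatial grid, 3 filters
G = gram_matrix(a)
print(G.shape)  # (3, 3)
```

Since A·Aᵀ is symmetric, G[i, j] always equals G[j, i].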

I don't get it; I am confused. What do they mean by 𝐺(𝑔𝑟𝑎𝑚)𝑖𝑖 and 𝐺(𝑔𝑟𝑎𝑚)𝑖𝑗? What do the rows and columns represent in this final Gram matrix? I understand that we want to see the correlations between the filters (that contain the features/activations). But why is 𝐺(𝑔𝑟𝑎𝑚)𝑖𝑖 for correlations and 𝐺(𝑔𝑟𝑎𝑚)𝑖𝑗 for the prevalence of textures/patterns?

Thank you all

Your description is exactly backward from what the text says. The point is that the Gram matrix is a form of “correlation” matrix between the various filters (channels). So the correlations between a given filter i and the other filters are the “off-diagonal” elements with j ≠ i. The correlation of a given filter with itself just gives you the squared norm of that filter's (unrolled) activation vector. I’m not sure I understand why the magnitude of that squared norm indicates strong patterns or textures. If the activations of a given filter are larger, does that mean the pattern it detects is present more strongly, or more often? Or maybe the fact that many elements are not near zero means that interesting things are happening in multiple features of the input all at once. I'm not sure of the intuition there, but maybe someone more knowledgeable will notice this thread and comment.
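A small check of the diagonal vs. off-diagonal distinction, as a NumPy sketch with made-up 3x3 feature maps for two hypothetical filters (the arrays `vertical` and `horizontal` are invented for illustration). Following the quoted course text, the Gram matrix is built from the filters' activations: the diagonal entry is the sum of squared activations of one filter over all positions, so it grows when that filter fires strongly or in many places, while an off-diagonal entry is large only when both filters fire at the same positions.

```python
import numpy as np

# Hypothetical feature maps (3x3 spatial grid) for two filters.
vertical = np.array([[1.0, 0.0, 1.0],
                     [1.0, 0.0, 1.0],
                     [1.0, 0.0, 1.0]])
horizontal = np.array([[1.0, 1.0, 1.0],
                       [0.0, 0.0, 0.0],
                       [0.0, 0.0, 0.0]])

# Unroll each map into one row: A has shape (n_C, n_H * n_W) = (2, 9).
A = np.stack([vertical.ravel(), horizontal.ravel()])
G = A @ A.T

# Diagonal: squared norm of each filter's activation vector.
print(G[0, 0], np.sum(vertical ** 2))    # 6.0 6.0
print(G[1, 1], np.sum(horizontal ** 2))  # 3.0 3.0

# Off-diagonal: the filters fire together only at the top-left and
# top-right corners, so their dot product is 2.
print(G[0, 1])  # 2.0
```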

Thank you.
Yes, I hope so, because this last part is not clear to me.

For those who are still curious: this question is about how similarity can be measured by the Gram matrix. I wrote a blog post about this and I hope it helps.

In short, a style can be roughly summarized as a set of patterns that appear together in the same spatial windows across an image. Some examples: calligraphers have their own distinct strokes, and painters prefer certain patterns in specific colors.
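The idea that a style is a set of patterns occurring together, wherever they occur in the image, can be sketched with NumPy on toy data. This is an illustration under stated assumptions: the two filters and their activations over 6 positions are hypothetical, and the squared-difference comparison at the end is a simple unnormalized distance chosen for illustration, not necessarily the measure used in the course or the blog post. Filters that fire at the same positions give a large off-diagonal entry, the same filters firing at disjoint positions give zero (with identical diagonal "prevalence"), and shuffling positions with the same permutation for every channel leaves the Gram matrix unchanged, because it sums over positions and so ignores the layout.

```python
import numpy as np

def gram(A):
    # A: unrolled activations, shape (n_C, n_positions).
    return A @ A.T

# Two hypothetical filters over 6 spatial positions.
together = np.array([[1.0, 1.0, 0.0, 0.0, 1.0, 0.0],
                     [1.0, 1.0, 0.0, 0.0, 1.0, 0.0]])  # fire at the same positions
apart = np.array([[1.0, 1.0, 0.0, 0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0, 1.0, 0.0, 1.0]])     # fire at disjoint positions

print(gram(together)[0, 1])  # 3.0: the patterns co-occur
print(gram(apart)[0, 1])     # 0.0: same prevalence, no co-occurrence

# Rearranging positions (same permutation for every channel) does not change
# the Gram matrix: it records co-occurrence, not where things are.
rng = np.random.default_rng(0)
perm = rng.permutation(together.shape[1])
shuffled = together[:, perm]
print(np.allclose(gram(together), gram(shuffled)))  # True

# Assumed illustrative dissimilarity: sum of squared differences between
# the two Gram matrices. Same prevalence, different co-occurrence -> nonzero.
print(np.sum((gram(together) - gram(apart)) ** 2))  # 18.0
```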
