In the last assignment of the CNN course, in the style transfer part, the following is written:
"$G_{(gram)ij}$: correlation
The result is a matrix of dimension $(n_C, n_C)$, where $n_C$ is the number of filters (channels). The value $G_{(gram)ij}$ measures how similar the activations of filter $i$ are to the activations of filter $j$.
$G_{(gram)ii}$: prevalence of patterns or textures
The diagonal elements $G_{(gram)ii}$ measure how "active" a filter $i$ is. For example, suppose filter $i$ is detecting vertical textures in the image. Then $G_{(gram)ii}$ measures how common vertical textures are in the image as a whole.
If $G_{(gram)ii}$ is large, this means that the image has a lot of vertical texture.
By capturing the prevalence of different types of features ($G_{(gram)ii}$), as well as how much different features occur together ($G_{(gram)ij}$), the Style matrix $G_{gram}$ measures the style of an image."
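To make the indexing concrete, here is a minimal sketch in plain Python (no framework), assuming one layer's activations have shape $(n_H, n_W, n_C)$ and are unrolled into an $(n_C, n_H \cdot n_W)$ matrix $A$, so that $G = A A^T$, i.e. $G_{ij} = \sum_k A_{ik} A_{jk}$. Both the rows and the columns of $G$ index filters: entry $(i, j)$ is the dot product of filter $i$'s activations with filter $j$'s activations over all positions, and entry $(i, i)$ is the squared norm of filter $i$'s activations. The names `unroll` and `gram_matrix` and the toy values are made up for illustration.

```python
def unroll(activations):
    """Turn an (n_H, n_W, n_C) nested list into an (n_C, n_H * n_W) list of rows.

    Row c holds every activation of filter (channel) c, one entry per spatial position.
    """
    n_C = len(activations[0][0])
    return [
        [pixel[c] for row in activations for pixel in row]
        for c in range(n_C)
    ]


def gram_matrix(A):
    """Return G = A @ A^T for an (n_C, n_positions) matrix given as a list of rows.

    G[i][j] is the dot product of filter i's activations with filter j's activations.
    """
    return [
        [sum(a * b for a, b in zip(row_i, row_j)) for row_j in A]
        for row_i in A
    ]


# Toy 2x2 "image" with n_C = 2 filters: each pixel is [filter_0, filter_1].
activations = [
    [[1.0, 0.0], [2.0, 1.0]],
    [[0.0, 0.0], [3.0, 2.0]],
]

A = unroll(activations)  # [[1.0, 2.0, 0.0, 3.0], [0.0, 1.0, 0.0, 2.0]]
G = gram_matrix(A)       # shape (n_C, n_C) = (2, 2)
for row in G:
    print(row)           # [14.0, 8.0] then [8.0, 5.0]
```

The diagonal entries 14.0 and 5.0 are the "prevalence" terms, and the off-diagonal 8.0 (identical in both positions, since $G$ is symmetric) is the "correlation" term. In a framework this is typically a single matrix multiply of $A$ with its transpose; the nested-list version only spells out the arithmetic.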
I don't get it, I'm confused: what do they mean by $G_{(gram)ij}$ and $G_{(gram)ii}$? What do the rows and columns represent in this final Gram matrix? I understand that we want to see the correlations between the filters (which contain the features/activations). But why is $G_{(gram)ii}$ for correlations and $G_{(gram)ij}$ for the prevalence of textures/patterns?
Your description is exactly backward from what they said in the text. The point is that the Gram matrix is a form of "correlation" matrix between the various filters (channels). So the correlations between a given filter i and the other filters are the "off-diagonal" elements with j ≠ i. The correlation of a given filter with itself just gives you the squared norm of that filter's (unrolled) activation vector. I'm not sure I understand what they mean by the magnitude of that squared norm being an indication of strong patterns or textures. If a filter's activations are larger overall, does that mean the pattern it detects is stronger, or just more widespread? Or maybe the fact that a lot of elements are not near zero means that interesting things are happening in multiple features of the input all at once. I'm not sure of the intuition there, but maybe someone more knowledgeable will notice this thread and comment.
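One way to probe the "prevalence" reading is a toy experiment, assuming a single hypothetical "vertical texture" filter whose activation is 1.0 wherever a vertical stroke appears and 0.0 elsewhere. Since $G_{(gram)ii}$ is the sum of filter $i$'s squared activations over all positions, it grows both with how many positions the filter fires at and with how strongly it fires there. The helper `diagonal_entry` and the 4x4 maps are invented for illustration.

```python
def diagonal_entry(feature_map):
    """G_ii for one filter: the sum of its squared activations over all positions."""
    return sum(a * a for row in feature_map for a in row)


# Hypothetical 4x4 activation maps of a "vertical texture" filter.
few_strokes = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
many_strokes = [
    [1.0, 0.0, 1.0, 0.0],
    [1.0, 0.0, 1.0, 0.0],
    [1.0, 0.0, 1.0, 0.0],
    [1.0, 0.0, 1.0, 0.0],
]
# Same single stroke as few_strokes, but the filter responds twice as strongly.
strong_strokes = [[2.0 * a for a in row] for row in few_strokes]

print(diagonal_entry(few_strokes))     # 1.0
print(diagonal_entry(many_strokes))    # 8.0
print(diagonal_entry(strong_strokes))  # 4.0
```

So a large diagonal entry says the filter's pattern is common and/or strongly expressed in the image; on its own it does not say which of the two, nor where the pattern occurs.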
In short, a style can be roughly summarized as a set of patterns that appear together in the same spatial windows across an image. Some examples: calligraphers have their own distinct strokes, and painters prefer certain patterns in specific colors.
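The "same spatial windows" idea maps onto the off-diagonal entries: $G_{(gram)ij}$ sums, over every position, the product of filter $i$'s and filter $j$'s activations at that position, so it is large only when both filters fire at the same places. Below is a minimal sketch, assuming two hypothetical filters (a "stroke" detector and a "red" detector) with made-up, already-flattened activation maps. It also shows that applying the same shuffle of positions to both maps leaves $G_{(gram)ij}$ unchanged, which is why the Gram matrix describes which patterns occur together rather than where they occur.

```python
import random


def gram_entry(map_i, map_j):
    """G_ij: dot product of two filters' activations, summed over all positions."""
    return sum(a * b for a, b in zip(map_i, map_j))


# Hypothetical unrolled (flattened) activation maps over 6 positions.
stroke = [1.0, 1.0, 0.0, 0.0, 1.0, 0.0]         # filter i: a brush-stroke pattern
red_same = [1.0, 1.0, 0.0, 0.0, 1.0, 0.0]       # filter j fires at the same positions
red_elsewhere = [0.0, 0.0, 1.0, 1.0, 0.0, 1.0]  # filter j fires only at other positions

print(gram_entry(stroke, red_same))       # 3.0 -> the two features co-occur
print(gram_entry(stroke, red_elsewhere))  # 0.0 -> they never appear together

# Shuffling positions (the same permutation for both filters) leaves G_ij unchanged.
positions = list(range(len(stroke)))
random.Random(0).shuffle(positions)
shuffled_stroke = [stroke[p] for p in positions]
shuffled_red = [red_same[p] for p in positions]
print(gram_entry(shuffled_stroke, shuffled_red))  # 3.0
```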