How the Gram matrix represents the style of an image

In the Neural Style Transfer algorithm, the style cost is modeled as the squared difference between the Gram matrices of the convolutional activations for the style image and for the generated image. By definition, each entry of the Gram matrix is the inner product of the unrolled activation vectors of two channels. But I don't understand why the Gram matrix can be used as a representation of style.
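For reference, the definition above can be sketched in NumPy roughly as follows. This is a minimal illustration, not the course notebook's code: the function names `gram_matrix` and `layer_style_cost`, the `(n_H, n_W, n_C)` activation layout, and the random inputs are all assumptions, and the normalization constant follows the convention used in Gatys et al.

```python
import numpy as np

def gram_matrix(a):
    """Gram matrix of one layer's activations.

    a: array of shape (n_H, n_W, n_C) (assumed layout).
    Returns an (n_C, n_C) matrix whose entry (i, j) is the inner
    product of the unrolled activations of channels i and j.
    """
    n_H, n_W, n_C = a.shape
    # Unroll the spatial dimensions so each row holds one channel.
    A = a.reshape(n_H * n_W, n_C).T  # shape (n_C, n_H * n_W)
    return A @ A.T

def layer_style_cost(a_S, a_G):
    """Squared difference between the Gram matrices of the style
    activations a_S and the generated-image activations a_G."""
    n_H, n_W, n_C = a_S.shape
    G_S = gram_matrix(a_S)
    G_G = gram_matrix(a_G)
    # Normalization as in Gatys et al.: 1 / (4 * n_C^2 * (n_H * n_W)^2).
    return np.sum((G_S - G_G) ** 2) / (4.0 * n_C**2 * (n_H * n_W) ** 2)

# Hypothetical activations standing in for a real conv layer's output.
rng = np.random.default_rng(0)
a_S = rng.standard_normal((4, 4, 3))
a_G = rng.standard_normal((4, 4, 3))

G = gram_matrix(a_S)
cost = layer_style_cost(a_S, a_G)
print(G.shape, cost)
```

Note that the cost depends on the activations only through the `n_C x n_C` Gram matrices, so the spatial layout of each channel does not enter it directly.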

Lawerence offers some explanation of this at style loss and in the optional video, which I think should suffice. He also gives a reference to a paper there that you might want to read.

Yes, I watched the Gram matrix video, but it only explained HOW to calculate the matrix. The original paper likewise only explained HOW to use the Gram matrix to calculate the style cost. My question is more about WHY. What property makes it a good representation of style? Why do we choose it over simply computing the cost as the squared difference of the convolution activations?

Prof Ng must have mentioned somewhere in there (at least in passing), or in the notebook, that the Gram matrix is a way to compute something very closely related to the covariance matrix, so it gives you a measure of how correlated, as opposed to linearly independent, the different features of the input are. A Google search finds the Wikipedia article on Gram matrices.
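A small NumPy sketch may make the covariance connection concrete. It is only an illustration on assumed random activations, with hypothetical variable names, not code from the course. It checks two things: the Gram matrix divided by the number of spatial positions equals the covariance matrix plus the outer product of the channel means, and shuffling the spatial positions changes the activations but leaves the Gram matrix unchanged. The second point is one way to see why a direct squared difference of activations would compare content layout, whereas the Gram matrix only compares which features tend to fire together.

```python
import numpy as np

rng = np.random.default_rng(1)
n_H, n_W, n_C = 5, 5, 4

# Hypothetical activations; each row of A is one unrolled channel.
a = rng.standard_normal((n_H, n_W, n_C))
A = a.reshape(-1, n_C).T  # shape (n_C, N)
N = A.shape[1]            # number of spatial positions

gram = A @ A.T

# Covariance subtracts each channel's mean before taking products.
mu = A.mean(axis=1, keepdims=True)
A_centered = A - mu
cov = A_centered @ A_centered.T / N  # population covariance

# Gram / N is the uncentered second moment: cov + mu mu^T.
matches_covariance = np.allclose(gram / N, cov + mu @ mu.T)

# Shuffle the spatial positions identically for every channel.
perm = rng.permutation(N)
A_shuffled = A[:, perm]
gram_shuffled = A_shuffled @ A_shuffled.T

same_gram = np.allclose(gram, gram_shuffled)
activations_differ = not np.allclose(A, A_shuffled)

print(matches_covariance, same_gram, activations_differ)
```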

Thanks for your explanation. I also found this paper helpful.

Great! Thanks for the link to the paper. I have only read the abstract so far, but it sounds both interesting and highly relevant to your question!