For the neural style transfer unit, we define two cost functions, one for computing style and one for the content.

For the content, we just use the squared difference between the activations, with no consideration of the correlation between filters as we do when computing the style cost.

Why is style cost dependent on correlation but not content cost?

Your question is totally fair and it is always a good practice as a student to question things.

The algorithm used in the exercise defines both cost functions. It may be worth reading the paper to get more insight into the reasoning there. What I got from the lectures is:

For the content, you want to measure how similar a content image and a generated image are in content. To get that, you compute how different the activations of layer l are for these two images. Your goal is to generate an image whose activations at that layer are EQUAL to those of the content image (not merely correlated/similar ones), and you optimize the pixels of the generated image to get that; the network weights stay fixed.
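To make the content side concrete, here is a minimal NumPy sketch of a content cost as a squared difference between activations. The function name `content_cost`, the `(n_H, n_W, n_C)` shape, and the `1 / (4 * n_H * n_W * n_C)` scaling are assumptions for illustration; the scaling is just a constant and other conventions are possible.

```python
import numpy as np

def content_cost(a_C, a_G):
    """Squared difference between two activation volumes.

    a_C, a_G: activations of the same layer for the content image and the
    generated image, assumed to have shape (n_H, n_W, n_C).
    """
    n_H, n_W, n_C = a_C.shape
    # The scaling constant is an assumed convention, not the only valid one.
    return np.sum((a_C - a_G) ** 2) / (4 * n_H * n_W * n_C)

# Random stand-ins for real network activations (hypothetical data).
rng = np.random.default_rng(0)
a_C = rng.normal(size=(4, 4, 3))
a_G = rng.normal(size=(4, 4, 3))

cost_different = content_cost(a_C, a_G)  # positive: activations differ
cost_identical = content_cost(a_C, a_C)  # zero: activations match exactly
```

The cost only reaches zero when every activation matches position by position, which is why it pins down what is where in the image.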

However, for the style you are not working with the activations directly but with a so-called “Style Matrix” that is defined in the paper. You compute this matrix for a style image and for the generated image. Your goal here is to generate an image that is close to the style one, but NOT equal as we did for the content! The Style Matrix itself holds the “correlation” between every pair of filters in the layer (though in this case it is closer to an unnormalized covariance), and the style cost measures how close the two matrices are.
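A small NumPy sketch may help show why the filter correlations capture style rather than content. It assumes the Style Matrix is computed as a Gram matrix of the unrolled activations; the names `gram_matrix` and `layer_style_cost` and the scaling constant are assumptions for illustration. Shuffling the spatial positions of the activations leaves the matrix unchanged up to floating-point rounding, so the style cost ignores where things are, while a direct squared difference of activations does not.

```python
import numpy as np

def gram_matrix(a):
    """Filter-by-filter products summed over all spatial positions.

    a: activations assumed to have shape (n_H, n_W, n_C).
    Returns an (n_C, n_C) matrix.
    """
    n_H, n_W, n_C = a.shape
    A = a.reshape(n_H * n_W, n_C)  # rows: positions, columns: filters
    return A.T @ A

def layer_style_cost(a_S, a_G):
    """Squared difference between the two Gram matrices (assumed scaling)."""
    n_H, n_W, n_C = a_S.shape
    diff = gram_matrix(a_S) - gram_matrix(a_G)
    return np.sum(diff ** 2) / (4 * n_C ** 2 * (n_H * n_W) ** 2)

# Random stand-in for real network activations (hypothetical data).
rng = np.random.default_rng(0)
a = rng.normal(size=(4, 4, 3))

# Move every spatial position somewhere else, keeping each filter vector intact.
perm = rng.permutation(16)
shuffled = a.reshape(16, 3)[perm].reshape(4, 4, 3)

style_gap = layer_style_cost(a, shuffled)  # numerically zero
direct_gap = np.sum((a - shuffled) ** 2)   # clearly positive
```

This is the sense in which the content cost compares activations directly, while the style cost compares only summary statistics of how filters fire together.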

Hope these explanations help you better understand the reasoning.