NST - the analytic derivation of style cost function

I know it is not required to understand how the derivative of the style cost function is computed for the NST network in Week 4, Assignment 2, but can any mentor explain to me how it is computed analytically?

I took linear algebra and differential equations classes recently, but I could not figure out how the analytic derivation in the paper is carried out (Gatys et al., A Neural Algorithm of Artistic Style, page 11, computation of dE[l]/dF[l]_ij).

Hi, Yigit!

When we talk about neural style transfer, we generally take a pre-trained network (a CNN) and reuse it for a new task. What actually happens is that we optimize a cost function over the pixel values of the generated image. The cost is built from a content cost function and a style cost function, so as the total cost is reduced, the generated image approximately takes on the features extracted from the content image and the style image.
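Now, in the paper, in order to generate a texture that matches the style of a given image, the authors state that the derivative of the style loss E_l of layer l with respect to the activations F^l in that layer can be computed analytically as:

$$
\frac{\partial E_l}{\partial F^l_{ij}} =
\begin{cases}
\dfrac{1}{N_l^2 M_l^2}\left((F^l)^{T}\left(G^l - A^l\right)\right)_{ji} & \text{if } F^l_{ij} > 0 \\
0 & \text{if } F^l_{ij} < 0
\end{cases}
$$

Here N_l is the number of filters in layer l, M_l is the size (height times width) of each feature map, G^l is the Gram matrix of the generated image, and A^l is the Gram matrix of the style image. The gradient with respect to the generated image (not the original one) is then obtained with standard error back-propagation, and the total loss being minimized is:

$$
L_{total}(\vec{p}, \vec{a}, \vec{x}) = \alpha \, L_{content}(\vec{p}, \vec{x}) + \beta \, L_{style}(\vec{a}, \vec{x})
$$

using alpha and beta as the weighting factors for the content and style representations.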
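For the original question, the derivative can be worked out with nothing more than the chain rule. The following is a sketch in the paper's notation as I read it: F^l is the N_l x M_l matrix of activations in layer l of the generated image (one row per filter, one column per spatial position), G^l = F^l (F^l)^T is its Gram matrix, and A^l is the Gram matrix of the style image, which is a constant here. The free indices p, q are my own choice, to keep them apart from the summation indices i, j, k.

$$
\begin{aligned}
E_l &= \frac{1}{4 N_l^2 M_l^2} \sum_{i,j} \left(G^l_{ij} - A^l_{ij}\right)^2,
\qquad G^l_{ij} = \sum_k F^l_{ik} F^l_{jk} \\
\frac{\partial G^l_{ij}}{\partial F^l_{pq}} &= \delta_{ip} F^l_{jq} + \delta_{jp} F^l_{iq} \\
\frac{\partial E_l}{\partial F^l_{pq}}
&= \frac{1}{2 N_l^2 M_l^2} \sum_{i,j} \left(G^l_{ij} - A^l_{ij}\right)\left(\delta_{ip} F^l_{jq} + \delta_{jp} F^l_{iq}\right) \\
&= \frac{1}{2 N_l^2 M_l^2} \left[ \sum_j \left(G^l - A^l\right)_{pj} F^l_{jq} + \sum_i \left(G^l - A^l\right)_{ip} F^l_{iq} \right] \\
&= \frac{1}{N_l^2 M_l^2} \left(\left(G^l - A^l\right) F^l\right)_{pq}
 = \frac{1}{N_l^2 M_l^2} \left((F^l)^{T} \left(G^l - A^l\right)\right)_{qp}
\end{aligned}
$$

The two sums in the second-to-last step are equal because G^l - A^l is symmetric, which is what cancels the factor 1/2. Renaming p, q to i, j gives the expression on page 11 of the paper. The case split there (zero where F^l_ij < 0) reflects, as far as I can tell, the rectifying nonlinearity of the network rather than anything in the Gram-matrix algebra itself.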
The authors' aim is to separate the content of an image from its style, and then to recombine the content of one image with the style of another to generate a new image that carries the features of both.
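If you want to convince yourself of the analytic expression for the style-loss derivative, one option is to compare it against a finite-difference estimate. Below is a minimal NumPy sketch, assuming F is a small random N x M activation matrix and A is the Gram matrix of another random matrix standing in for the style image. The function names (gram, layer_style_cost, analytic_grad, numeric_grad) are my own for illustration and do not come from the paper or the assignment, and the ReLU case split is left out because only the Gram-matrix part is being checked.

```python
import numpy as np


def gram(F):
    # Gram matrix G = F F^T, with one row of F per filter.
    return F @ F.T


def layer_style_cost(F, A):
    # E_l = 1 / (4 N^2 M^2) * sum_ij (G_ij - A_ij)^2
    N, M = F.shape
    G = gram(F)
    return np.sum((G - A) ** 2) / (4.0 * N**2 * M**2)


def analytic_grad(F, A):
    # dE_l/dF = (G - A) F / (N^2 M^2); ReLU masking intentionally omitted.
    N, M = F.shape
    return (gram(F) - A) @ F / (N**2 * M**2)


def numeric_grad(F, A, eps=1e-6):
    # Central finite differences, one entry of F at a time.
    grad = np.zeros_like(F)
    for idx in np.ndindex(*F.shape):
        F_plus = F.copy()
        F_minus = F.copy()
        F_plus[idx] += eps
        F_minus[idx] -= eps
        grad[idx] = (layer_style_cost(F_plus, A)
                     - layer_style_cost(F_minus, A)) / (2.0 * eps)
    return grad


# Toy sizes (assumed): 3 filters, 5 spatial positions.
rng = np.random.default_rng(0)
N, M = 3, 5
F = rng.random((N, M))          # stand-in for generated-image activations
A = gram(rng.random((N, M)))    # stand-in for the style image's Gram matrix

max_abs_diff = np.max(np.abs(analytic_grad(F, A) - numeric_grad(F, A)))
print("max |analytic - numeric| =", max_abs_diff)
```

The printed difference should be tiny compared with the gradient entries themselves. In the actual assignment the framework's automatic differentiation does this work for you, so a check like this is only for building intuition.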