I would like to point out that only the last couple of layers of a Siamese network are trained. Citation: the paper given in lecture. Specific quote:
“The network has roughly the same number of parameters as the original one, since much of it is shared between the two replicas, but requires twice the computation. Notice that in order to prevent overfitting on the face verification task, we enable training for only the two topmost layers. The Siamese network’s induced distance is: d(f1, f2) = Σᵢ αᵢ |f1[i] − f2[i]|, where the αᵢ are trainable parameters. The parameters of the Siamese network are trained by standard cross entropy loss and backpropagation of the error.”
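As a minimal sketch of the induced distance in the quote, here is a NumPy version of the weighted absolute-difference metric d(f1, f2) = Σᵢ αᵢ |f1[i] − f2[i]|. The function name, the toy feature vectors, and the all-ones weights are illustrative assumptions; in the paper the αᵢ would be learned with cross-entropy loss.

```python
import numpy as np

def siamese_distance(f1, f2, alpha):
    # Weighted absolute-difference distance from the quoted paper:
    # d(f1, f2) = sum_i alpha_i * |f1[i] - f2[i]|
    # alpha holds the per-dimension trainable weights.
    return float(np.sum(alpha * np.abs(f1 - f2)))

# Toy example (hypothetical feature vectors, uniform weights):
f1 = np.array([0.2, 0.8, 0.5])
f2 = np.array([0.1, 0.6, 0.9])
alpha = np.ones(3)  # in the paper these would be learned, not fixed
print(siamese_distance(f1, f2, alpha))  # ~0.7 (= 0.1 + 0.2 + 0.4)
```

Note that when all αᵢ = 1 this reduces to the plain L1 distance between the two embeddings.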
Style cost function (citation from lecture):
The style cost function is defined as the squared Frobenius norm of the difference between the Gram (correlation) matrices of the style image and the generated image at a given layer. Each entry of a layer's Gram matrix is the inner product between the unrolled activations of two channels in that layer, so it measures how strongly those channels co-activate. The cost then sums the squared element-wise differences between the style image's Gram matrix and the generated image's Gram matrix. The lecture phrases this in a couple of slightly different but equivalent forms.