About Content cost function of Neural Style Transfer

Hi there,

The overall cost is J_G = alpha * J_Content(C, G) + beta * J_Style(S, G), so the goal is to drive both J_Content and J_Style down with gradient descent.
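As a minimal sketch of that weighted sum, the function below simply combines two already-computed loss values. The name `total_cost` and the numbers passed in are hypothetical and chosen only for illustration; they are not taken from the course or the paper.

```python
def total_cost(j_content, j_style, alpha, beta):
    """Weighted sum J_G = alpha * J_Content + beta * J_Style."""
    return alpha * j_content + beta * j_style


# Hypothetical loss values and weights, for illustration only.
j_g = total_cost(j_content=0.5, j_style=0.25, alpha=10.0, beta=40.0)
print(j_g)
```

Because J_G is a plain weighted sum, lowering either term lowers the total, and alpha and beta control how much each term counts.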

My naive question is this: in the lecture and in the original paper, J_Content is computed with a squared-difference (Euclidean) distance between the activations of C and G.

However, in Course 5 I learned about cosine similarity. Why don’t we minimize a cosine-based loss here instead?


Hey @Chris.X,

I’m not sure I understand your question. Could you expand a bit to see if we can help answer it?


Seems like OP is asking about cosine similarity versus Euclidean distance, and why you might prefer one over the other when minimizing loss. To the best of my knowledge, there is no single right answer. Like a lot of things in machine learning, the best choice is the one that works best, meaning the one that serves the needs and meets the constraints of a particular project or business objective.

One thing to consider is what the underlying math can tell you. Cosine similarity compares directions but ignores magnitude. Euclidean distance tells you how far apart two vectors are, but not how their directions relate. Which to use depends on what you want to measure/minimize.
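To make the difference concrete, here is a small sketch using NumPy. The helper names `cosine_similarity` and `euclidean_distance` and the example vectors are made up for illustration.

```python
import numpy as np


def cosine_similarity(u, v):
    """Cosine of the angle between u and v (ignores vector length)."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def euclidean_distance(u, v):
    """Straight-line distance between u and v (sensitive to length)."""
    return float(np.linalg.norm(u - v))


a = np.array([1.0, 2.0, 3.0])
b = 2.0 * a                    # same direction as a, twice the length
c = np.array([3.0, 2.0, 1.0])  # same length as a, different direction

print("a vs b:", cosine_similarity(a, b), euclidean_distance(a, b))
print("a vs c:", cosine_similarity(a, c), euclidean_distance(a, c))
```

Here `b` points exactly the same way as `a`, so the cosine similarity is about 1 even though the two are roughly 3.74 apart. `c` is closer to `a` in Euclidean terms (about 2.83) but has a lower cosine similarity (about 0.71), so the two measures rank the pairs differently.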

@neurogeek is the NLP expert, but my thought would be to use sentiment analysis as a thought experiment. Two statements’ word vectors might have a small Euclidean distance but express opposite sentiment, so cosine would be the preferred measure.


Oh, I see it now. Thanks @ai_curious for such a nice explanation.

Yes, I think cosine similarity is best suited to problems where the magnitudes of the two features (i.e. vectors) are not important but the ‘direction’ matters most. As @ai_curious said, this is particularly useful for things like word embeddings/meanings, where the sentiments and relationships between words are super important.

For this particular question I agree that testing different approaches is, in and of itself, interesting and can lead to interesting behaviors at training/prediction time. Without experimenting on this, I would expect that cosine similarity wouldn’t be very effective in this case, since the distance between the vectors matters here. Also, establishing the similarity target between content and style would be hard. This is just a guess.
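A rough way to see why magnitude might matter here is to compare the two losses on activations that differ only in scale. This is only a sketch under assumptions: the activations are random stand-ins rather than outputs of a real network, the 1 / (4 * n_H * n_W * n_C) normalization follows the course's convention as I recall it, and "1 minus cosine similarity" is just one possible cosine-based loss, not something proposed in the lecture or the paper.

```python
import numpy as np


def content_cost_squared(a_C, a_G):
    """Squared-difference content cost, normalized by 4 * n_H * n_W * n_C."""
    n_H, n_W, n_C = a_C.shape
    return float(np.sum((a_C - a_G) ** 2) / (4 * n_H * n_W * n_C))


def content_cost_cosine(a_C, a_G):
    """Hypothetical alternative: 1 - cosine similarity of the flattened activations."""
    u, v = a_C.ravel(), a_G.ravel()
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(1.0 - cos)


rng = np.random.default_rng(0)
a_C = rng.random((4, 4, 3))  # stand-in activations for the content image
a_G = 3.0 * a_C              # same pattern, three times stronger

print("squared-difference cost:", content_cost_squared(a_C, a_G))
print("cosine-based cost:", content_cost_cosine(a_C, a_G))
```

The cosine-based cost comes out at essentially zero because scaling does not change direction, while the squared-difference cost stays clearly positive. A cosine loss would therefore treat these two sets of activations as a perfect match even though their values differ everywhere.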

Having said the above, it would be interesting to find a way to get this to work with cosine similarity.

Thanks for the awesome discussion!

I agree. Distance is exactly what matters in the style cost function: if you subtract the two pixel values, how large is the result? Euclidean distance makes sense. It is also important to remember that least squares and cosine aren’t the only two similarity metrics out there. @Chris.X, my recommendation is to be flexible and pragmatic about which tool is best for the specific job at hand, which may require experiments.
