When watching the video about the triplet loss for face recognition, Andrew explained that we take the max of the loss and 0 because we don't care how much the distances differ, as long as the difference is bigger than the margin. I was wondering: wouldn't it be beneficial for training to let the loss become negative in case the model performs very well? To me it looks like we are rather "incentivizing" the model to only perform as well as needed here.
Optimizing the model to minimize the triplet loss ensures that the distance between the anchor and negative representations is at least a margin "alpha" higher than the distance between the anchor and positive representations. This lets us learn an embedding space where the anchor and positive representations are close, while the anchor and negative representations are farther apart.
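For concreteness, here is a minimal sketch of that loss (my own illustration, not code from the course), using squared Euclidean distances and plain NumPy:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, alpha=0.2):
    """Triplet loss for one (anchor, positive, negative) embedding triple."""
    d_ap = np.sum((anchor - positive) ** 2)  # anchor-positive distance
    d_an = np.sum((anchor - negative) ** 2)  # anchor-negative distance
    # Clamped at 0: once d_an >= d_ap + alpha, the triplet contributes nothing
    return max(d_ap - d_an + alpha, 0.0)
```

That `max(..., 0.0)` is exactly the clamp the original question is about.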
Also, if the loss were allowed to go negative, the gradient would keep pushing the anchor-negative distance to grow without bound, which can lead to exploding gradients. Clamping the loss at 0 stops the updates once the margin is satisfied, keeping them within sensible limits.
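You can check that "the gradient stops once the margin is satisfied" directly. A small sketch (my own, with made-up embeddings) comparing the clamped and unclamped versions with PyTorch autograd:

```python
import torch

def triplet_loss(a, p, n, alpha=0.2, clamp=True):
    d_ap = ((a - p) ** 2).sum()
    d_an = ((a - n) ** 2).sum()
    loss = d_ap - d_an + alpha
    return torch.clamp(loss, min=0.0) if clamp else loss

a = torch.zeros(4, requires_grad=True)
p = torch.zeros(4) + 0.1       # positive very close to the anchor
n = torch.ones(4) * 10.0       # negative already far beyond the margin

# Clamped: the triplet is satisfied, so the gradient is exactly zero
triplet_loss(a, p, n, clamp=True).backward()
print(a.grad)                  # all zeros

a.grad = None
# Unclamped: the gradient still pushes the negative even farther away
triplet_loss(a, p, n, clamp=False).backward()
print(a.grad)                  # nonzero, would keep growing d_an forever
```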
Intuitively speaking, I think the job of a classification task is to define a classification boundary that separates two classes. In other words, as long as the P and N samples are separated by the margin, then, for this good pair, the boundary is clear and the job is done! However, if we further push the good pairs apart, would we then create more bad pairs, or would we fail to reduce the number of bad pairs?
Yes - we are incentivizing the model to only perform sufficiently well, but we are also asking it to focus on the bad pairs (aka semi-hard and hard triplets) by ignoring the good ones (aka easy triplets).
If you have read about "imbalanced datasets", you know that we might want to, for example, upsample the minority class to keep the network from biasing towards the majority one. Here, with the triplet loss, we are taking out those good pairs, so, effectively, we are "upsampling" the bad pairs so that the training can focus on them.
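A rough sketch of that filtering effect (the names `d_ap`/`d_an` are just illustrative): dropping the zero-loss triplets from a batch leaves only the semi-hard and hard ones, which is the "upsampling" described above:

```python
import numpy as np

def keep_hard_and_semi_hard(d_ap, d_an, alpha=0.2):
    """Mask of triplets that still incur loss, i.e. d_an < d_ap + alpha."""
    return d_an < d_ap + alpha

# Example: anchor-positive and anchor-negative distances for 5 triplets
d_ap = np.array([0.2, 0.5, 0.1, 0.9, 0.3])
d_an = np.array([1.5, 0.6, 0.1, 0.4, 2.0])
mask = keep_hard_and_semi_hard(d_ap, d_an, alpha=0.2)
print(mask)  # [False  True  True  True False] -> easy triplets 1 and 5 are ignored
```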