Issues with triplet loss function

Greetings!!
I am attaching the code I have written for the triplet loss function.
I am getting a wrong value error and I am unable to figure out the issue. If you have any advice, kindly let me know. Thank you.

(Solution code removed, as posting it publicly is against the honour code of this community)

Hi @David00,

First off, it helps anyone willing to answer if you are specific about where the question comes from. You didn’t mention the week, the assignment, or the exercise number. Providing this information helps us help you. (Luckily, there’s only one place in DLS where triplet loss is used, so I could work out which one you meant.)

Secondly, posting solution code is against the honour code of this community. If we ever need to look at your solution, we’ll send you a direct message ourselves. It is always best to start by sharing the error trace rather than the code itself.

As for the code, your step 3 and step 4 are incorrect.

For step 3, as the instructions say, you have to subtract the two distances first and then perform the addition. And since all of this is being done with TensorFlow functions, don’t simply use +. Keeping this in mind, if you look at the logic you have used, you’ll see the mistake.

Similarly, follow the instructions of Step 4 as well.

Happy learning,
Mubsi

OK, I will be more meticulous in the future.

Look at the actual meaning of the code you wrote. It’s equivalent to:

$$a - (b + \alpha)$$

Well, that’s not what the math formula says, right?

I am sorry, I am unable to appreciate the difference. The formula says pos_dist - neg_dist + alpha. In one of the previous courses, Andrew talks about broadcasting in Python. In any case, even after using tf.add for + and tf.subtract for -, I am still getting an assertion error. I am unable to figure out where the issue is.

$$a - (b + \alpha) = a - b - \alpha$$

That’s not what the formula says.
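To make the grouping issue concrete, here is a minimal plain-Python sketch. The numbers are hypothetical toy values, not taken from the assignment, and ordinary floats stand in for tensors; it only shows how the parentheses change the sign of alpha.

```python
# Hypothetical toy values, chosen only to illustrate the grouping.
a, b, alpha = 1.0, 3.0, 0.5

# Grouping alpha with b means alpha ends up subtracted: a - b - alpha.
grouped = a - (b + alpha)

# Subtracting first and adding alpha afterwards keeps alpha positive.
intended = (a - b) + alpha

print(grouped)   # -2.5
print(intended)  # -1.5
```

The two results differ by exactly 2 * alpha, which is why nesting the addition inside the subtraction gives a wrong value even though each individual operation looks right.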

Oops, I missed the operator precedence. I have been struggling with this problem for a long time.

As suggested in the problem, I first computed the difference between pos_dist and neg_dist using tf.subtract, and then added alpha to it using tf.add. I then computed the loss using tf.reduce_sum, again as suggested in the problem. I am still getting the assertion error. I don’t know what I am missing.

That’s only one part of it, right? How about the next step after that? Note that the max of the sum is not the same thing as the sum of the max.

Again, as specified in the problem, using tf.maximum, I have taken the max of basic_loss and 0.0.

That sounds right. And what do you do then? There’s another step after that, which was my previous point that the max of the sum is not the same thing as the sum of the max.

The point being that the basic_loss value that is one of the inputs to that tf.maximum is a vector, right? I’ll bet in your case it’s not. Meaning exactly what I said before: you’re taking the max of the sum, but that’s not what they are asking you to do. It’s the sum of the maxes, right? Look at the formula again.

The point is that all the steps up to and including taking the max with 0.0 are done “per sample”. Only at the final stage do you sum the loss values over all the samples. The point of the max with 0.0 is that you discard any values for which the loss on that sample is negative.
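The difference between the two orderings can be seen on a toy example. This is a plain-Python sketch with hypothetical per-sample values (a list standing in for the per-sample vector), not the assignment code.

```python
# Hypothetical per-sample values of pos_dist - neg_dist + alpha for three samples.
basic_loss = [0.5, -2.0, 1.0]

# Sum of the maxes: clip each sample at 0.0 first, then add up the results.
sum_of_max = sum(max(x, 0.0) for x in basic_loss)

# Max of the sum: add everything up first, then clip once at the end.
max_of_sum = max(sum(basic_loss), 0.0)

print(sum_of_max)  # 1.5
print(max_of_sum)  # 0.0
```

In the second ordering the negative sample cancels the positive ones before the clip is applied, so the information from the samples that do violate the margin is lost.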

Will double check and respond.

And are you sure there is no reduce_sum on the computation where you subtract positive and negative and then add alpha? That’s supposed to produce a vector with one element per sample.

You are right, that was the issue. I had taken the max of the sum. I have fixed it now. Thank you for your continued guidance.

Glad to hear that you have solved it!

There is a “meta” lesson here: if you think you are saving yourself time by not reading the instructions carefully and making sure you understand what the math says, that is not a net savings of time in general. You save 5 or 10 minutes and then end up wasting hours because you’re implementing something different than what the formula says. It all starts from the math and if you have that wrong, it doesn’t matter how good a programmer you are :scream_cat:

Or maybe the other way to state the “meta” lesson is that there are two ways you can end up with the wrong answer:

  1. Your code doesn’t really do what you intended.
  2. Your code does what you intended, but your intention is wrong.

If you’ve invested a bunch of energy on the assumption that the problem is type 1), then maybe after an hour or so of that, it’s time to take a step back and consider that maybe it’s really a type 2) problem.

I totally agree. Will be more careful. Thank you!

Understood. Will follow this advice. I got confused by the various TensorFlow functions. I suppose I need to read their documentation to understand what they are doing.