I am a bit confused about how the triplets are chosen when calculating the cost function.

Why can’t we choose the pair where s(A,N) > s(A,P) as the closest negative?

Good question. I think we are excluding these negatives because of the “Reading: Triplets” section, which states: Hard negative triplet: cos(A,P) < cos(A,N).

Later in the course, when we calculate the Modified Triplet Loss, we use this loss (closest_neg) equally weighted with mean_neg to get the gradients for updating our RNN weights.

So, to the point: why do we update our RNN weights with respect to the closest negative example **but only under** the cos(A,P) value? We want to update with respect to a “hard” example, which must be the closest negative that is still under the s(A,P) value. This way we update our RNN so that in the next step these vectors are further apart.

It is easier to explain with numbers, for example (when alpha=0.25):

s0(A,P) = **0.11** , s1(A,N1)= **0.12** , s2(A,N2)= **0.08** , s3(A,N3)=0.4, s4(A,N4)= -0.9

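L1 (mean_neg) = -0.11 + (-0.075) + 0.25 = 0.065, where -0.075 is the mean of the four negative scores. L2 (closest_neg) = -0.11 + 0.08 + 0.25 = 0.22, using N2 because N1 and N3 are above s(A,P). So L = L1 + L2 = 0.285.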
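The arithmetic above can be checked with a short Python sketch. It handles a single anchor only. The function name `modified_triplet_loss` and the list-based input are assumptions for illustration, not the course's batched implementation. The clamp at zero follows the usual triplet-loss form, and the sketch assumes at least one negative scores at or below s(A,P).

```python
def modified_triplet_loss(sim_ap, sim_an, alpha=0.25):
    """Loss for one anchor (illustrative sketch, not the course code).

    sim_ap: similarity between anchor and positive, s(A,P)
    sim_an: list of similarities between anchor and each negative, s(A,Ni)
    """
    # mean_neg uses every negative, including those above s(A,P)
    mean_neg = sum(sim_an) / len(sim_an)

    # closest_neg only considers negatives that are not above s(A,P)
    # (assumption: at least one such negative exists)
    eligible = [s for s in sim_an if s <= sim_ap]
    closest_neg = max(eligible)

    l1 = max(0.0, -sim_ap + mean_neg + alpha)     # mean-negative term
    l2 = max(0.0, -sim_ap + closest_neg + alpha)  # closest-negative term
    return l1, l2, l1 + l2


l1, l2, total = modified_triplet_loss(0.11, [0.12, 0.08, 0.4, -0.9], alpha=0.25)
print(round(l1, 3), round(l2, 3), round(total, 3))  # 0.065 0.22 0.285
```

Note that N1 (0.12) and N3 (0.4) still contribute to the mean term even though they are filtered out of the closest-negative term.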

For illustration purposes let’s assume that after gradient descent the same examples would have these values:

s0(A,P) = **0.13** , s1(A,N1)= **0.119** , s2(A,N2)= **0.05** , s3(A,N3)=0.039, s4(A,N4)=-0.901

So after this step we would start to push the positive and N1 further apart, not N2 as in the previous step, because N1 is now under s(A,P) and is the closest negative to it.
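To see the switch from N2 to N1 concretely, here is a small Python sketch that picks the closest eligible negative for both sets of scores. The helper name `closest_negative` and the dictionary input are hypothetical, chosen only so the selected example can be reported by name.

```python
def closest_negative(sim_ap, sim_an):
    """Return (name, score) of the highest-scoring negative not above s(A,P).

    sim_an maps a negative's label to its similarity with the anchor.
    Illustrative helper; assumes at least one negative is eligible.
    """
    eligible = {name: s for name, s in sim_an.items() if s <= sim_ap}
    name = max(eligible, key=eligible.get)
    return name, eligible[name]


# Scores before the update: N1 and N3 are above s(A,P) = 0.11, so N2 is chosen
before = closest_negative(0.11, {"N1": 0.12, "N2": 0.08, "N3": 0.4, "N4": -0.9})

# Scores after the update: N1 has dropped below s(A,P) = 0.13, so it is chosen
after = closest_negative(0.13, {"N1": 0.119, "N2": 0.05, "N3": 0.039, "N4": -0.901})

print(before)  # ('N2', 0.08)
print(after)   # ('N1', 0.119)
```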

In other words, we try to push the hardest negative example away from the positive while maintaining balance: the mean_neg loss pushes all negative examples away from the positive, and closest_neg pushes away only the closest “hard” negative.

I guess real-world scenarios would tell you whether or not you should exclude these negatives, but I think this was the motivation: to go for the closest negative example, the “hard negative”, which must be under the s(A,P) value.