Question about C3_W4_lecture_nb_2_Modified_Triplet_Loss

Hi,
I have a question about hard negative mining. In the C3_W4_lecture_nb_2_Modified_Triplet_Loss notebook (section "Hard Negative Mining"), when we look for the closest negatives, we exclude negative examples whose similarity scores are greater than those of the positive examples:
mask_2 = sim_an > sim_ap.reshape(b, 1) # mask to exclude sim_an > sim_ap
My question is: why do we need this step? Isn’t it better to penalize the model for making incorrect predictions in such cases?
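
For context, here is a minimal, self-contained NumPy sketch of the closest-negative computation this mask belongs to, as I understand it from the notebook (the random sim matrix is just a stand-in for the real batch of cosine similarities):

import numpy as np

b = 4
rng = np.random.default_rng(0)
sim = rng.uniform(-1.0, 1.0, size=(b, b))  # stand-in for the (b, b) similarity matrix

sim_ap = np.diag(sim)                   # anchor-positive scores live on the diagonal
sim_an = sim * (1.0 - np.eye(b))        # anchor-negative scores are off-diagonal

mask_1 = np.eye(b) == 1                 # exclude the positives themselves
mask_2 = sim_an > sim_ap.reshape(b, 1)  # exclude negatives already above sim_ap
mask = mask_1 | mask_2

sim_an_masked = np.where(mask, -2.0, sim_an)  # -2 is below any cosine similarity
closest_neg = np.max(sim_an_masked, axis=1)   # hardest negative still below sim_ap
print(closest_neg)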

Hey @conscell,

I believe this is a design choice for the loss function. As you can see in the lecture video "Computing The Cost II", the instructor discusses 2 of the many possible choices for designing a loss function. If you think your approach can work better, please feel free to add another loss component, L_3, to the final loss function L, and train your model with it. You can also replace any of the existing loss components with your proposed component to see whether that gives even better performance.
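
For instance, purely as an illustration and not something from the course, an L_3 that keeps a margin penalty on the hardest negative overall, including the cases that mask_2 excludes, might look like this NumPy sketch (the L_1/L_2 terms and margin follow the lecture):

import numpy as np

def triplet_loss_with_l3(sim, margin=0.25):
    # Lecture loss L = mean(L_1 + L_2) plus a hypothetical L_3 that also
    # penalizes negatives scoring above the positive (the mask_2 cases).
    b = sim.shape[0]
    sim_ap = np.diag(sim)                        # anchor-positive scores
    sim_an = sim * (1.0 - np.eye(b))             # anchor-negative scores

    mean_neg = np.sum(sim_an, axis=1) / (b - 1)  # used by L_1

    mask_1 = np.eye(b) == 1
    mask_2 = sim_an > sim_ap.reshape(b, 1)
    closest_neg = np.max(np.where(mask_1 | mask_2, -2.0, sim_an), axis=1)  # used by L_2

    hardest_neg = np.max(np.where(mask_1, -2.0, sim_an), axis=1)  # no mask_2 exclusion

    l1 = np.maximum(0.0, mean_neg - sim_ap + margin)
    l2 = np.maximum(0.0, closest_neg - sim_ap + margin)
    l3 = np.maximum(0.0, hardest_neg - sim_ap + margin)  # the extra component
    return np.mean(l1 + l2 + l3)

To train with it, you would wrap this in a Trax loss layer the same way the assignment wraps TripletLoss.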

Please do share your results with the community.

Cheers,
Elemento

Hi @Elemento,
thank you for the reply. Here are the experimental results.

Vanilla model

Step    100: Ran 100 train steps in 73.06 secs
Step    100: train TripletLoss |  126.74811554
Step    100: eval  TripletLoss |  125.67736816

Step    200: Ran 100 train steps in 67.87 secs
Step    200: train TripletLoss |  90.45353699
Step    200: eval  TripletLoss |  68.92859650

Step    300: Ran 100 train steps in 67.45 secs
Step    300: train TripletLoss |  66.81349945
Step    300: eval  TripletLoss |  65.63706970

Step    400: Ran 100 train steps in 73.49 secs
Step    400: train TripletLoss |  61.36888504
Step    400: eval  TripletLoss |  53.00606155

Step    500: Ran 100 train steps in 63.93 secs
Step    500: train TripletLoss |  39.23133087
Step    500: eval  TripletLoss |  29.35581207

Step    600: Ran 100 train steps in 68.10 secs
Step    600: train TripletLoss |  24.69686699
Step    600: eval  TripletLoss |  21.20908737

Step    700: Ran 100 train steps in 70.11 secs
Step    700: train TripletLoss |  19.57789612
Step    700: eval  TripletLoss |  17.75962067

Step    800: Ran 100 train steps in 66.62 secs
Step    800: train TripletLoss |  14.32200718
Step    800: eval  TripletLoss |  16.50632858

Step    900: Ran 100 train steps in 62.45 secs
Step    900: train TripletLoss |  13.53366470
Step    900: eval  TripletLoss |  15.61433220

Step   1000: Ran 100 train steps in 68.16 secs
Step   1000: train TripletLoss |  12.49942112
Step   1000: eval  TripletLoss |  10.93266869

Step   1100: Ran 100 train steps in 69.39 secs
Step   1100: train TripletLoss |  10.57346821
Step   1100: eval  TripletLoss |  12.41757393

Step   1200: Ran 100 train steps in 67.96 secs
Step   1200: train TripletLoss |  8.96909428
Step   1200: eval  TripletLoss |  12.09357929

Step   1300: Ran 100 train steps in 64.98 secs
Step   1300: train TripletLoss |  9.13851738
Step   1300: eval  TripletLoss |  12.08344841

Step   1400: Ran 100 train steps in 70.56 secs
Step   1400: train TripletLoss |  8.71969223
Step   1400: eval  TripletLoss |  11.81708241
Accuracy 0.73935544

Modified model

Step    100: Ran 100 train steps in 75.86 secs
Step    100: train TripletLoss |  127.86864471
Step    100: eval  TripletLoss |  127.99868011

Step    200: Ran 100 train steps in 67.72 secs
Step    200: train TripletLoss |  127.97861481
Step    200: eval  TripletLoss |  127.98826599

Step    300: Ran 100 train steps in 67.42 secs
Step    300: train TripletLoss |  126.69123077
Step    300: eval  TripletLoss |  104.46441650

Step    400: Ran 100 train steps in 73.88 secs
Step    400: train TripletLoss |  90.05741882
Step    400: eval  TripletLoss |  78.65382385

Step    500: Ran 100 train steps in 64.12 secs
Step    500: train TripletLoss |  75.82817078
Step    500: eval  TripletLoss |  71.53275299

Step    600: Ran 100 train steps in 68.32 secs
Step    600: train TripletLoss |  68.21308899
Step    600: eval  TripletLoss |  63.57521820

Step    700: Ran 100 train steps in 69.77 secs
Step    700: train TripletLoss |  50.10398102
Step    700: eval  TripletLoss |  46.19656372

Step    800: Ran 100 train steps in 65.46 secs
Step    800: train TripletLoss |  33.77754211
Step    800: eval  TripletLoss |  32.35847473

Step    900: Ran 100 train steps in 71.04 secs
Step    900: train TripletLoss |  28.29660606
Step    900: eval  TripletLoss |  28.22137642

Step   1000: Ran 100 train steps in 68.11 secs
Step   1000: train TripletLoss |  25.06066895
Step   1000: eval  TripletLoss |  23.22163582

Step   1100: Ran 100 train steps in 69.27 secs
Step   1100: train TripletLoss |  20.36307907
Step   1100: eval  TripletLoss |  23.35410309

Step   1200: Ran 100 train steps in 68.30 secs
Step   1200: train TripletLoss |  17.93107605
Step   1200: eval  TripletLoss |  21.29296684

Step   1300: Ran 100 train steps in 67.07 secs
Step   1300: train TripletLoss |  17.63393784
Step   1300: eval  TripletLoss |  22.01875114

Step   1400: Ran 100 train steps in 68.94 secs
Step   1400: train TripletLoss |  16.65108109
Step   1400: eval  TripletLoss |  19.93952370
Accuracy 0.73964846

After step 1400, both models started overfitting. The vanilla model converges faster, but it is hard to tell which one performs better on the test set.

Hey @conscell,
Thanks for sharing your experimental results with the community.

Perhaps we can restrict the vanilla model to run for only 800-900 steps. In that case, it will reach roughly the same convergence losses as the modified model, and if both models give the same test accuracy in that scenario, we can say that the vanilla model has a faster convergence rate for the same accuracy.

Cheers,
Elemento

Hi @Elemento,
Another important factor is the different random initialization; I am not sure how to properly set the RNG seed in Trax. To iterate faster I tried to run things locally on a GPU, but I couldn't find a working combination of Trax/JAX/TensorFlow compatible with this notebook because, unfortunately, Trax development stopped in 2021. Here are the results of another experiment for 900 steps (the seeding sketch I plan to try is at the end of this post).

Vanilla model:

Step    900: Ran 100 train steps in 73.79 secs
Step    900: train TripletLoss |  17.79376984
Step    900: eval  TripletLoss |  20.02503777
Accuracy 0.73427737

Modified model:

Step    900: Ran 100 train steps in 73.67 secs
Step    900: train TripletLoss |  21.53363609
Step    900: eval  TripletLoss |  19.55072975
Accuracy 0.7337891

You are right; looking at the results, I can't confidently state that the vanilla model has better convergence.
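
In case it helps anyone, below is the seeding approach I plan to try next. It is not verified against the Trax version pinned in the notebook, and model, train_task, eval_task, and output_dir are the notebook's own names:

import random
import numpy as np
from trax.supervised import training

# Seed the Python and NumPy RNGs, assuming the notebook's data generator
# shuffles with the stdlib random module as in the assignment.
random.seed(34)
np.random.seed(34)

# If I read the Trax source correctly, training.Loop accepts a random_seed
# argument that makes weight initialization reproducible.
training_loop = training.Loop(
    model,
    train_task,
    eval_tasks=[eval_task],
    output_dir=output_dir,
    random_seed=34,
)
training_loop.run(900)  # same 900-step budget for both models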

Hey @conscell,

As I said previously, it depends on your application. You can try out various losses for your application and choose whichever gives you the best-performing model. There is no hard-and-fast rule that you have to choose one of the loss functions taught in the course.

Once again, thanks for sharing your experimental results.

Cheers,
Elemento