Hi,
So, I understand the Triplet loss function and what it is aiming to do. Consider that I want to implement a Siamese Network completely from scratch. By completely, I mean no TensorFlow or PyTorch.
In Triplet Loss, there are three predictions involved: f(A), f(P) and f(N). Just as a normal loss function takes the true labels and the predicted labels, here we need the derivatives of the loss with respect to each prediction, i.e. dL/df(A), dL/df(P) and dL/df(N).
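For concreteness, using the standard hinge form of the loss (the margin α and this exact notation are my own choice):

L = max(0, ||f(A) − f(P)||² − ||f(A) − f(N)||² + α)

and, whenever the max is active, the three derivatives are:

dL/df(A) = 2(f(N) − f(P))
dL/df(P) = −2(f(A) − f(P))
dL/df(N) = 2(f(A) − f(N))

(all three are zero when the margin is already satisfied and the max evaluates to 0).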
As I understand it, there is only one neural network, with (say) a 128-dimensional dense layer as its final output. My question is: how are the three gradients backpropagated to this single network? A Dense layer produces a single output and hence expects a single gradient during backpropagation. How do I handle these three gradients?
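To make the question concrete, here is a minimal NumPy sketch of how I currently imagine it working: run the same weights forward three times (once per branch), backprop each embedding gradient through its own forward pass, and sum the resulting weight gradients because the branches share one set of weights. The toy single-layer "network" and all names here are my own invention, not from any library:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the shared embedding network: one linear layer f(x) = W @ x.
d_in, d_out = 8, 4
W = rng.normal(size=(d_out, d_in))
alpha = 0.2  # margin

def forward(x):
    return W @ x

def triplet_loss(fA, fP, fN):
    return max(0.0, np.sum((fA - fP) ** 2) - np.sum((fA - fN) ** 2) + alpha)

xA, xP, xN = rng.normal(size=(3, d_in))

# Three forward passes through the SAME weights.
fA, fP, fN = forward(xA), forward(xP), forward(xN)
loss = triplet_loss(fA, fP, fN)

# Loss gradients w.r.t. each of the three embeddings
# (all zero if the margin is already satisfied).
if loss > 0:
    dfA = 2 * (fN - fP)
    dfP = -2 * (fA - fP)
    dfN = 2 * (fA - fN)
else:
    dfA = dfP = dfN = np.zeros(d_out)

# The step I am asking about: backprop EACH embedding gradient through
# its own forward pass, then SUM the per-branch weight gradients,
# since the three branches share one weight matrix.
dW = np.outer(dfA, xA) + np.outer(dfP, xP) + np.outer(dfN, xN)
```

Is this summation over the three branches the right mental model for the "internal mechanics", or does it work differently?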
Note: I’ve looked around and have been unable to find anything concrete. Most resources just say “it is backpropagated.” My question is about the internal mechanics of it all. Also, I know how to calculate the gradients themselves, so no issues there.
Thanks!