Hi,

So, I understand the Triplet loss function and what it is aiming to do. Consider that I want to implement a Siamese Network *completely* from scratch. By completely, I mean no TensorFlow or PyTorch.

In Triplet Loss, there are a total of 3 predictions involved: f(A), f(P) and f(N). Just as a normal loss function takes the true labels and the predicted labels, we need to calculate the derivatives of the loss with respect to each prediction, i.e. dL/dfA, dL/dfP and dL/dfN.
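To be concrete, here is a minimal NumPy sketch of the gradients I mean, assuming the standard squared-Euclidean triplet loss L = max(0, ||f(A) − f(P)||² − ||f(A) − f(N)||² + margin); the function name and margin value are just placeholders:

```python
import numpy as np

def triplet_loss_and_grads(fA, fP, fN, margin=0.2):
    """Triplet loss with squared Euclidean distances, plus the
    three gradients dL/dfA, dL/dfP, dL/dfN (hypothetical helper)."""
    d_ap = np.sum((fA - fP) ** 2)  # anchor-positive distance
    d_an = np.sum((fA - fN) ** 2)  # anchor-negative distance
    loss = max(0.0, d_ap - d_an + margin)
    if loss == 0.0:
        # Margin already satisfied: all three gradients vanish
        zero = np.zeros_like(fA)
        return loss, zero, zero, zero
    dA = 2.0 * (fN - fP)    # dL/dfA = 2(fA-fP) - 2(fA-fN)
    dP = -2.0 * (fA - fP)   # dL/dfP
    dN = 2.0 * (fA - fN)    # dL/dfN
    return loss, dA, dP, dN
```

So for each triplet I end up with three separate gradient vectors, one per embedding.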

As I understand it, there is only one neural network, with (say) a 128-dimensional dense layer as the final output. My question is: how are the three gradients backpropagated to this single layer? A Dense layer has a single output and hence expects a single gradient to be backpropagated. How do I handle these three gradients?

Note: I've looked around and have been unable to find anything concrete. Most of the resources just say "it is backpropagated." My question is about the internal mechanics of it all. Also, I know how to calculate the gradients, so no issues with that.

Thanks!