Hello
I don’t understand the purpose of K.mean in the contrastive loss. I mean, TensorFlow will take the mean of the output, just like with the Huber loss. In the Huber loss we didn’t compute the mean before returning, so why are we doing it here?
@Kooshan Welcome to our community!
Ideally, a custom loss function should return the loss per example, not the mean (or any other reduction) over the batch. The reduction of the loss over the batch is handled internally by TensorFlow.
However, if you do reduce the loss to a single value inside the custom loss function, it still works, because TensorFlow’s internal reduction won’t raise an error in that case.
You can read the documentation and go through the source code to get more clarity.
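To illustrate the convention, here is a minimal sketch of a Huber-style custom loss that returns one value per example and leaves the batch reduction to Keras. The function name and threshold value are illustrative, not the exact course code:

```python
import tensorflow as tf

def my_huber_loss(y_true, y_pred, threshold=1.0):
    # Per-example error; note there is no batch reduction in this function.
    error = y_true - y_pred
    is_small_error = tf.abs(error) <= threshold
    # Quadratic penalty for small errors, linear for large ones.
    small_error_loss = tf.square(error) / 2
    big_error_loss = threshold * (tf.abs(error) - 0.5 * threshold)
    # Returns a tensor of per-example losses; Keras averages it over the batch.
    return tf.where(is_small_error, small_error_loss, big_error_loss)
```

When you pass this to `model.compile(loss=my_huber_loss, ...)`, Keras applies its default reduction and reports a single scalar loss, so you never need to call a mean yourself.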
Hi Kooshan and Ankit! Thank you for pointing this out. Although both approaches work, it turns out it is better to remove K.mean() and let the underlying libraries do the reduction. There are pretty long discussions about it here and here (same author), and the resolution is to return one loss per sample. By convention, the docs also show that the last axis can be dropped by taking the mean at axis=-1. In our case, that just reduces the loss function’s output shape from (batch_size, 1) to (batch_size,). That is inconsequential for our output, and TensorFlow is still able to sum over the losses and take the mean. So we can skip that part and just return the contrastive loss as written in the formula in class.
The notebook has now been updated in the classroom. Thanks again!