How to choose between different contrastive loss functions?

In lesson #4, the lecturer nicely presented the “original” form of the contrastive loss function:

\mathcal{L} = \sum_{ij} \left[ y_{ij} \cdot \left( 1 - \textrm{sim}(u_i, v_j) \right)^2 +\ (1 - y_{ij}) \cdot \max\left(0,\; \textrm{sim}(u_i, v_j) - m \right)^2 \right]
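To make sure I am reading this formula correctly, here is a minimal sketch of it in plain Python. The function name, the nested-list representation of the similarity and label matrices, and the margin value 0.5 are my own assumptions for illustration, not something taken from the lesson.

```python
def pairwise_contrastive_loss(sim, y, margin=0.5):
    """Sum the margin-based contrastive loss over all pairs (i, j).

    sim[i][j] : similarity between u_i and v_j (e.g. cosine similarity)
    y[i][j]   : 1 if (i, j) is a matching pair, 0 otherwise
    margin    : m in the formula; 0.5 is an arbitrary placeholder value
    """
    total = 0.0
    for sim_row, y_row in zip(sim, y):
        for s, label in zip(sim_row, y_row):
            # Matching pairs are pulled toward similarity 1.
            positive_term = label * (1.0 - s) ** 2
            # Non-matching pairs are penalized only above the margin.
            negative_term = (1 - label) * max(0.0, s - margin) ** 2
            total += positive_term + negative_term
    return total


# Toy 2x2 example: matching pairs sit on the diagonal.
sim = [[0.9, 0.7],
       [0.2, 0.8]]
y = [[1, 0],
     [0, 1]]
loss = pairwise_contrastive_loss(sim, y, margin=0.5)
print(loss)  # roughly 0.09 = 0.01 + 0.04 + 0 + 0.04
```

Note that the off-diagonal entry 0.2 contributes nothing because it is already below the margin, so this loss stops pushing negatives apart once they are "far enough".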

But the actual loss to be minimized is based on the cross-entropy:

\mathcal{L} = -\frac{1}{N} \sum_{i=1}^{N} \log \left( \frac{\exp(S_{ii})}{\sum_{j=1}^{N} \exp(S_{ij})} \right)
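For comparison, here is a minimal sketch of the cross-entropy version in plain Python, written as the quantity that is actually minimized, i.e. the negative of the mean log-softmax of the diagonal entries. The function name and the nested-list representation of S are my own assumptions. As far as I understand, CLIP additionally averages this over rows and columns and scales the similarities by a learned temperature, which this sketch leaves out.

```python
import math


def softmax_cross_entropy_loss(S):
    """Mean over rows i of -log( exp(S[i][i]) / sum_j exp(S[i][j]) ).

    S is a square similarity matrix given as a list of lists, where the
    matching pair for row i is assumed to sit on the diagonal.
    """
    n = len(S)
    total = 0.0
    for i, row in enumerate(S):
        # Log-sum-exp with the row maximum subtracted for numerical stability.
        row_max = max(row)
        log_sum_exp = row_max + math.log(sum(math.exp(s - row_max) for s in row))
        # log_sum_exp - S_ii equals -log softmax(row)[i], which is >= 0.
        total += log_sum_exp - row[i]
    return total / n


# With no signal at all, every row is a uniform guess over N = 2 candidates.
uninformative = softmax_cross_entropy_loss([[0.0, 0.0], [0.0, 0.0]])
# With a strongly dominant diagonal, the loss approaches zero.
confident = softmax_cross_entropy_loss([[10.0, 0.0], [0.0, 10.0]])
print(uninformative, confident)
```

Unlike the margin-based form, every off-diagonal entry in a row contributes to the denominator here, so each negative keeps receiving some gradient however small its similarity already is.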

I understand that these two functions are possibly surrogates for each other, since minimizing either one pushes the similarity matrix toward the identity matrix. But I am not sure how to choose between the two. I think the CLIP paper also used the cross-entropy-based loss.

Is it true that the cross-entropy-based loss has become the more widely used of the two recently?
