In lesson #4 the lecturer nicely presented the “original” form of the contrastive loss function:
\mathcal{L} = \sum_{i,j} \left[
    y_{ij} \cdot \left( 1 - \textrm{sim}(u_i, v_j) \right)^2
    + (1 - y_{ij}) \cdot \max\left(0,\; \textrm{sim}(u_i, v_j) - m \right)^2
\right]
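To check my understanding, here is a quick sketch of this loss in PyTorch (my own translation, not from the lecture; I'm assuming the positives $y_{ij}$ sit on the diagonal of an $N \times N$ similarity matrix $S$, and that $m$ is a margin hyperparameter):

```python
import torch

def pairwise_contrastive_loss(S: torch.Tensor, m: float = 0.5) -> torch.Tensor:
    # S[i, j] = sim(u_i, v_j); positives assumed on the diagonal (y_ij = 1 iff i == j)
    N = S.size(0)
    y = torch.eye(N, device=S.device)
    pos = y * (1.0 - S).pow(2)                            # pull matched pairs toward sim = 1
    neg = (1.0 - y) * torch.clamp(S - m, min=0.0).pow(2)  # push mismatched pairs below margin m
    return (pos + neg).sum()
```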
But the actual loss that is minimized is based on the cross-entropy, where $S_{ij} = \textrm{sim}(u_i, v_j)$ (note the leading minus sign, since it is a negative log-likelihood):
-\frac{1}{N} \sum_{i=1}^{N} \log \left(
    \frac{\exp(S_{ii})}{\sum_{j=1}^{N} \exp(S_{ij})}
\right)
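Written as code, this second loss is just a softmax cross-entropy over each row of $S$, with the diagonal entry as the target class (again a sketch in PyTorch; I'm assuming $S$ already includes any temperature scaling):

```python
import torch
import torch.nn.functional as F

def cross_entropy_contrastive_loss(S: torch.Tensor) -> torch.Tensor:
    # Each row i of S is a vector of logits; the correct "class" is column i.
    N = S.size(0)
    targets = torch.arange(N, device=S.device)
    # F.cross_entropy averages -log(exp(S_ii) / sum_j exp(S_ij)) over the batch
    return F.cross_entropy(S, targets)
```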
I understand that these two functions are possibly surrogates for each other, since both push the similarity matrix $S$ toward the identity matrix (diagonal entries high, off-diagonal entries low). But I am not sure how to choose between the two. I think the CLIP paper also used the cross-entropy based loss.
Is it true that the cross-entropy based loss is more widely used in recent work?