Hi @classical_leap,
I can see your point that that part of the explanation could be confusing. To me, the important aspects of the contrastive loss formula are that:
1). One part of the formula (the D squared part) applies when Y=1, and the other (the max(margin-D,0) squared part) applies when Y=0. Here Y=1 means we expect the two images to be similar, and Y=0 means we expect them to be different.
2). Our loss should be large if the distance, D, between the two images is large when we expected the images to be similar, and it should also be large if we expected the images to be dissimilar but the distance, D, between them is small.
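
In case it helps to see the two points in numbers, here is a minimal sketch of the per-pair loss in plain Python. The function name `contrastive_loss` and the default `margin=1.0` are my own choices for illustration, not taken from the assignment, and the course version may also average over a batch or scale the terms differently.

```python
def contrastive_loss(y, d, margin=1.0):
    """Per-pair contrastive loss (illustrative sketch).

    y: 1 if the two images are expected to be similar, 0 if different.
    d: distance between the two image embeddings (non-negative).
    margin: assumed value; pairs expected to differ are only
            penalized while their distance is below this margin.
    """
    # Active only when y == 1: grows as similar images drift apart.
    similar_term = y * d ** 2
    # Active only when y == 0: grows as different images get closer than the margin.
    dissimilar_term = (1 - y) * max(margin - d, 0.0) ** 2
    return similar_term + dissimilar_term


# Try the four combinations of expectation (Y) and small/large distance (D).
for y, d in [(1, 0.1), (1, 0.9), (0, 0.1), (0, 0.9)]:
    print(f"Y={y}, D={d}: loss={contrastive_loss(y, d):.2f}")
```

With a margin of 1, the two "wrong" cases (Y=1 with D=0.9, and Y=0 with D=0.1) both come out at about 0.81, while the two "right" cases come out at about 0.01, which matches point 2 above.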
There’s a little more explanation of this in this old post: C1W2 -> understand the constrastive loss function