Couldn’t we just use ( ||f(a) - f(p)||^2 - ||f(a) - f(n)||^2 ) as the loss function, instead of max( ||f(a) - f(p)||^2 - ||f(a) - f(n)||^2 + alpha, 0 )?

Because one trivial way to satisfy that is to learn a function f that maps everything to zero. Alpha prevents that.

I mean just using ( ||f(a) - f(p)||^2 - ||f(a) - f(n)||^2 ) as the loss function:

loss = ||f(a) - f(p)||^2 - ||f(a) - f(n)||^2

Learning everything equal to zero satisfies the inequality if we only require it to be less than or equal to zero,

but here the loss function won’t be minimized by setting everything equal to zero.

The reason we have the margin and the max operator is that, once the representations of a negative pair are far enough apart, no effort is wasted on enlarging that distance any further, so training can focus on the harder triplets.
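To make this concrete, here is a minimal NumPy sketch of the triplet loss (names like `triplet_loss` and the margin value 0.2 are my own choices for illustration). It shows both effects discussed above: collapsing everything to zero still incurs a loss of alpha, and an "easy" triplet whose negative is already far away is clipped to zero by the max, contributing no gradient.

```python
import numpy as np

def triplet_loss(a, p, n, alpha=0.2):
    """max( ||f(a)-f(p)||^2 - ||f(a)-f(n)||^2 + alpha, 0 )"""
    d_pos = np.sum((a - p) ** 2)  # squared distance anchor-positive
    d_neg = np.sum((a - n) ** 2)  # squared distance anchor-negative
    return max(d_pos - d_neg + alpha, 0.0)

# Collapsed embeddings: everything mapped to zero.
zero = np.zeros(4)
print(triplet_loss(zero, zero, zero))  # 0 - 0 + 0.2 = 0.2, so collapse is penalized

# An "easy" triplet: the negative is already far enough away.
a = np.array([1.0, 0.0, 0.0, 0.0])
p = np.array([1.0, 0.1, 0.0, 0.0])
n = np.array([-1.0, 0.0, 0.0, 0.0])
print(triplet_loss(a, p, n))  # clipped to 0.0: no effort spent pushing n further
```

Without the `+ alpha` and the `max(..., 0)`, the first case would give a loss of exactly 0 (a trivial solution), and the second would keep producing gradient to push an already-distant negative even further away.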


I got it. Thanks for your explanation.
