Based on the current logistic unit, if two inputs are the same face, their embedding components will be similar, so each per-component difference is small and the weighted sum is a small value. With the sigmoid applied, yhat would then be near 0, and vice versa for dissimilar inputs. But we use 0 to represent “different” faces and 1 to represent “same” faces. Why?

Thank you so much for your time!

The sigmoid layer is a Dense layer with 1 unit and a sigmoid output activation. We feed the element-wise difference of the two face embeddings into this layer.
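Written out, that single unit computes something like the following. This is only a sketch: the symbol $f$ for the embedding network and the use of the absolute difference are assumptions on my part, not something stated above.

$$
\hat{y} = \sigma\left(\sum_{k} w_k \, d_k + b\right), \qquad d_k = \left| f(x^{(i)})_k - f(x^{(j)})_k \right|
$$

Note that when every $d_k$ is close to 0, the output is roughly $\sigma(b)$, so the bias alone decides what "identical inputs" map to. Nothing forces $w_k$ to be positive either: with negative weights, larger differences push $\hat{y}$ toward 0.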

You can label images however you’d like. The NN will adjust weights and bias components to match the label. Just a gentle reminder that $\hat{y}$ stands for the prediction and `y` is the actual label.
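To see the weights and bias adapting to the labels, here is a minimal pure-Python sketch of one logistic unit trained with plain SGD. Everything in it is made up for illustration: the toy data from `make_pair_features`, the 4-dimensional "difference" vectors, and the learning rate are assumptions, not the course's actual model or data.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, b, d):
    # One logistic unit: weighted sum of the difference vector, plus bias.
    return sigmoid(sum(wk * dk for wk, dk in zip(w, d)) + b)


def make_pair_features(same, dim, rng):
    # Hypothetical toy data: absolute embedding differences are small
    # for "same" pairs and larger for "different" pairs.
    scale = 0.1 if same else 1.0
    return [abs(rng.gauss(0.0, scale)) for _ in range(dim)]


def train(data, dim, lr=0.1, epochs=200):
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for d, y in data:
            # For binary cross-entropy, dLoss/dz = y_hat - y.
            err = predict(w, b, d) - y
            w = [wk - lr * err * dk for wk, dk in zip(w, d)]
            b -= lr * err
    return w, b


rng = random.Random(0)
dim = 4
# Label 1 = "same" face, label 0 = "different" face.
data = [(make_pair_features(True, dim, rng), 1) for _ in range(50)]
data += [(make_pair_features(False, dim, rng), 0) for _ in range(50)]
rng.shuffle(data)

w, b = train(data, dim)
print("weights:", w)
print("bias:", b)
print("identical inputs:", predict(w, b, [0.0] * dim))
print("very different inputs:", predict(w, b, [2.0] * dim))
```

On this toy data the unit should end up with a positive bias and mostly negative weights, so a zero difference maps to a prediction above 0.5 and large differences map below it. Flip the labels and it would learn the opposite signs.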

Thank you so much! I see. The network might learn a bias of 1 so similar images get 0 + 1 = 1 and different images get -1 + 1 = 0 potentially. The network knows ;D