In the " face verification and binary classification" video Andrew says this is an alternative way in face recognition and not using the triplet loss. But to feed into the signoid we still need a Siamese network to compute the encoding for an image. How do we train the network if we are not using triplet loss? We can probably use logistic loss to train (w, b), but how to train the network parameters?
Good point, I was wondering the same thing.
Since there are multiple identity it can’t be simple 0/1 for each photo.
If you found the answer I’m curious about it.
Thanks
I have the same problem. Also what is the training dataset for such network?
Hi @haoyundeng ,
When Andrew Ng discusses “face verification and binary classification,” he’s essentially describing a way to determine whether two images are of the same person or not. The Siamese network, as you correctly pointed out, computes encodings for two images, and then a logistic regression classifier (using sigmoid activation) determines if they’re of the same person.
To train this network:
-
Data Preparation: Prepare pairs of images. For each pair, label them
1
if they are of the same person and0
otherwise. -
Network Architecture: Use a Siamese network to produce encodings for each of the images in a pair. Subtract one encoding from the other, resulting in a difference vector. This difference vector can then be fed into a logistic regression unit.
-
Loss Function: For training, use the logistic loss (or binary crossentropy loss). The loss will compare the output of the logistic regression unit (which is the probability that the images are of the same person) with the actual label (1 or 0).
(L(y,\hat{y}) = -[y \log(\hat{y}) + (1-y) \log(1-\hat{y})])
-
Training: During training, you’ll use backpropagation. Gradients will flow back from the logistic regression unit, through the difference in encodings, and into the Siamese network. This will adjust not just (w) and (b) of the logistic regression unit, but also the parameters of the Siamese network to produce better encodings.
The beauty of this approach is that you don’t need to carefully select triplets, and the training process can be more straightforward. Instead, you focus on direct pairs of images and whether they’re of the same person or not, leveraging the simplicity of binary classification.