Face verification / Recognition

Hi everyone, one thing I really don't understand is how to prepare the training examples for learning.

  1. Do we first train a network (in our assignment, the Inception network) on some of the pictures of the people we want to verify, then pass two images through the trained convnet (same parameters) to get their encodings, and use backpropagation to reduce the distance when the two pictures are of the same person?
  2. Or can we train a convnet on faces of many different people, not necessarily the staff or employees we are trying to verify, then pass the staff or employee pictures through that trained network and its learned parameters, get the encodings, and compare their distances?
  3. How do we arrange the data for training with triplet loss if we have a dataset of 10k pictures of 1k persons, as stated by the professor?

Hi @Nnaemeka_Nwankwo,

You’ll see the assignment is using a pre-trained model. As mentioned in the assignment, it is one-shot. What that means is that it has already been trained on several thousand pictures and fully works as-is. It uses pictures of staff members so that it is ethical…in the sense that, if I used your image without permission, you might have an issue with it.

The staff images are just to give you an idea of how it works. You can use your own image, put it in the “database”, and then test with another image of yourself and it will work. Hence “one-shot”: you only need one image per person.
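The verification step described above boils down to comparing encodings by distance. Here is a minimal sketch with numpy, assuming the model outputs 128-dimensional L2-normalised encodings (as in the assignment) and using toy random vectors in place of a real network; the `verify` helper and the 0.7 threshold are illustrative choices, not the assignment's exact code:

```python
import numpy as np

def verify(encoding_query, encoding_db, threshold=0.7):
    """Return (match?, distance): True if the two encodings are close enough
    to likely belong to the same person."""
    dist = float(np.linalg.norm(encoding_query - encoding_db))
    return dist < threshold, dist

# Toy 128-d encodings standing in for the network's output.
rng = np.random.default_rng(0)
anchor = rng.normal(size=128)
anchor /= np.linalg.norm(anchor)              # encodings live on the unit sphere
same = anchor + 0.01 * rng.normal(size=128)   # slightly perturbed "same person"
same /= np.linalg.norm(same)
other = rng.normal(size=128)                  # unrelated "different person"
other /= np.linalg.norm(other)

print(verify(anchor, same))   # small distance -> match
print(verify(anchor, other))  # large distance -> no match
```

In practice the only learning-time work is producing good encodings; at verification time there is no training at all, just this distance check against the stored database encoding.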

As for training with triplet loss, I believe it is quite difficult to do at a personal level, because each training step needs three images: two of the same person (anchor and positive) and one of a different person (negative), and minimising the loss over such triplets is what drives the cost down. So you can imagine what preparing that data might look like. Not impossible, but difficult.
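For reference, the triplet loss itself is simple to write down: it pushes the anchor–positive distance below the anchor–negative distance by a margin α. A minimal numpy sketch (the α = 0.2 margin is the value the course uses; real training would compute this on a framework tensor so gradients flow):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, alpha=0.2):
    """Sum over the batch of max(||a - p||^2 - ||a - n||^2 + alpha, 0)."""
    pos_dist = np.sum((anchor - positive) ** 2, axis=-1)  # squared distance a<->p
    neg_dist = np.sum((anchor - negative) ** 2, axis=-1)  # squared distance a<->n
    return float(np.sum(np.maximum(pos_dist - neg_dist + alpha, 0.0)))

# Toy batch of one triplet: anchor equals positive, negative is orthogonal.
a = np.array([[1.0, 0.0]])
p = np.array([[1.0, 0.0]])
n = np.array([[0.0, 1.0]])
print(triplet_loss(a, p, n))  # 0.0 -- this triplet already satisfies the margin
print(triplet_loss(a, n, p))  # 2.2 -- positive/negative swapped, so loss is incurred
```

Note the max(·, 0): once a triplet satisfies the margin, it contributes nothing, which is why hard-to-separate triplets matter most during training.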


  1. Thank you so much @Mubsi. From what I understand, the pre-trained model was trained on several thousand images, much like training a normal convnet for classification. But instead of outputting True or False, you take the encoding vector, and that encoding is what is used to verify two images, by measuring how similar the encodings are. If I understand correctly.
  2. Using triplet loss, you have to prepare the data in a different way in order to reduce the cost through backpropagation. Or do you take, for example, a CNN pre-trained on a large dataset of images, then use a dataset of anchor, positive and negative images to train it with the triplet loss, learning parameters that give a better embedding?
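On the data-arrangement question (10k pictures, 1k persons): one common way is to group images by person ID and sample triplets from those groups. A minimal sketch, assuming images are referenced by ID and labels are person IDs; the `make_triplets` helper is hypothetical, not from the assignment:

```python
import random
from collections import defaultdict

def make_triplets(samples, n_triplets, seed=0):
    """samples: list of (image_id, person_id) pairs.
    Returns n_triplets (anchor, positive, negative) image-id triplets."""
    rng = random.Random(seed)
    by_person = defaultdict(list)
    for img, person in samples:
        by_person[person].append(img)
    # Only persons with >= 2 images can supply an anchor/positive pair.
    eligible = [p for p, imgs in by_person.items() if len(imgs) >= 2]
    triplets = []
    for _ in range(n_triplets):
        p = rng.choice(eligible)
        anchor, positive = rng.sample(by_person[p], 2)   # same person, two images
        q = rng.choice([x for x in by_person if x != p]) # any other person
        negative = rng.choice(by_person[q])
        triplets.append((anchor, positive, negative))
    return triplets

# Toy dataset: 3 persons with 4 images each.
samples = [(f"img{i}", i % 3) for i in range(12)]
print(make_triplets(samples, 3))
```

Random sampling like this is the simplest scheme; the FaceNet paper goes further and mines "hard" triplets (negatives close to the anchor), since random triplets quickly stop contributing to the loss.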

Best regards,