DLS4, Week 4, Siamese Network, general understanding


  1. I am a bit confused about how to apply the Siamese network. Prof. Ng says in one of the videos that there are several models available which have already been trained on millions of people. If I'd like to use that at my own company, do I just take that model, input a (one?) picture of each employee, and get an “encoding vector” (the 128-vector in the videos) for each employee? So 1000 employees give 1000 “encoding vectors”? And each time an employee wants to pass the recognition system, the comparison has to be made against all of these 1000 vectors? Correct?

  2. And when the original model was trained on more than 1e6 pictures, surely some sort of one-hot y-vector must have been used, holding the ground truth for which person is in each picture fed to the network? I got a bit confused when Prof. Ng just drew a line over the softmax.
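
A minimal sketch of the enrollment-and-lookup scheme described in question 1, in plain Python. Everything here is an assumption for illustration: `encode` is a hypothetical stand-in for a pretrained encoder (it just L2-normalizes a toy vector instead of running a network), the 0.7 distance threshold is picked for illustration, and `enroll`, `verify_identity`, and `identify` are hypothetical names that do not match the notebook's function signatures.

```python
import math

def encode(image):
    # HYPOTHETICAL stand-in for a pretrained face encoder. A real system would
    # run the picture through the network and get a 128-dim encoding; here the
    # "image" is already a toy vector, which we only L2-normalize.
    norm = math.sqrt(sum(x * x for x in image))
    return [x / norm for x in image]

def enroll(pictures_by_name):
    # One picture per employee -> one stored encoding per employee
    return {name: encode(img) for name, img in pictures_by_name.items()}

def verify_identity(image, claimed_name, database, threshold=0.7):
    # 1:1 verification: compare against the single encoding for the claimed name
    return math.dist(encode(image), database[claimed_name]) < threshold

def identify(image, database, threshold=0.7):
    # 1:K recognition: compare against EVERY stored encoding, keep the closest
    query = encode(image)
    best_name, best_dist = None, float("inf")
    for name, stored in database.items():
        d = math.dist(query, stored)
        if d < best_dist:
            best_name, best_dist = name, d
    if best_dist > threshold:
        return None, best_dist  # nobody in the database is close enough
    return best_name, best_dist

db = enroll({"alice": [1.0, 0.0, 0.0], "bob": [0.0, 1.0, 0.0]})
print(identify([0.9, 0.1, 0.0], db))
```

The loop in `identify` does exactly what the question describes: one distance computation per enrolled employee, so 1000 employees means 1000 comparisons each time someone is at the door. At that scale a linear scan is cheap.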


Yes, that all sounds correct.

  1. In the lectures, Prof Ng describes several different types of systems you can build on top of that kind of pretrained network, which computes encodings of faces (sometimes called “embeddings”). We also get to see that in the notebook. The assignment covers two cases. The first is the 1:1 verification problem: given a person with a picture ID, does the person match the picture on the ID? The second is the 1:K recognition problem: if you want to identify employees from their faces as they enter the building, then you'd need to build a database of the encodings, as you said, and compare each new face against the entries in it. That is the second of those two exercises in the assignment (the who_is_it function).

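  2. When they originally trained the network, they would have needed to know the identity of the person in every picture. Those labels would probably have been stored as categorical values (a single number used as an index into a list of names). If the network was trained as a classifier with a softmax output, the labels would be converted to “one hot” form during training, because that is the target a softmax cross-entropy loss compares against. If it was trained with the triplet loss (as FaceNet was), the labels are instead used to pick the triplets: an anchor and a positive picture of the same person, plus a negative picture of someone else. Either way, once training is done, any softmax layer is dropped and the encoding layer is what you use, which is why Prof Ng crossed out the softmax.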
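
A minimal sketch of the two ways the identity labels from point 2 can be used during training, assuming integer labels that index into a list of names. `one_hot` turns a label into the target vector a softmax cross-entropy loss expects, and `sample_triplet` uses the same labels to choose an anchor, positive, and negative for the triplet loss. Both function names are hypothetical, and uniformly random triplet selection is a simplification.

```python
import random

def one_hot(label, num_classes):
    # Categorical label (index into the list of names) -> one-hot target vector,
    # the form a softmax cross-entropy loss compares its output against
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec

def sample_triplet(labels, rng):
    # Triplet-loss training uses the same labels differently:
    # anchor and positive share a label, the negative does not.
    by_person = {}
    for idx, lab in enumerate(labels):
        by_person.setdefault(lab, []).append(idx)
    # Only people with at least two pictures can supply an anchor/positive pair
    eligible = [p for p, idxs in by_person.items() if len(idxs) >= 2]
    if not eligible or len(by_person) < 2:
        raise ValueError("need a person with >= 2 pictures and at least 2 people")
    person = rng.choice(eligible)
    anchor, positive = rng.sample(by_person[person], 2)
    negative = rng.choice([i for i, lab in enumerate(labels) if lab != person])
    return anchor, positive, negative

labels = [0, 0, 1, 1, 2]  # toy data: picture index -> person index
print(one_hot(labels[2], 3))
print(sample_triplet(labels, random.Random(0)))
```

Picking triplets at random is the simplest choice; the lectures point out that training works better when you choose triplets that are hard to train on (a negative that is close to the anchor), and this sketch does not attempt that.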

If you want to know more, it might not be a bad idea to watch the lectures again. Prof Ng does a way better job of explaining this at the whiteboard than I can hope to do by typing. If you want to go deeper still, there are references to the FaceNet and DeepFace papers in the Assignment notebook. See the References section at the end.


Thanks for your answer @paulinpaloalto!