In the Triplet Loss video lecture, there is a statement that we need to construct the training set in triplet form, where each person is represented by around 10 different images on average. After the system is trained on this triplet-form training set, the professor said we can then apply it to a face recognition system.
Our doubt is: after the system is trained, how can we apply it to a face recognition system? In a face recognition system we only have one picture of each employee stored in the database, and we are not going to collect 10 different images for each employee. So how can the trained system be applied to a one-shot learning problem?
Because the system has learned to compute a representation of any face image. You can compute the distance between the stored database image of an employee (the one shot) and a newly taken photo to be used for face recognition, for example. What you have trained is a system where two pictures of the same person have a small distance, and two pictures of similar but different persons have a larger distance.
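As a minimal sketch of that idea, assuming a hypothetical `embedding_model` that stands in for whatever network you trained with triplet loss (mapping a preprocessed face image to a 128-dimensional vector):

```python
import numpy as np

def distance(img_a, img_b, embedding_model):
    """Euclidean distance between the embeddings of two face images.

    `embedding_model` is assumed to map a preprocessed image to a
    128-d vector, as produced by a network trained with triplet loss.
    """
    emb_a = embedding_model(img_a)  # shape (128,)
    emb_b = embedding_model(img_b)  # shape (128,)
    return np.linalg.norm(emb_a - emb_b)

# A small distance suggests the same person; a large distance suggests
# two different people.
```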
Just to make sure I got it: so after we train our model, if I give it a photo, the model will compute the distance between the given photo and all the images in the database, then choose the one with the lowest distance and output that employee's name.
Right?
Not quite. What the trained model computes is the “embedding” of the input image into the space of “facial feature representation”. That’s the 128-element vector that is output by the model. What you then need to do with that information depends on what you are trying to accomplish. If the goal is to recognize whether the person in front of the camera is one of the employees in your database, then you need to write the logic that computes the distance between the given embedding and the stored embeddings of all entries in your database and checks whether any of them meet the threshold for “close enough”.
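A sketch of that recognition logic, under the same assumption of a hypothetical `embedding_model`, and with the database storing one precomputed 128-d embedding per employee (computed once from the single stored photo). The threshold value is illustrative and would need to be tuned on validation data:

```python
import numpy as np

THRESHOLD = 0.7  # illustrative value; tune on a validation set

def recognize(new_image, database, embedding_model):
    """Return the name of the closest employee, or None if nobody is close enough.

    `database` maps employee names to their precomputed 128-d embeddings,
    one per employee, computed once from the single stored photo.
    """
    query = embedding_model(new_image)          # embedding of the new photo
    best_name, best_dist = None, float("inf")
    for name, stored_emb in database.items():
        d = np.linalg.norm(query - stored_emb)  # distance in embedding space
        if d < best_dist:
            best_name, best_dist = name, d
    return best_name if best_dist < THRESHOLD else None
```

The key point is that the model itself only produces embeddings; the comparison against the database and the “close enough” decision are separate logic you write on top.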