C4W4 Siamese network

Hi there!

I have a question regarding the Siamese network. The assignment implemented a sort of one-shot learning model, where only a single picture is tied to each person. What if I have more than one photo per person? I imagine training the model would be the same. However, how do I verify a newly captured photo against the several photos of a particular person? I have some inkling and wanted to check whether I'm on the right path.

  1. Compute the distance against all the photos, take the maximum, and if that distance is under a threshold, the photo matches.

  2. Compute the distance against all the photos, but this time use the mean of all the distances instead.
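To make the two options concrete, here is a minimal sketch. It assumes each image has already been passed through the Siamese encoder to produce a NumPy embedding vector; the function names and the Euclidean `distance` helper are my own choices for illustration, not anything from the assignment.

```python
import numpy as np

def distance(a, b):
    # Euclidean distance between two embedding vectors (one possible choice).
    return float(np.linalg.norm(a - b))

def verify_max(new_emb, stored_embs, threshold):
    """Approach 1: match only if the *largest* distance is under the threshold,
    i.e. the new photo is close to every stored photo of this person."""
    dists = [distance(new_emb, e) for e in stored_embs]
    return max(dists) < threshold

def verify_mean(new_emb, stored_embs, threshold):
    """Approach 2: match if the *mean* distance to the stored photos
    is under the threshold."""
    dists = [distance(new_emb, e) for e in stored_embs]
    return float(np.mean(dists)) < threshold
```

Note that approach 1 is the stricter of the two: a single far-away stored photo is enough to reject, whereas the mean can be pulled down by the other photos.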

Can anyone give me a suggestion? Thank you!

Hey @calvin98,
I haven’t really given this a thought until now, but both of your approaches seem pretty good to me. Additionally, I think you can fit both approaches into a single framework by introducing a simple parameter, which will help you get a sense of confidence. Let’s understand it with the help of an example.

In this example, we will assume that for each person you have at least n images in the database. Now, let’s call the new parameter conf, which gives us a sense of confidence. Say conf = 0.8: this means that for this application, we want the input image to match at least 80% of the n images stored in the database for any particular person.

In applications where security is a major concern, we can set conf = 1, which means we want the input image to match all n images stored in the database; if we set conf = 0.5, it means we want the input image to match at least 50% of the n images. Note that here I am drawing a loose similarity between conf = 0.5 and your second approach.
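A minimal sketch of this conf framework, under the same assumptions as before (embeddings are NumPy vectors, Euclidean `distance` is my own illustrative choice):

```python
import numpy as np

def distance(a, b):
    # Euclidean distance between two embedding vectors (one possible choice).
    return float(np.linalg.norm(a - b))

def verify_conf(new_emb, stored_embs, threshold, conf=0.8):
    """Match if at least conf * n of the n stored images are within
    the distance threshold of the new image."""
    matches = sum(distance(new_emb, e) < threshold for e in stored_embs)
    return matches >= conf * len(stored_embs)
```

With conf = 1.0 this behaves like the strict "every stored photo must match" criterion, and lowering conf relaxes it toward a majority-vote style check.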

Another thing that comes to mind is a 1-to-1 mapping, but I guess that is not what you are looking for; still, I would like to state it here for the sake of completeness. Let’s say you decide to store 3 images of each person in your database: one from the front, one from the left, and one from the right.

Now, at inference time, you can capture images from the same 3 angles (front, left and right) and match each one with its corresponding image stored in the database. The difference in this case is that not only do we have multiple images stored in the database, but we also have multiple images at inference time. Additionally, for a particular person, each input image is checked only against a single stored image, not against all the images. As a simple extension, you can store multiple images for each angle, for instance 3 from the front, 3 from the left and 3 from the right, and then implement a 1:3 mapping for each of the 3 input images. This might sound a bit confusing, so feel free to skip it. I hope this helps.
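The angle-wise 1:1 mapping could look something like the following sketch, where both the stored and the captured embeddings are kept in dicts keyed by angle. Again, the embedding representation and the Euclidean `distance` helper are assumptions for illustration:

```python
import numpy as np

def distance(a, b):
    # Euclidean distance between two embedding vectors (one possible choice).
    return float(np.linalg.norm(a - b))

def verify_by_angle(input_embs, stored_embs, threshold):
    """1:1 mapping: each input image is compared only with the stored
    image taken from the same angle ('front', 'left', 'right')."""
    return all(distance(input_embs[angle], stored_embs[angle]) < threshold
               for angle in stored_embs)
```

For the 1:3 extension, the value under each angle key would become a list of embeddings, and the per-angle check could reuse any of the multi-image criteria above (max, mean, or conf).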