For the final part of this model, could we replace it with cosine similarity or the L2 distance between two embeddings to measure how similar they are? That way, we could use a vector database's retrieval capability for fast lookup.
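Concretely, this is the kind of comparison I have in mind; a minimal NumPy sketch (the function names are just for illustration):

```python
import numpy as np

def l2_distance(u, v):
    # Euclidean (L2) distance between two embedding vectors
    return np.linalg.norm(u - v)

def cosine_similarity(u, v):
    # Cosine of the angle between two embedding vectors, in [-1, 1]
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
```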
There are now many models for converting images to vectors, especially the image-embedding capabilities provided by large language models. Could we generate the vectors directly with one of those models and then just compare them?
Note that what is being discussed on that slide is how to train the model that computes the embeddings of the face images. We need to choose the distance and cost metric such that the model does a good job of identifying which faces are the same and which are not. Regardless of what we use as the distance or cost function to drive the training, we're not talking about just looking things up in a precomputed embedding database at this point. Professor Ng explains in the lectures why the Triplet Loss Function is the preferred method for training such a model.
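For what it's worth, here is a minimal sketch of what that loss looks like, assuming the embeddings arrive as batched TensorFlow tensors (the margin value alpha = 0.2 is just an illustrative default):

```python
import tensorflow as tf

def triplet_loss(anchor, positive, negative, alpha=0.2):
    # Squared L2 distance between the anchor and the positive (same person)
    pos_dist = tf.reduce_sum(tf.square(anchor - positive), axis=-1)
    # Squared L2 distance between the anchor and the negative (different person)
    neg_dist = tf.reduce_sum(tf.square(anchor - negative), axis=-1)
    # Hinge: push same-person pairs at least `alpha` closer than different-person pairs
    return tf.reduce_sum(tf.maximum(pos_dist - neg_dist + alpha, 0.0))
```

The margin is what makes the learned embedding space useful for comparison: the anchor-positive distance is driven to be at least alpha smaller than the anchor-negative distance.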
There are some Face Recognition applications in which you do have a precomputed database of face embeddings, as Professor Ng describes and as we'll see in the assignment for this topic. E.g., the case in which you are implementing secure entry to your office for your employees. But you still need to run the model to compute the embedding of the image from the door cam and then see if it matches any of your database entries. In that case, we do use the norm of the difference between the two embeddings as the distance metric.
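As a rough sketch of that matching step, assuming a small in-memory database (the names, threshold, and layout here are illustrative, not the assignment's actual values):

```python
import numpy as np

def who_is_it(door_cam_embedding, database, threshold=0.7):
    # database: dict mapping employee name -> precomputed embedding (np.ndarray)
    best_name, best_dist = None, float("inf")
    for name, stored_embedding in database.items():
        # Distance metric: norm of the difference between the two embeddings
        dist = np.linalg.norm(door_cam_embedding - stored_embedding)
        if dist < best_dist:
            best_name, best_dist = name, dist
    # Only accept a match if the closest database entry is within the threshold
    if best_dist < threshold:
        return best_name, best_dist
    return None, best_dist
```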
This version of DLS was published in April 2021. A lot has happened in ML since then, but I have not been tracking the literature about Face Recognition to know if anyone has come up with more advanced techniques than what Professor Ng teaches us in this section. Maybe we get lucky and someone else can point us to more advanced techniques.
This is my understanding:
The last neuron of this model compares the similarity of two image vectors to determine whether they show the same person. In that case, could the model be made to output only the image vectors, and then a measure such as cosine similarity be used to calculate whether the two vectors are similar?
In that case, there is no (trainable) model at all for the comparison step, because there are only two parts: retrieve the image vector from the vector database and calculate the similarity.
In this sense, the slide isn't really relevant to your question, because the slide is about training a model that produces image vectors which can be compared to tell whether two photos show the same person.
To answer "yes" to your question, I think we first need to make sure that the retrieved vectors are good for this very specific purpose. Can we be sure they are? I think it takes experimentation to verify, but I could imagine that the context embedded in the image vectors, especially those produced by a multi-modal LLM, could be so rich that it goes well beyond identifying the person. That means that if I just apply cosine similarity, it may not be comparing only the person's identity.
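If you wanted to run that experiment, one simple check might look like the sketch below, where `embed` is a stand-in for whatever embedding model you are evaluating and the photo pairs are ones you label yourself:

```python
import numpy as np

def cosine_similarity(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def identity_separation(embed, same_person_pairs, different_person_pairs):
    # `embed` maps an image to a vector; the pairs are (image, image) tuples.
    same = [cosine_similarity(embed(a), embed(b)) for a, b in same_person_pairs]
    diff = [cosine_similarity(embed(a), embed(b)) for a, b in different_person_pairs]
    # If the embeddings really track identity, same-person similarities should be
    # consistently higher than different-person similarities.
    return np.mean(same) - np.mean(diff)
```

A large positive gap would suggest the embeddings track identity well enough for cosine similarity to work; a small gap would confirm the worry that too much other context is mixed in.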