In the lectures, Andrew Ng defines the triplet loss by taking the difference between the output vectors, then calculating the L2 norm, and squaring that.
Why did we just take the simple difference instead of calculating, say, Euclidean distance?
The “squared norm” is computationally easier, since you don’t have to compute the square root.
It’s not the “simple difference”. The 2-norm is the Euclidean length of a vector. So what he shows is the square of the Euclidean length of the difference vector. The point is that Prof Ng is just showing the mathematical formula there. You would not write the code to compute the 2-norm and then square it, because (as Tom points out) you’d be wasting the computation needed to compute the square root (relatively expensive) and then squaring the result (relatively cheap). You would just compute the sum of the squares of the differences, which is the first step in computing it the long way. But that gives you the answer as Prof Ng has specified it above.
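To make the point concrete, here is a small NumPy sketch (the embedding values are made up for illustration) showing that squaring the 2-norm of the difference gives exactly the sum of squared differences, so the square root is wasted work:

```python
import numpy as np

# Two hypothetical face embeddings (any equal-length vectors work)
anchor = np.array([0.1, 0.4, -0.3])
positive = np.array([0.2, 0.5, -0.1])

# The "long way": Euclidean (2-)norm of the difference vector, then squared
long_way = np.linalg.norm(anchor - positive) ** 2

# The direct way: sum of squared differences, skipping the square root
direct = np.sum((anchor - positive) ** 2)

print(np.isclose(long_way, direct))  # the two agree up to floating-point error
```

In practice you would just write the `direct` version in your loss function; the formula with the squared norm is the mathematical statement of the same quantity.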
Thank you for your answers both Paul and Tom. I guess I am wondering why we calculate the similarity of the two vectors as such, when we could use something like cosine similarity as well.
It’s a good question, but I don’t know the answer. Of course the vectors we are comparing are “embeddings” in the sense of semantic embedding: unit vectors in a space in which the dimensions represent the strength of various (learned) attributes of a face. So just using the Euclidean distance seems reasonable as a method of measuring the similarity of two embeddings. But this is an experimental science, right? You can take the model and switch to using cosine similarity as your cost function and then compare the results. If your version works better, write it up!
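One observation that may partly defuse the question: since these embeddings are normalized to unit length, squared Euclidean distance and cosine similarity are directly related by the identity ‖a − b‖² = 2 − 2·cos(a, b), so ranking pairs by one is equivalent to ranking by the other. A quick sketch with made-up unit vectors:

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between vectors u and v."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Two hypothetical unit-length embeddings
a = np.array([0.6, 0.8])
b = np.array([0.8, 0.6])

sq_dist = np.sum((a - b) ** 2)
cos_sim = cosine_similarity(a, b)

# For unit vectors, squared Euclidean distance = 2 - 2 * cosine similarity
print(np.isclose(sq_dist, 2 - 2 * cos_sim))
```

So for unit-normalized embeddings the choice between the two is more a matter of convention and optimization convenience than of measuring something fundamentally different.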
Or before you invest that effort, the simpler approach would be to read some of the references that are given at the end of this assignment. Maybe they comment in the papers about which distance functions they considered and why they made the choices that they used.
Yes, I considered both those approaches - I even looked into the FaceNet paper, and saw that they reference another paper for the determination of their distance function (which was way over my head).
Thanks for clarifying - I feel satisfied now.
Nathan