In the above code, why is the embedding divided by its L2 norm? What is the intuition behind this?

Since it’s not guaranteed that the 128-dimensional output of the model will always be a unit vector (which is what we expect from an embedding), we divide it by its L2 norm. That ensures that whatever the model outputs, the resulting embedding is always a unit vector.
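To make the division concrete, here is a minimal sketch in plain Python. The function name `l2_normalize` and the 4-dimensional `raw_embedding` are made up for illustration (the real output has 128 dimensions), and this is not the assignment's code. With NumPy the norm would typically come from `np.linalg.norm` instead.

```python
import math

def l2_normalize(vec):
    """Scale vec so that its Euclidean (L2) length is 1."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        # The zero vector has no direction, so it cannot be normalized.
        raise ValueError("cannot normalize the zero vector")
    return [x / norm for x in vec]

# Hypothetical raw model output, shortened to 4 dimensions for readability.
raw_embedding = [3.0, 4.0, 0.0, 0.0]

unit_embedding = l2_normalize(raw_embedding)

# Recompute the Euclidean length of the result; it is 1 up to rounding.
unit_length = math.sqrt(sum(x * x for x in unit_embedding))
```

The direction of the vector is unchanged; only its length is rescaled to one.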

Hi! I am doing the FaceNet assignment now, had the same question, and found your answer! I want to say thank you first, but your answer raises another question for me.

Does our expectation that the 128-dimensional output should be a unit vector come from the typical practice of making all inputs unit vectors when training the model? Or is there another reason why we should guarantee unit-vector outputs?

Thank you!

I think it comes from the fact that the training input data are unit vectors. @XpRienzo can confirm this.

Thanks @ashish_learns.

To @XpRienzo, can you also explain why we apply the L2 norm to the encoding result rather than the L1 norm? Since the result of the prediction is a vector, not a matrix, according to the lecture slides and the graph in the assignment, isn’t it more plausible to use the L1 norm to normalize it?

Hey @baekTree! The L2 norm of a vector is exactly its Euclidean distance from the origin. So when we divide the vector by that value, we are left with a vector whose distance from the origin is one. That is, its magnitude is one. Dividing by another norm, such as L1, does not give this in general: the result has L1 norm one, but its Euclidean length is usually different from one.
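A small numeric check may help here. This is only a sketch in plain Python with a made-up 2-D vector `v`; the helper names `l1_norm` and `l2_norm` are chosen for illustration and are not from the assignment.

```python
import math

def l1_norm(vec):
    """Sum of absolute values of the components."""
    return sum(abs(x) for x in vec)

def l2_norm(vec):
    """Euclidean distance of vec from the origin."""
    return math.sqrt(sum(x * x for x in vec))

v = [3.0, 4.0]  # made-up vector: L1 norm is 7, L2 norm is 5

by_l2 = [x / l2_norm(v) for x in v]  # components 3/5 and 4/5
by_l1 = [x / l1_norm(v) for x in v]  # components 3/7 and 4/7

length_after_l2 = l2_norm(by_l2)  # Euclidean length is 1 (up to rounding)
length_after_l1 = l2_norm(by_l1)  # Euclidean length is 5/7, not 1
l1_after_l1 = l1_norm(by_l1)      # the L1 norm of this one is 1 instead
```

Both results point in the same direction as `v`; they differ only in which notion of length has been set to one.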

@XpRienzo Thank you! I was somewhat confused with the concept of L1 norm, L2 norm and Frobenius norm… Now everything makes sense! Thank you again!

Do you have any good resources for understanding L1 and L2 norms in detail?