No, those two methods give different results. The second method will give you the square root of the first method. Do you know the definition of the L2 norm of a vector? It is the traditional Euclidean length of the vector:
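$$\lVert v \rVert_2 = \sqrt{\sum_{i=1}^{n} v_i^2}$$

The squared version, $\sum_{i=1}^{n} v_i^2$, is exactly this quantity without the final square root, which is why one result is the square root of the other.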
As to why they use the square in the triplet_loss case and the plain L2 norm in the verify case, I don’t know. I think it’s just a choice they have made.
Sure, taking the square root is a computationally expensive operation and it's not clear what semantic value it adds; it also makes the gradients more complicated. That's why squared Euclidean distance is used in lots of places (e.g. MSE for regression). So why didn't they use the square in the verify case as well? I don't know …
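To make that concrete, here's a rough sketch of how the squared distances typically show up in a triplet loss. The argument names and the margin value are my own illustration, not the assignment's exact code:

```python
import tensorflow as tf

def triplet_loss_sketch(anchor, positive, negative, alpha=0.2):
    # Squared L2 distances: no square root, so the op and its gradient stay simple
    pos_dist = tf.reduce_sum(tf.square(anchor - positive), axis=-1)
    neg_dist = tf.reduce_sum(tf.square(anchor - negative), axis=-1)
    # Hinge: push the positive pair closer than the negative pair by at least alpha
    return tf.reduce_sum(tf.maximum(pos_dist - neg_dist + alpha, 0.0))
```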
Well, I guess one theory is that it's simply easier to code tf.norm than tf.reduce_sum of tf.square. But that's the only reason I can think of.
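For what it's worth, here's the one-liner difference on a toy vector (the numbers are just an example):

```python
import tensorflow as tf

v = tf.constant([3.0, 4.0])

print(tf.norm(v))                            # 5.0  -> plain L2 norm, one call
print(tf.reduce_sum(tf.square(v)))           # 25.0 -> squared L2 norm
print(tf.sqrt(tf.reduce_sum(tf.square(v))))  # 5.0  -> same as tf.norm
```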
That's an excellent point: in verify, we're comparing the freshly computed encoding against the one stored in the database, so obviously we have to use the same metric that was used when the database encodings were created. That must be the answer to why we had to code it that way, but it leaves open the question of why they didn't go for the "cheaper to compute" version in the second case as they did in the first: it doesn't take any more space in the database to store the square of the number.
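Just to illustrate the point about matching metrics, here's a sketch of a verify-style check. The database dict and the 0.7 threshold are assumptions on my part, not the assignment's exact values:

```python
import tensorflow as tf

def verify_sketch(new_encoding, identity, database, threshold=0.7):
    # Plain L2 distance between the fresh encoding and the stored one
    dist = tf.norm(new_encoding - database[identity])
    # This threshold only makes sense for the plain distance; switching to the
    # squared distance would mean comparing against threshold ** 2 instead
    return bool(dist < threshold)
```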
It probably would have been better to be consistent throughout this assignment, but at least we have an explanation. Thanks!