As you can see, the answer is off by just 0.000001 because neg_dist is off by 0.000001. I can’t see how my code could be wrong, though. Perhaps this is a problem with the grader?
You probably lose a bit of precision by computing the norm and then squaring it: every extra operation is another chance to introduce rounding error. Just compute the squares in the first place and then do reduce_sum and you’re there. No norm required. And as a side benefit, your code will be more efficient …
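Something along these lines (a minimal sketch, not the actual assignment code; `anchor` and `negative` are just illustrative stand-ins for the encodings):

```python
import tensorflow as tf

# Illustrative stand-ins for the anchor and negative encodings
anchor = tf.random.normal([4, 128], dtype=tf.float32)
negative = tf.random.normal([4, 128], dtype=tf.float32)

diff = anchor - negative

# Recommended: square first, then sum. No square root anywhere.
neg_dist = tf.reduce_sum(tf.square(diff), axis=-1)

# Norm first, then square: mathematically identical, but the extra
# sqrt/square round trip can change the last few bits in float32.
neg_dist_via_norm = tf.square(tf.norm(diff, axis=-1))

print(tf.reduce_max(tf.abs(neg_dist - neg_dist_via_norm)).numpy())
```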
It is an interesting and legitimate question why they made the assertions here check for exact equality. That’s always a little dangerous in floating point. In a lot of cases they use allclose instead, but not here for some reason. Hmmmmm.
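For what it’s worth, the usual pattern looks something like this (made-up numbers, just to illustrate the shape of the check):

```python
import numpy as np

expected = 0.350526   # hypothetical expected loss
computed = 0.350527   # off in the sixth decimal, like the result above

print(expected == computed)                        # False: exact equality is brittle
print(np.allclose(expected, computed, atol=1e-5))  # True: tolerance absorbs rounding noise
```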
Your solution works, but I’m having trouble understanding the difference between tf.reduce_sum and tf.norm. I understand what each one does:
tf.norm squares each value in the selected axis, adds those squares, and then takes the square root of that sum
tf.reduce_sum simply adds each value in the selected axis
But what is the practical difference between the two? Why would you ever use tf.norm instead of the simpler and more efficient tf.reduce_sum? Also, when applied to a vector, the result of tf.norm represents the distance between the vector and 0. Does tf.reduce_sum represent anything geometric in the same way?
Reduce sum simply computes the sum across the specified dimension, but in the way we are using it, what we are computing is:
loss = \displaystyle \sum_{i = 1}^{n} v_i^2
There is no square root there. Because the loss here is defined as the sum of the squares, that’s all we need. Of course we can see from the above that:
loss = ||v||^2
You use the norm when what you want is the norm, and that’s not what we want here. Of course what you did ends up with the same result by adding one more step to take advantage of the second formula above: squaring the norm to cancel the square root and get back to the sum of the squares. That is a) a waste of computation, because the square root is a relatively expensive operation, and b) a source of extra rounding error, because every additional floating point operation is another opportunity for error to accumulate; nothing we do in floating point is exact. It was that second point, of course, that caused you to fail the test, even though I think we all agree they should have used allclose rather than testing for exact equality there.
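You can see the round-trip problem in isolation: in float32, a square root followed by a square generally does not give back exactly the number you started with (a small sketch, unrelated to the grader itself):

```python
import tensorflow as tf

x = tf.constant(2.0, dtype=tf.float32)

# sqrt(2) is not exactly representable, so squaring the rounded result
# typically lands a few ulps away from 2.0 rather than exactly on it.
roundtrip = tf.square(tf.sqrt(x))

print(x.numpy(), roundtrip.numpy(), (roundtrip - x).numpy())
```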
I’ll summarise for the sake of my own understanding:
Using tf.norm I was really doing: X2 = X1.^2 -> X3 = reduce_sum(X2) -> X4 = sqrt(X3) -> X5 = X4^2
Using your method I am now doing: X2 = X1.^2 -> X3 = reduce_sum(X2), which removes the last two unnecessary steps that made the calculation both less efficient and more prone to rounding error.
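Or, writing those same two chains out as code (again just a sketch, with X1 as an arbitrary float32 tensor):

```python
import tensorflow as tf

X1 = tf.random.normal([128], dtype=tf.float32)

# The tf.norm route, step by step
X2 = tf.square(X1)        # square each element
X3 = tf.reduce_sum(X2)    # sum of squares -- this is already the loss term
X4 = tf.sqrt(X3)          # the norm
X5 = tf.square(X4)        # square it again to get back to the sum of squares

# The direct route simply stops at X3
print(X3.numpy(), X5.numpy())
```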