numerator = np.linalg.norm(grad - gradapprox)

denominator = np.linalg.norm(grad) + np.linalg.norm(gradapprox)

difference = numerator / denominator

Hi friend,

In gradient checking, I think the 1st line “np.linalg.norm(grad - gradapprox)” already does everything. If grad = gradapprox, it just means this norm is 0.

So if line 1 is big, something is wrong; if it's small, all good. Am I right? Everything is done.

Here's my question: what is the meaning of this denominator, and why do you need to compute difference = numerator / denominator? What are we doing here? It seems like we don’t want A - B, but (A - B)/(A + B)? Thank you!

The point is that it’s a question of scale, right? Just telling me that the length of the difference is 1.0, how do I know if that’s a big difference or not? If the original vectors have length 1,000,000, then a difference of 1.0 is pretty small. If the original vectors have length 2, then a difference of 1.0 is a big deal.
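To put numbers on that, here is a minimal sketch. The vectors are made up for illustration and `relative_difference` is just a hypothetical helper name wrapping the three lines from the top of the thread. Both pairs have an absolute difference of exactly 1.0, but the normalized value tells them apart:

```python
import numpy as np

def relative_difference(grad, gradapprox):
    """Normalized difference, same formula as the snippet in the question."""
    numerator = np.linalg.norm(grad - gradapprox)
    denominator = np.linalg.norm(grad) + np.linalg.norm(gradapprox)
    return numerator / denominator

# Hypothetical vectors: each pair differs by 1.0 in the first component.
big_a = np.array([1_000_000.0, 0.0])
big_b = np.array([1_000_001.0, 0.0])
small_a = np.array([2.0, 0.0])
small_b = np.array([3.0, 0.0])

abs_big = np.linalg.norm(big_a - big_b)            # absolute difference: 1.0
abs_small = np.linalg.norm(small_a - small_b)      # absolute difference: 1.0
rel_big = relative_difference(big_a, big_b)        # 1 / 2,000,001, tiny
rel_small = relative_difference(small_a, small_b)  # 1 / 5, large

print(abs_big, abs_small)
print(rel_big, rel_small)
```

The numerator alone cannot separate these two cases; dividing by the combined lengths does.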

Oh, that makes sense. Yes, I think it’s a question of scale.

May I ask why norm(A) + norm(B) is the scale factor for norm(A - B)? Why not something else? My bad, I know this question is silly… but thank you.

You could use the norm of just one of the vectors for the scale. If you choose to do that, you’d probably want to use `gradapprox`, not `grad`, since the whole point of this exercise is that the “real” grad may be wrong. That’s what we’re trying to diagnose here, of course. But it doesn’t really matter if you use both. So you could say this is a convention, not an ironclad rule. You might want to adjust what you use as the threshold value for success to be twice as large if you decide to only use the norm of `gradapprox` in the denominator.
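Here is a rough sketch of why the factor is about two. The gradient values and the size of the perturbation are invented for illustration, not taken from the assignment. When the two vectors have nearly the same length, dividing by one norm instead of the sum of both roughly doubles the result:

```python
import numpy as np

# Hypothetical gradients: gradapprox is grad plus a tiny perturbation.
grad = np.array([1.0, 2.0, 3.0])
gradapprox = grad + 1e-7

numerator = np.linalg.norm(grad - gradapprox)

# Convention from the thread: divide by the sum of both norms.
both = numerator / (np.linalg.norm(grad) + np.linalg.norm(gradapprox))

# Alternative: divide by the norm of gradapprox only.
approx_only = numerator / np.linalg.norm(gradapprox)

# Equals 1 + norm(grad) / norm(gradapprox), so close to 2 here.
ratio = approx_only / both
print(both, approx_only, ratio)
```

If `grad` is badly wrong, the two norms can differ a lot and the ratio moves away from 2, but in that case the check should fail under either convention anyway.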