A doubt on gradient checking

Dear Mentor,

Could you please guide me on this issue?


Regarding this formula, there is a statement in the lecture mentioning that

“the role of the denominator is just in case any of these vectors are really small or really large, the denominator turns this formula into a ratio”

Could I have a mathematical example to help me understand this statement?

Thank you.

You need to “scale” the results of the check by the sizes of the actual vectors you are approximating. Suppose the difference value comes out to be 0.5. How do you know if that’s a big error or not? If the norms of the actual vectors are, say, 10^6, then 0.5 is a pretty small error. But what if the norms are 1? Then it’s a pretty big error, right?

Or think of it this way: you’re converting the error into a “percentage error” without the factor of 100. If I’m measuring the distance from here to the moon, then 1 meter is a pretty small error. One meter divided by the distance from here to the moon is a small number. If I’m measuring the length of my left arm, then 1 meter is a pretty big error. :nerd_face:

When dealing with approximation error, the scale matters.
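To put numbers on that, here is a minimal sketch in plain Python. It assumes the formula in question is the usual relative difference, ||grad_approx - grad|| / (||grad_approx|| + ||grad||); the vectors and the helper names `norm` and `relative_difference` are made up for illustration. Both cases have the same absolute error of 0.5, but the ratio tells them apart.

```python
import math

def norm(v):
    """Euclidean (L2) norm of a list of floats."""
    return math.sqrt(sum(x * x for x in v))

def relative_difference(grad_approx, grad):
    """Assumed formula: ||grad_approx - grad|| / (||grad_approx|| + ||grad||)."""
    diff = [a - b for a, b in zip(grad_approx, grad)]
    return norm(diff) / (norm(grad_approx) + norm(grad))

# Same absolute error (0.5) at two very different scales.
big = [1e6, 0.0]
big_approx = [1e6 + 0.5, 0.0]
small = [1.0, 0.0]
small_approx = [1.5, 0.0]

rel_big = relative_difference(big_approx, big)
rel_small = relative_difference(small_approx, small)

print(rel_big)    # roughly 2.5e-07: a tiny error relative to the vectors
print(rel_small)  # 0.2: a large error relative to the vectors
```

Without the denominator, both cases would report 0.5 and you could not tell them apart.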

Hello @JJaassoonn ,

You may also try this kind of checking yourself to get some hands-on experience.
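As one possible starting point, here is a rough sketch of a hand-rolled gradient check in plain Python. Everything in it is an assumption for illustration: the toy cost J(theta) = sum(theta_i^3), the helper names, the step size `eps`, and the deliberately wrong gradient used to show what a failed check looks like.

```python
import math

def cost(theta):
    # Hypothetical toy cost: J(theta) = sum(theta_i ** 3)
    return sum(t ** 3 for t in theta)

def analytic_grad(theta):
    # Exact gradient of the toy cost: dJ/dtheta_i = 3 * theta_i ** 2
    return [3 * t ** 2 for t in theta]

def numeric_grad(f, theta, eps=1e-7):
    # Two-sided (central) difference approximation, one coordinate at a time.
    grad = []
    for i in range(len(theta)):
        plus = list(theta)
        minus = list(theta)
        plus[i] += eps
        minus[i] -= eps
        grad.append((f(plus) - f(minus)) / (2 * eps))
    return grad

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def relative_difference(grad_approx, grad):
    diff = [a - b for a, b in zip(grad_approx, grad)]
    return norm(diff) / (norm(grad_approx) + norm(grad))

theta = [1.0, -2.0, 3.0]
approx = numeric_grad(cost, theta)

# Correct gradient: the relative difference should be very small.
good = relative_difference(approx, analytic_grad(theta))

# Deliberately buggy gradient (2 * theta ** 2 instead of 3 * theta ** 2).
buggy_grad = [2 * t ** 2 for t in theta]
bad = relative_difference(approx, buggy_grad)

print(good)  # tiny
print(bad)   # around 0.2
```

Because the result is a ratio, the same threshold can be applied no matter how large or small the gradients themselves are.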


Thanks, Raymond! That’s a really cool example that concretely demonstrates the point I was just talking about in general terms. I’ve bookmarked that thread and will use it if this question comes up again!

Dear Mr Paul Mielke,

Thank you so much for your guidance.

Dear Mr Raymond,

I have studied the example from the discussion thread. Thank you so much for sharing it.

That was an example of how we can understand something by just working it out. I hope it helps with your future studies.