In “Numerical Approximation of Gradients”, Andrew shows that, for the two-sided difference, the approximation error is on the order of epsilon squared, which (given that epsilon < 1) is much smaller than the approximation error of the one-sided difference, which is on the order of epsilon.
So, as a natural consequence of this premise, I would have expected the threshold for considering the grad check as passed to be on the order of epsilon squared (1e-14 in the example), not on the order of epsilon (1e-7 in the example): d_theta_approx is computed with the two-sided formula, so its approximation error with respect to d_theta (computed with backprop) should be on the order of epsilon squared.
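To make the premise concrete, here is a quick toy check (my own sketch, not the notebook's code) comparing the two difference formulas on a simple function:

```python
# Toy check of the premise: for f(x) = x**3 at x = 1 the true derivative is 3.
# With epsilon = 1e-4, the one-sided difference is off by roughly epsilon,
# the two-sided one by roughly epsilon squared.
def f(x):
    return x ** 3

x, eps = 1.0, 1e-4
true_grad = 3.0 * x ** 2

one_sided = (f(x + eps) - f(x)) / eps
two_sided = (f(x + eps) - f(x - eps)) / (2 * eps)

print(abs(one_sided - true_grad))  # ~3e-4, i.e. order epsilon
print(abs(two_sided - true_grad))  # ~1e-8, i.e. order epsilon squared
```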
It’s a good point, but don’t forget that we’re not just dealing with the error from using finite differences to approximate the gradients: everything here is in floating point. So we also need to allow a little leeway for the general noise induced by the usual rounding-error and error-propagation behaviors. In other words, even a completely correct algorithm is going to have some error just from the usual numerical-analysis issues. For example, something as simple as the difference between:
cost = -1./m * (some formula with np.dot or np.sum and np.multiply)
versus
cost = -(some formula with np.dot or np.sum and np.multiply)/m
will generate noticeably different results after a few thousand iterations.
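A quick way to see this, as a sketch with my own toy numbers rather than the assignment's actual cost:

```python
import numpy as np

# Mathematically, -1./m * s and -s/m are identical; in floating point they can
# disagree in the last bit, and those tiny discrepancies compound over
# thousands of gradient-descent iterations.
rng = np.random.default_rng(0)
m = 7                          # any m that is not a power of two
sums = rng.random(10000) * m   # stand-ins for np.sum(...) of the loss terms

mismatches = np.sum((-1. / m * sums) != (-sums / m))
print(mismatches)  # typically nonzero (often a sizeable fraction of the 10000):
                   # the two formulas already differ at the last-bit level
                   # before any training has even happened
```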
Interesting observation!
I think the point of this exercise is to see the difference between using the one-sided and the two-sided method to check whether your back_propagation implementation is good enough, and the degree of certainty you can get out of each of them.
Following the above, in the two-sided case we are not really interested in checking that the difference between d_theta_approx and d_theta is near epsilon squared, only that it is much smaller than epsilon. It’s all about accuracy: making sure the error your implementation of back_prop generates is small enough not to matter in the grand scheme of things.
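For reference, the quantity being compared against 1e-7 is a normalized difference, something like this sketch (variable and function names are mine, not necessarily the notebook's):

```python
import numpy as np

def grad_check_ratio(d_theta_approx, d_theta):
    """Relative difference between the two-sided approximation and backprop's
    gradient: norm of the gap, normalized by the norms of both vectors."""
    numerator = np.linalg.norm(d_theta_approx - d_theta)
    denominator = np.linalg.norm(d_theta_approx) + np.linalg.norm(d_theta)
    return numerator / denominator

# A correct backprop should push this ratio well below 1e-7; because of the
# floating-point noise discussed above, expecting it to reach 1e-14 is not realistic.
```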