C2W1 Grad Check: why is the Epsilon used to estimate grads also used as threshold in the check?

Isidro_Pascual · October 10, 2023, 11:26pm

Hi, first post/question! Thank you!

In Week 1, Andrew explains Grad Check. Both in the videos and in the exercises he uses, as an example, an epsilon of 1e-7 to estimate the gradient numerically, and then uses that same value as the threshold against which we compare the measured difference to decide whether the model is working as expected.

I’m having trouble understanding this intuitively. Is there an intuitive explanation for why we’d use epsilon or 2*epsilon as the threshold? Why does the same quantity (epsilon) work both for estimating the gradient numerically and as the threshold for validating the computed gradients? I understand we need to pick a very small quantity to estimate the gradients, but why would that same small quantity also be the right one to compare the difference against? If not, how would we go about selecting the right threshold for our problem? Thanks!

TMosh · October 11, 2023, 12:18am

It doesn’t have to be the same quantity; it was just convenient for this example.
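To make the distinction concrete, here is a minimal sketch in which the step size and the pass/fail threshold are two separate parameters. It is not the course's `gradient_check_n`: the toy cost and the names `cost`, `analytic_grad`, `buggy_grad`, `relative_difference`, and `gradient_check` are all made up for illustration. The step size controls how accurate the two-sided estimate is (its truncation error shrinks like epsilon squared), while the threshold only decides how much disagreement you are willing to tolerate, so for a correct gradient the measured difference should typically land well below 1e-7.

```python
import math

def cost(theta):
    # Hypothetical toy cost J(theta) = sum(theta_i^2 + sin(theta_i)),
    # standing in for a real model's cost function.
    return sum(t * t + math.sin(t) for t in theta)

def analytic_grad(theta):
    # Correct gradient of the toy cost: dJ/dtheta_i = 2*theta_i + cos(theta_i).
    return [2.0 * t + math.cos(t) for t in theta]

def buggy_grad(theta):
    # Deliberately wrong gradient (drops the cos term) to show a failing check.
    return [2.0 * t for t in theta]

def norm(v):
    # Euclidean norm of a list of floats.
    return math.sqrt(sum(x * x for x in v))

def relative_difference(theta, grad_fn, epsilon):
    # Two-sided (centered) difference estimate of each partial derivative,
    # then the normalized distance between the estimate and grad_fn's output.
    grad = grad_fn(theta)
    approx = []
    for i in range(len(theta)):
        plus = list(theta)
        plus[i] += epsilon
        minus = list(theta)
        minus[i] -= epsilon
        approx.append((cost(plus) - cost(minus)) / (2.0 * epsilon))
    diff = [g - a for g, a in zip(grad, approx)]
    return norm(diff) / (norm(grad) + norm(approx))

def gradient_check(theta, grad_fn, epsilon=1e-7, threshold=1e-7):
    # epsilon: step size of the numerical estimate.
    # threshold: tolerance on the relative difference; chosen independently.
    return relative_difference(theta, grad_fn, epsilon) < threshold

theta = [0.5, -1.2, 2.0]
for eps in (1e-5, 1e-7):
    print("epsilon =", eps,
          "| correct grad:", relative_difference(theta, analytic_grad, eps),
          "| buggy grad:", relative_difference(theta, buggy_grad, eps))
```

Running it with a couple of step sizes shows the correct gradient passing the same 1e-7 threshold either way, while the buggy one misses by many orders of magnitude. That gap is why the exact threshold value is rarely critical.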

TMosh · October 11, 2023, 12:19am

Typically you don’t really need to worry about this very much, because the entire purpose of gradient checking is just to double-check that your code is computing the gradients of the cost function correctly.

So you only have to do this when you write a fresh implementation of a cost function.

Since there aren’t very many useful cost functions (pretty much just two, for linear regression and for logistic regression), you really don’t need to do this more than once or twice in your entire ML career.

And in practice, with modern tools you really don’t ever have to compute the gradients yourself (or even write your own cost function), because the popular tools (like TensorFlow or scikit-learn) provide all of that code for you.
