C2W1 Grad Check: why is the Epsilon used to estimate grads also used as threshold in the check?

Hi, first post/question! Thank you!

In Week 1, Andrew explains gradient checking and, both in the videos and in the exercises, uses an epsilon of 1e-7 as an example: first to estimate the gradient numerically, and then as the threshold against which the measured difference is compared, to decide whether the implementation is working as expected.

I’m having trouble understanding this intuitively. Is there an intuitive explanation for why we’d use epsilon (or 2*epsilon) as the threshold? Why does the same quantity work both for estimating the gradient numerically and as the threshold for validating the analytic gradient computation? I understand we need to pick a very small quantity to estimate the gradients, but why would that same small quantity also be the right one to compare the estimates against? If it isn’t, how would we go about selecting the right threshold for our problem? Thanks!
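For concreteness, here is a minimal sketch of the check being described (not the assignment's code; the toy cost J(θ) = θ² and the 2*epsilon threshold are just illustrative):

```python
def gradient_check(J, dJ, theta, epsilon=1e-7):
    """Check an analytic derivative dJ against a two-sided numerical estimate.

    Single scalar parameter for simplicity; the assignment repeats this
    for every component of the parameter vector.
    """
    # epsilon's first job: step size for the two-sided difference
    grad_approx = (J(theta + epsilon) - J(theta - epsilon)) / (2 * epsilon)
    grad_analytic = dJ(theta)

    # Normalized difference between the two gradient values
    difference = abs(grad_analytic - grad_approx) / (
        abs(grad_analytic) + abs(grad_approx)
    )

    # epsilon's second job: threshold for deciding "close enough"
    ok = difference < 2 * epsilon
    print(f"difference = {difference:.2e} -> {'OK' if ok else 'CHECK FAILED'}")
    return ok

# Toy cost J(theta) = theta**2 with known derivative 2*theta
gradient_check(lambda t: t ** 2, lambda t: 2 * t, theta=3.0)
```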

It doesn’t have to be the same quantity; it was just convenient for this example.
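One way to build some intuition, for what it’s worth (my own reasoning, not from the course): a Taylor expansion shows the two-sided estimate’s approximation error shrinks like epsilon squared, much faster than epsilon itself:

$$
\frac{J(\theta+\varepsilon)-J(\theta-\varepsilon)}{2\varepsilon}
= J'(\theta) + \frac{J'''(\theta)}{6}\,\varepsilon^2 + O(\varepsilon^4)
$$

So with $\varepsilon = 10^{-7}$ the estimation error is on the order of $10^{-14}$, several orders of magnitude below the threshold of $\varepsilon$ or $2\varepsilon$. A correct implementation easily passes, while a real bug in the gradient code typically produces a difference many orders of magnitude above the threshold, so the exact cutoff isn’t very sensitive.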


Typically you don’t really need to worry about this very much, because the entire purpose of gradient checking is just to double-check that your cost function is computing the gradients correctly.

So you only have to do this when you write a fresh implementation of a cost function.

Since there aren’t very many useful cost functions (pretty much just two, for linear regression and for logistic regression), you really don’t need to do this more than once or twice in your entire ML career.

And in practice, with modern tools you really don’t ever have to compute the gradients yourself (or even write your own cost function), because the popular tools (like TensorFlow or scikit-learn) provide all of that code for you.
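For example, here is a minimal sketch of what that looks like with TensorFlow’s automatic differentiation (the variable `w` and the cost `w ** 2` are made up for illustration):

```python
import tensorflow as tf

# TensorFlow computes gradients via automatic differentiation,
# so there is no hand-written backprop to verify with a gradient check.
w = tf.Variable(3.0)

with tf.GradientTape() as tape:
    cost = w ** 2            # any cost expression built from tf ops

grad = tape.gradient(cost, w)  # dcost/dw = 2*w
print(grad.numpy())            # 6.0
```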
