Hi, first post/question! Thank you!
In Week 1, Andrew explains gradient checking. In both the videos and the exercises he uses, as an example, an epsilon of 1e-7 to estimate the gradient numerically, and then uses that same value as the threshold against which we compare the measured difference to decide whether the model is working as expected.
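To make the question concrete, here is a minimal sketch of the procedure as I understand it. The toy cost function, all the function names, and the deliberately wrong gradient are my own made-up illustrations, not the course's code. I'm also assuming the "difference" is the relative one, the norm of (grad - gradapprox) divided by the sum of the two norms.

```python
import math

EPSILON = 1e-7  # step size for the finite difference, as in the course example


def cost(theta):
    """Hypothetical toy cost J(theta) = sum(theta_i ** 3), for illustration only."""
    return sum(t ** 3 for t in theta)


def analytic_grad(theta):
    """Correct gradient of the toy cost: dJ/dtheta_i = 3 * theta_i ** 2."""
    return [3 * t ** 2 for t in theta]


def buggy_grad(theta):
    """Deliberately wrong gradient (2 instead of 3), standing in for a backprop bug."""
    return [2 * t ** 2 for t in theta]


def numeric_grad(cost_fn, theta, eps=EPSILON):
    """Two-sided difference: (J(theta + eps) - J(theta - eps)) / (2 * eps), one coordinate at a time."""
    grad = []
    for i in range(len(theta)):
        plus = list(theta)
        plus[i] += eps
        minus = list(theta)
        minus[i] -= eps
        grad.append((cost_fn(plus) - cost_fn(minus)) / (2 * eps))
    return grad


def l2_norm(v):
    return math.sqrt(sum(x * x for x in v))


def relative_difference(grad, grad_approx):
    """Assumed form of the check: ||grad - approx|| / (||grad|| + ||approx||)."""
    numerator = l2_norm([g - a for g, a in zip(grad, grad_approx)])
    denominator = l2_norm(grad) + l2_norm(grad_approx)
    return numerator / denominator


theta = [0.5, -1.2, 2.0]
approx = numeric_grad(cost, theta)
diff_ok = relative_difference(analytic_grad(theta), approx)
diff_bad = relative_difference(buggy_grad(theta), approx)

# Here EPSILON is reused as the pass/fail threshold, which is the part I'm asking about.
print("correct gradient passes:", diff_ok < EPSILON)
print("buggy gradient passes:  ", diff_bad < EPSILON)
```

In this sketch `EPSILON` shows up twice: once as the step size inside `numeric_grad`, and once as the cutoff in the final comparison.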
I’m having trouble understanding this intuitively. Is there an intuitive explanation for why we’d use epsilon or 2*epsilon as the threshold? Why does the same quantity (epsilon) work both as the step size for numerically estimating the gradient and as the threshold for validating the gradients we actually computed? I understand that we need a very small quantity to estimate the gradients, but why would that same small quantity also be the right one to compare the difference against? And if it isn’t, how would we go about selecting the right threshold for our problem? Thanks!