Hi
I’m working on gradient checking for an N-layer neural network. The approximation of the gradients is very close to the real gradients computed by backpropagation. But when I use the equation presented in the lecture, the result is unbelievably high.
For example, the gradient is 1.26596283e-06 and its approximation is -1.26620936e-06.
I’ve checked the gradients and they all look like the example above. But when I compute ||dtheta_approx - dtheta|| / ||dtheta_approx|| + ||dtheta||, I get 16.7335560670404, which is very, very big.
Does anybody have any idea?
Please implement the equation correctly, keeping in mind that the brackets matter when computing the denominator:
difference = \frac {\| grad - gradapprox \|_2}{\| grad \|_2 + \| gradapprox \|_2 } \tag{3}
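For reference, a minimal NumPy sketch of that formula (the helper name is mine, and it assumes grad and gradapprox are already flattened into vectors of the same shape):

import numpy as np

def relative_difference(grad, gradapprox):
    # Equation (3): note that the brackets enclose the whole denominator.
    numerator = np.linalg.norm(grad - gradapprox)
    denominator = np.linalg.norm(grad) + np.linalg.norm(gradapprox)
    return numerator / denominator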
Thanks for your answer, but I’ve implemented exactly what you’ve presented here, including the position of the brackets.
This is what I’ve done:
difference = np.linalg.norm(grad - grad_approx) / (np.linalg.norm(grad) + np.linalg.norm(grad_approx))
Please look at your original post. You are calculating
difference = \frac {\| grad - gradapprox \|_2}{\| grad \|_2} + \| gradapprox \|_2
If your implementations of gradapprox and grad are correct, please click my name and message your notebook as an attachment.
Thanks a million. I sent my code.
Please share the full notebook, not just the function and its supporting code.
Thanks again for your answers.
I finally found the solution: I used np.squeeze(grad_approx), and the difference became 2.7245835408876563e-08, as expected.
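For anyone who hits the same thing, here is a hypothetical sketch of why np.squeeze helps, assuming grad was 1-D while grad_approx carried a trailing singleton dimension such as (n, 1). The numbers below are made up; only the shapes matter.

import numpy as np

grad = np.array([1.3e-06, -3.4e-05, 2.1e-04])               # shape (3,)
grad_approx = np.array([[1.3e-06], [-3.4e-05], [2.1e-04]])  # shape (3, 1)

def difference(g, ga):
    return np.linalg.norm(g - ga) / (np.linalg.norm(g) + np.linalg.norm(ga))

# With mismatched shapes, g - ga broadcasts to a (3, 3) matrix of all
# pairwise differences, so the numerator (and the ratio) is wildly inflated.
print(difference(grad, grad_approx))

# np.squeeze drops the singleton axis and restores elementwise subtraction.
print(difference(grad, np.squeeze(grad_approx)))  # 0.0 here; ~1e-7 or smaller with real gradients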