I am not quite sure I understand the point of gradient checking. I follow the theory and see why the calculation makes sense as a way to verify that a derivative has been computed correctly.
But I don’t see why we would apply this in neural networks. In the courses and small experiments I have done on my own so far, computing derivatives by hand has never been necessary, since the ML packages handle it automatically.
So why would we do gradient checking at all? Is it even possible that, for example, Keras "makes a mistake" during optimization?
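Just to make sure I understand what the technique actually does, here is a minimal sketch in plain NumPy of how I picture gradient checking: compare a hand-derived gradient against a centered finite-difference estimate of the same gradient. The toy loss `f` and the tolerance are my own made-up illustration, not anything from the course.

```python
import numpy as np

def f(w):
    # Toy scalar loss for illustration: L(w) = sum(w^2)
    return np.sum(w ** 2)

def analytic_grad(w):
    # Hand-derived gradient of L: dL/dw = 2w
    return 2 * w

def numeric_grad(f, w, eps=1e-5):
    # Centered finite differences, one coordinate at a time:
    # dL/dw_i ~= (f(w + eps*e_i) - f(w - eps*e_i)) / (2*eps)
    grad = np.zeros_like(w)
    for i in range(w.size):
        w_plus = w.copy()
        w_plus[i] += eps
        w_minus = w.copy()
        w_minus[i] -= eps
        grad[i] = (f(w_plus) - f(w_minus)) / (2 * eps)
    return grad

w = np.array([1.0, -2.0, 3.0])
g_analytic = analytic_grad(w)
g_numeric = numeric_grad(f, w)

# If the analytic gradient is correct, the relative error should be tiny.
rel_err = np.linalg.norm(g_analytic - g_numeric) / (
    np.linalg.norm(g_analytic) + np.linalg.norm(g_numeric)
)
print(rel_err)
```

As I understand it, if `rel_err` is very small the analytic gradient matches the numerical one, and otherwise there is a bug in the derivative code. My question is about when this check is worth running in practice.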
I am especially surprised by Andrew’s statement that gradient checking has been useful to him many times.
Thank you in advance for the clarification.