Hello,

I am not quite sure I understand the point of gradient checking. I understand the theory and why the calculation verifies that a derivative has been computed correctly.
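To be concrete, here is a tiny sketch of what I understand gradient checking to be (my own toy example, not code from the course): compare a hand-derived gradient against a centered finite-difference estimate.

```python
import numpy as np

def f(w):
    # a simple scalar loss: L(w) = sum(w**2)
    return np.sum(w ** 2)

def analytic_grad(w):
    # hand-derived gradient: dL/dw = 2w
    return 2 * w

def numerical_grad(w, eps=1e-5):
    # centered finite differences, one coordinate at a time
    grad = np.zeros_like(w)
    for i in range(w.size):
        w_plus = w.copy()
        w_plus[i] += eps
        w_minus = w.copy()
        w_minus[i] -= eps
        grad[i] = (f(w_plus) - f(w_minus)) / (2 * eps)
    return grad

w = np.array([1.0, -2.0, 0.5])
num = numerical_grad(w)
ana = analytic_grad(w)
# relative error should be tiny if the analytic gradient is right
rel_err = np.linalg.norm(num - ana) / (np.linalg.norm(num) + np.linalg.norm(ana))
print(rel_err)
```

If the relative error is very small (around 1e-7 or less), the analytic gradient is almost certainly correct. So I do get the mechanics of the check.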

But I don’t see why we would apply this to neural networks. In the courses and small tests I have done so far on my own, computing derivatives by hand has never been necessary, since the ML packages do it automatically.

So I don’t understand why we would run a gradient check. Is it even possible that, for example, Keras “makes a mistake” during optimization?

I am especially surprised by Andrew’s statement that gradient checking has been useful to him many times.

Thank you in advance for the clarification.

Regards,

Adrian