For Week 3 (Neural network with one hidden layer), please share the derivation of each gradient that is calculated. Prof. Andrew simply wrote the final gradient values. It is difficult to memorize them without the derivation.
By derivation I mean the steps of calculating d/dW1 (J), d/db1 (J), d/dW2 (J) and d/db2 (J).
Note that Prof Ng has specifically designed these courses not to require any knowledge of calculus, which is why he does not show most of the derivations.
Here’s a thread with links to material that covers the derivations. (Also note that this one is linked from the relevant topic on the DLS FAQ Thread, which is worth a look in general if that is new to you.)
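For a quick orientation before following the links, here is a rough sketch of how the chain rule produces the four gradients. It is not taken from the linked material. It assumes the setup I believe the Week 3 lectures use: a tanh hidden layer, a sigmoid output unit, and the cross-entropy loss, with z^{[1]} = W^{[1]} x + b^{[1]}, a^{[1]} = tanh(z^{[1]}), z^{[2]} = W^{[2]} a^{[1]} + b^{[2]}, a^{[2]} = σ(z^{[2]}). Here "dz" is shorthand for the derivative of the loss L with respect to z, all for a single example:

$$
\begin{aligned}
L &= -\left[\, y \log a^{[2]} + (1-y)\log\left(1-a^{[2]}\right) \right] \\
dz^{[2]} &= \frac{\partial L}{\partial a^{[2]}}\,\frac{\partial a^{[2]}}{\partial z^{[2]}}
  = \left(-\frac{y}{a^{[2]}} + \frac{1-y}{1-a^{[2]}}\right) a^{[2]}\left(1-a^{[2]}\right)
  = a^{[2]} - y \\
dW^{[2]} &= dz^{[2]}\, a^{[1]T}, \qquad db^{[2]} = dz^{[2]} \\
dz^{[1]} &= \left(W^{[2]T} dz^{[2]}\right) \odot \left(1 - \left(a^{[1]}\right)^{2}\right)
  \quad \text{since } \tanh'(z) = 1 - \tanh^{2}(z) \\
dW^{[1]} &= dz^{[1]}\, x^{T}, \qquad db^{[1]} = dz^{[1]}
\end{aligned}
$$

Because J is the average of L over the m training examples, stacking the examples as columns gives the vectorized forms: dZ^{[2]} = A^{[2]} - Y, dW^{[2]} = (1/m) dZ^{[2]} A^{[1]T}, db^{[2]} = (1/m) times the row-wise sum of dZ^{[2]}, and likewise dW^{[1]} = (1/m) dZ^{[1]} X^{T}, db^{[1]} = (1/m) times the row-wise sum of dZ^{[1]}. With a different hidden activation, only the g'(z^{[1]}) factor in dz^{[1]} changes.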