Gradient Descent over normalization

I think the lecture didn’t give much detail about how derivatives like d_beta and d_theta are calculated when normalization is involved. Is there any material that covers the computation-graph-style derivation process?
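In case it helps while waiting for an answer: below is a minimal NumPy sketch of the forward and backward pass for batch normalization, with the gradients for the learnable scale (gamma) and shift (beta) parameters derived via the chain rule over the computation graph. This is not the course's own code; the function names, the `eps` stabilizer, and the (N, D) layout are my assumptions.

```python
import numpy as np

def batchnorm_forward(x, gamma, beta, eps=1e-5):
    # x: (N, D) mini-batch; gamma, beta: (D,) learnable scale/shift
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)   # normalized activations
    out = gamma * x_hat + beta
    cache = (x_hat, gamma, var, eps)
    return out, cache

def batchnorm_backward(dout, cache):
    # dout: upstream gradient dL/d(out), shape (N, D)
    x_hat, gamma, var, eps = cache
    N = dout.shape[0]
    # Parameter gradients: sum over the batch axis, since gamma and
    # beta are shared across all N examples in the mini-batch
    dbeta = dout.sum(axis=0)
    dgamma = (dout * x_hat).sum(axis=0)
    # Input gradient: backprop through x_hat, then fold in the paths
    # through the batch mean and variance (the tricky part of the graph)
    dx_hat = dout * gamma
    inv_std = 1.0 / np.sqrt(var + eps)
    dx = (inv_std / N) * (N * dx_hat
                          - dx_hat.sum(axis=0)
                          - x_hat * (dx_hat * x_hat).sum(axis=0))
    return dx, dgamma, dbeta
```

The key point the computation graph makes visible: the batch mean and variance depend on every example, so dL/dx has three contributions (direct, through the mean, through the variance), which is why `dx` is not simply `dx_hat * inv_std`.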


Hello desu,

Gradient descent is discussed in other specialisations, for example the Machine Learning (ML) specialisation.

That said, you can ask here: which part of the computation graph were you unable to understand, or was it the whole calculation process?