The intuition behind db^[l]=dz^[l] and da^[l-1]=W^[l]T.dz^[l]

Can anyone explain why db^[l]=dz^[l] and da^[l-1]=W^[l]T.dz^[l] in week 4, in the video “Forward and backward propagation”?

Because we have this formula in the forward direction:

z^{[l]} = W^{[l]} \cdot a^{[l-1]} + b^{[l]}

What happens when you differentiate? Note also that we are taking derivatives of the cost, so that is “in the numerator” if we can use that slightly “off kilter” terminology (a derivative is not really a fraction, but I hope you see what I mean there). Remember what Prof Ng means by his “d” notation for gradients:

db^{[l]} = \displaystyle \frac {\partial J}{\partial b^{[l]}}

Note that Prof Ng specifically designed these courses not to require knowledge of calculus, so he doesn’t explain this type of derivation. If you have the math background, here’s a thread with links to lots of information on the derivation of back propagation.
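For anyone who would rather check the two formulas numerically than derive them, here is a minimal sketch in plain Python. It assumes a single training example and a toy cost J = 0.5 * sum(z_i^2), picked only because it gives dz = z; the helper names (forward, cost, analytic_grads, numeric_grads) and the numbers are made up for this illustration and are not from the course notebooks.

```python
# One linear layer, one example: z = W a + b.
# Toy cost (an assumption for illustration): J = 0.5 * sum(z_i^2), so dJ/dz = z.

def forward(W, a, b):
    """Return z = W a + b as a list of floats."""
    return [sum(w_ij * a_j for w_ij, a_j in zip(row, a)) + b_i
            for row, b_i in zip(W, b)]

def cost(W, a, b):
    """Toy cost J = 0.5 * sum(z_i^2)."""
    return 0.5 * sum(z_i ** 2 for z_i in forward(W, a, b))

def analytic_grads(W, a, b):
    """Gradients from the backprop formulas db = dz and da_prev = W^T dz."""
    dz = forward(W, a, b)          # dJ/dz = z for this toy cost
    db = list(dz)                  # db = dz
    da_prev = [sum(W[i][j] * dz[i] for i in range(len(W)))
               for j in range(len(a))]   # (W^T dz)_j
    return db, da_prev

def numeric_grads(W, a, b, eps=1e-6):
    """Central finite differences of the cost with respect to b and a."""
    def bump(v, k, delta):
        out = list(v)
        out[k] += delta
        return out
    db = [(cost(W, a, bump(b, k, eps)) - cost(W, a, bump(b, k, -eps))) / (2 * eps)
          for k in range(len(b))]
    da = [(cost(W, bump(a, k, eps), b) - cost(W, bump(a, k, -eps), b)) / (2 * eps)
          for k in range(len(a))]
    return db, da

W = [[0.5, -1.0, 2.0],
     [1.5, 0.25, -0.75]]   # shape (2, 3): 3 inputs, 2 units
a = [1.0, 2.0, -1.0]       # a^{[l-1]}
b = [0.1, -0.2]            # b^{[l]}

db, da_prev = analytic_grads(W, a, b)
db_num, da_num = numeric_grads(W, a, b)

# Largest absolute gap between the formulas and the finite differences.
max_err = max(abs(x - y) for x, y in zip(db + da_prev, db_num + da_num))
print("formulas match finite differences:", max_err < 1e-6)
```

The check nudges each entry of b and a^{[l-1]} a little, watches how J moves, and compares that slope with what the two formulas predict. It says nothing about why the formulas hold, only that they give the right numbers for this small case.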

Thank you a lot.
But I’m still confused about why \frac{\partial J}{\partial a^{[l-1]}}=W^{[l]T} \cdot \frac{\partial J}{\partial z^{[l]}}, because it doesn’t seem to follow from the chain rule, or at least I don’t understand how! :face_exhaling:
The same goes for db^{[l]}=dz^{[l]}, which is actually \frac{\partial J}{\partial b^{[l]}}=\frac{\partial J}{\partial z^{[l]}}

We are doing matrix calculus here. If you are not familiar with that, please see the links on the thread I gave earlier. E.g. this one that is pointed to from that thread.
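As a rough sketch of how the chain rule does apply, it helps to write the forward formula one element at a time for a single training example. The index names i, j and k are introduced here for illustration only and are not from the lecture.

```latex
z_i^{[l]} = \sum_j W_{ij}^{[l]} a_j^{[l-1]} + b_i^{[l]}

% b_i^{[l]} appears only in z_i^{[l]}, with coefficient 1:
\frac{\partial J}{\partial b_i^{[l]}}
  = \sum_k \frac{\partial J}{\partial z_k^{[l]}} \frac{\partial z_k^{[l]}}{\partial b_i^{[l]}}
  = \frac{\partial J}{\partial z_i^{[l]}}
  \quad\Rightarrow\quad db^{[l]} = dz^{[l]}

% a_j^{[l-1]} appears in every z_i^{[l]}, with coefficient W_{ij}^{[l]}:
\frac{\partial J}{\partial a_j^{[l-1]}}
  = \sum_i \frac{\partial J}{\partial z_i^{[l]}} \frac{\partial z_i^{[l]}}{\partial a_j^{[l-1]}}
  = \sum_i W_{ij}^{[l]} \frac{\partial J}{\partial z_i^{[l]}}
  \quad\Rightarrow\quad da^{[l-1]} = W^{[l]T} \cdot dz^{[l]}
```

The sum over i in the last line runs down a column of W^{[l]}, which is the same as taking a row of its transpose, and that is where the transpose comes from. With m examples stacked as columns, the vectorized version in the course also averages over the examples, so db^{[l]} becomes 1/m times the row-wise sum of dZ^{[l]}.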

I looked at The Matrix Calculus You Need For Deep Learning. So useful. Thank you very much.