Hi!
In the week 4 video “Forward and Backward Propagation”, we are introduced to the formula for the derivative of the loss function with respect to the activations of the previous layer (as part of backward propagation):
da[l-1] = W[l].T dot dz[l]
Andrew says in the video that he won’t go through the derivation, but I am still very curious why this works, and I am not able to derive it on my own. Does anyone have a good explanation/intuition for this?
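To make the question concrete, here is a minimal sketch of a numerical check of the formula for a single example. It rests on assumptions that are not from the video: the layer sizes, the random values, and the toy loss L = 0.5 * sum(z**2) are all made up for illustration, and the names `loss`, `da_prev_formula` and `da_prev_numeric` are hypothetical. The comments spell out the chain-rule step that I think is behind the transpose, assuming z[l] = W[l] a[l-1] + b[l].

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary layer sizes, chosen only for illustration.
n_prev, n_curr = 4, 3
W = rng.normal(size=(n_curr, n_prev))   # plays the role of W[l]
b = rng.normal(size=(n_curr, 1))        # plays the role of b[l]
a_prev = rng.normal(size=(n_prev, 1))   # plays the role of a[l-1]


def loss(a):
    """Toy loss (an assumption, not the course's cost): L = 0.5 * sum(z**2)."""
    z = W @ a + b
    return float(0.5 * np.sum(z ** 2))


# For this toy loss, dL/dz = z.
z = W @ a_prev + b
dz = z

# Chain rule, component by component:
#   z_j = sum_k W[j, k] * a_prev[k] + b[j]
#   dL/da_prev[k] = sum_j dL/dz_j * dz_j/da_prev[k] = sum_j W[j, k] * dz[j]
# Summing over j (the rows of W) is exactly the matrix product W.T @ dz.
da_prev_formula = W.T @ dz

# Independent estimate using central finite differences.
eps = 1e-6
da_prev_numeric = np.zeros_like(a_prev)
for i in range(n_prev):
    step = np.zeros_like(a_prev)
    step[i] = eps
    da_prev_numeric[i] = (loss(a_prev + step) - loss(a_prev - step)) / (2 * eps)

max_abs_diff = float(np.max(np.abs(da_prev_formula - da_prev_numeric)))
print("max abs difference:", max_abs_diff)
```

If the two gradients agree up to rounding error, that at least confirms the formula numerically, even though it does not replace a proper derivation.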
Thanks in advance!