The following question is in reference to Course 1, week 4:
I’ve gone through the lectures several times now, and I haven’t seen the derivation of dZ^[1]; it also isn’t covered in the “Backpropagation Intuition (Optional)” video. The equation is presented, but its derivation is never discussed. It involves an element-wise multiplication of two matrices, and its form completely breaks the symmetry of the other gradient equations.
The equation in question is as follows:
dZ^[1] = (W^[2].T dZ^[2]) * g'(Z^[1]), where W^[2].T dZ^[2] is an ordinary matrix product and * is element-wise multiplication
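Just to make sure I’m reading the equation correctly, I did a quick shape check in numpy, using the course’s convention that each column of Z and A is one training example. The layer sizes are made up and this is only my own sanity check, not code from the assignment:

import numpy as np

np.random.seed(0)
n1, n2, m = 4, 3, 5                # layer sizes and number of examples (values made up for the check)
W2 = np.random.randn(n2, n1)       # W^[2] maps layer 1 (n1 units) to layer 2 (n2 units)
dZ2 = np.random.randn(n2, m)       # dZ^[2] has the same shape as Z^[2]: (n2, m)
Z1 = np.random.randn(n1, m)        # Z^[1] has shape (n1, m)

dA1 = W2.T @ dZ2                   # (n1, n2) @ (n2, m) -> (n1, m), same shape as Z^[1]
g_prime = 1.0 - np.tanh(Z1) ** 2   # derivative of tanh, assuming g^[1] = tanh
dZ1 = dA1 * g_prime                # element-wise product, shape (n1, m)
print(dZ1.shape)                   # prints (4, 5)

So the dimensions work out, but that doesn’t tell me where the formula comes from.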
What is W^[2].T dZ^[2]? I haven’t seen W.T multiplied by dZ at any other point in the course.
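My best guess, and I may well be wrong, is that the whole equation comes from pushing the chain rule back through the forward equations Z^[2] = W^[2] A^[1] + b^[2] and A^[1] = g(Z^[1]):

dA^[1] = W^[2].T dZ^[2]        (from Z^[2] = W^[2] A^[1] + b^[2])
dZ^[1] = dA^[1] * g'(Z^[1])    (from A^[1] = g(Z^[1]), which is applied element-wise)

Substituting the first line into the second would give the equation above, but I can’t justify either step rigorously; in particular, I don’t see why the transpose shows up in the first line, or why the second line becomes element-wise.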
Was this derivation intentionally skipped because of its difficulty? Can you point me to a reference in which this gradient is derived? I’d really like to complete the picture. Thanks in advance.