Hi, everyone,

I don’t think I understand why

dZ[2] = A[2] - Y

but

dZ[1] = W[2]T . dZ[2] * g’[1](Z[1])

(week 3 material)

Intuitively, I was expecting dZ[1] to be something simple, like dZ[2] is. Can someone guide me to this conclusion?

Thanks in advance,

Leandro Pires.

The formula you show for dZ^{[1]} is the generic formula that works at any hidden layer. The specific formula you show for dZ^{[2]} corresponds to the special case in which layer 2 is the output layer of a binary classifier with a sigmoid activation and the cross-entropy loss function: when you chain dA^{[2]} through the sigmoid derivative, the terms cancel down to A^{[2]} - Y. You can find the derivation of that in this thread.

If you want to see the derivation of the general formula for dZ^{[l]}, that is beyond the scope of this course. Please see this thread for some links that cover the derivation.
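If a numerical sanity check helps your intuition, here is a small NumPy sketch (my own illustrative code, not from the course notebooks) for a 2-layer network with a tanh hidden layer and a sigmoid output trained with cross-entropy. It computes dZ[2] = A[2] - Y and dZ[1] = W[2]T . dZ[2] * g'[1](Z[1]) analytically, then confirms both against finite-difference gradients. Note the loss is summed (not averaged) over examples, matching the convention where the 1/m factor appears in dW rather than in dZ.

```python
import numpy as np

rng = np.random.default_rng(0)
n_x, n_h, m = 3, 4, 5                      # input size, hidden size, batch size
X = rng.standard_normal((n_x, m))
Y = (rng.random((1, m)) > 0.5).astype(float)
W1 = 0.5 * rng.standard_normal((n_h, n_x)); b1 = np.zeros((n_h, 1))
W2 = 0.5 * rng.standard_normal((1, n_h));   b2 = np.zeros((1, 1))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Forward pass: tanh hidden layer, sigmoid output
Z1 = W1 @ X + b1; A1 = np.tanh(Z1)
Z2 = W2 @ A1 + b2; A2 = sigmoid(Z2)

# Analytic gradients from the course formulas
dZ2 = A2 - Y                               # dZ[2] = A[2] - Y
dZ1 = (W2.T @ dZ2) * (1.0 - A1**2)         # dZ[1] = W[2]T . dZ[2] * g'[1](Z[1]), g = tanh

# Cross-entropy loss as a function of Z2, and of Z1 (summed over examples)
def loss_from_Z2(Z2_):
    A2_ = sigmoid(Z2_)
    return -np.sum(Y * np.log(A2_) + (1 - Y) * np.log(1 - A2_))

def loss_from_Z1(Z1_):
    return loss_from_Z2(W2 @ np.tanh(Z1_) + b2)

def num_grad(f, Z, eps=1e-6):
    """Central-difference gradient of scalar f w.r.t. each entry of Z."""
    g = np.zeros_like(Z)
    for idx in np.ndindex(*Z.shape):
        Zp, Zm = Z.copy(), Z.copy()
        Zp[idx] += eps; Zm[idx] -= eps
        g[idx] = (f(Zp) - f(Zm)) / (2 * eps)
    return g

print(np.allclose(dZ2, num_grad(loss_from_Z2, Z2), atol=1e-5))  # True
print(np.allclose(dZ1, num_grad(loss_from_Z1, Z1), atol=1e-5))  # True
```

The takeaway: dZ[2] only looks "small" because sigmoid + cross-entropy cancel neatly; dZ[1] is the same chain rule applied one layer deeper, where no such cancellation happens, so the W[2]T and g'[1] factors remain visible.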