Could someone help explain this to me?


I don't understand the part I highlighted in the picture: why does dZ suddenly look like that?
Why isn't it dz[1] = a[1] - y?

Hi Enfant,

dZ[2] is shorthand for dL/dZ[2], where L is the loss function. Because of how the loss function is defined (together with the output activation), this works out to dZ[2] = A[2] - Y. You can find the (non-vectorized) derivation here.
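
For reference, here is a short sketch of that derivation for a single example. It assumes a sigmoid output a = σ(z) and the binary cross-entropy loss; that is the usual setup for this kind of network, but it is an assumption here, since the thread does not state it. The superscript [2] is dropped for readability.

```latex
\begin{aligned}
L &= -\bigl(y \log a + (1 - y)\log(1 - a)\bigr), \qquad a = \sigma(z) \\
\frac{\partial L}{\partial a} &= -\frac{y}{a} + \frac{1 - y}{1 - a} \\
\frac{\partial a}{\partial z} &= a(1 - a) \\
\frac{\partial L}{\partial z} &= \left(-\frac{y}{a} + \frac{1 - y}{1 - a}\right) a(1 - a)
  = -y(1 - a) + (1 - y)\,a = a - y
\end{aligned}
```

The a(1 - a) factor from the sigmoid cancels the denominators from the loss, which is why the result is so compact. That cancellation only happens at the output layer.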

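In turn, dZ[1] is shorthand for dL/dZ[1], and by the chain rule dL/dZ[1] = dL/dZ[2] * ∂Z[2]/∂A[1] * ∂A[1]/∂Z[1]. (∂ marks the ordinary partial derivatives here, to keep them apart from the dZ shorthand.)

∂Z[2]/∂A[1] = W[2], while ∂A[1]/∂Z[1] = g[1]'(Z[1]). This leads to the equation you highlight. The simple "A - Y" form only shows up at the output layer, because that is the only place where the loss compares the activation with Y directly.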
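
If you want to convince yourself numerically, here is a minimal NumPy sketch that compares the chain-rule result with a finite-difference estimate of dL/dZ[1]. Everything in it is an assumption for illustration: the layer sizes, the random data, tanh as g[1], a sigmoid output, and a cross-entropy loss summed over examples. In vectorized form the chain rule above becomes dZ1 = (W2.T @ dZ2) * g[1]'(Z1), with * taken elementwise.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def loss_from_Z1(Z1, W2, b2, Y):
    """Cross-entropy loss (summed over examples) as a function of Z1 only."""
    A1 = np.tanh(Z1)               # hidden activation, g[1] = tanh (assumed)
    A2 = sigmoid(W2 @ A1 + b2)     # output activation (assumed sigmoid)
    return -np.sum(Y * np.log(A2) + (1 - Y) * np.log(1 - A2))


# Hypothetical sizes and random data, purely for illustration.
rng = np.random.default_rng(0)
n_x, n_h, m = 3, 4, 5
X = rng.normal(size=(n_x, m))
Y = rng.integers(0, 2, size=(1, m)).astype(float)
W1 = rng.normal(size=(n_h, n_x))
b1 = np.zeros((n_h, 1))
W2 = rng.normal(size=(1, n_h))
b2 = np.zeros((1, 1))

# Forward pass.
Z1 = W1 @ X + b1
A1 = np.tanh(Z1)
Z2 = W2 @ A1 + b2
A2 = sigmoid(Z2)

# Backward pass via the chain rule.
dZ2 = A2 - Y                          # output layer: dL/dZ2
dZ1 = (W2.T @ dZ2) * (1 - A1 ** 2)    # tanh'(Z1) = 1 - tanh(Z1)^2

# Central finite differences on every entry of Z1.
eps = 1e-6
dZ1_num = np.zeros_like(Z1)
for i in range(n_h):
    for j in range(m):
        Z_plus = Z1.copy()
        Z_minus = Z1.copy()
        Z_plus[i, j] += eps
        Z_minus[i, j] -= eps
        dZ1_num[i, j] = (
            loss_from_Z1(Z_plus, W2, b2, Y) - loss_from_Z1(Z_minus, W2, b2, Y)
        ) / (2 * eps)

max_err = np.max(np.abs(dZ1 - dZ1_num))
print("max abs difference:", max_err)
```

The difference should come out tiny, while the guess A1 - Y would not match the numerical gradient in general: A1 is a hidden activation and is never compared with Y by the loss.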