I don't understand the difference between dZL = AL - Y and dZL = dAL .* g'(ZL).

The second formula is the general form of the calculation and works at any layer. The first is what you get when you apply the second formula to the output layer specifically: with the cross-entropy loss and a sigmoid output activation, the two derivatives combine and simplify to A^{[L]} - Y. That derivation is shown in this popular thread from Eddy.
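For reference, here is the algebra behind that simplification, assuming the binary cross-entropy loss and a sigmoid output layer (all operations elementwise):

$$
\mathcal{L}(A^{[L]}, Y) = -\left(Y \log A^{[L]} + (1 - Y)\log(1 - A^{[L]})\right)
$$

$$
dA^{[L]} = \frac{\partial \mathcal{L}}{\partial A^{[L]}} = -\frac{Y}{A^{[L]}} + \frac{1 - Y}{1 - A^{[L]}},
\qquad
g^{[L]\prime}(Z^{[L]}) = A^{[L]}\left(1 - A^{[L]}\right)
$$

$$
dZ^{[L]} = dA^{[L]} * g^{[L]\prime}(Z^{[L]}) = -Y\left(1 - A^{[L]}\right) + (1 - Y)A^{[L]} = A^{[L]} - Y
$$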

A clearer way to write them makes explicit that the layer index in the second formula is a general l, not just L for the last layer:

dZ^{[l]} = dA^{[l]} * g^{[l]'}(Z^{[l]})
dZ^{[L]} = A^{[L]} - Y
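If it helps, here is a quick numerical sanity check of the same point, written as a sketch in Python/NumPy. The function names are hypothetical, chosen for illustration, and it assumes a sigmoid output with binary cross-entropy loss: evaluating the general formula with that dA^{[L]} and the sigmoid derivative gives the same values as the shortcut A^{[L]} - Y.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid, applied elementwise."""
    return 1.0 / (1.0 + np.exp(-z))

def dZ_general(AL, Y):
    """General rule dZ = dA * g'(Z), specialized (by assumption) to a
    sigmoid output layer with binary cross-entropy loss."""
    # dL/dA for binary cross-entropy, elementwise
    dAL = -(np.divide(Y, AL) - np.divide(1 - Y, 1 - AL))
    # sigmoid'(Z) expressed through its output A = sigmoid(Z)
    g_prime = AL * (1 - AL)
    return dAL * g_prime

def dZ_shortcut(AL, Y):
    """Simplified output-layer formula dZ = A - Y."""
    return AL - Y

rng = np.random.default_rng(0)
ZL = rng.normal(size=(1, 5))          # pre-activations of the output layer
AL = sigmoid(ZL)                       # output activations
Y = rng.integers(0, 2, size=(1, 5))    # binary labels (0 or 1)

print(np.allclose(dZ_general(AL, Y), dZ_shortcut(AL, Y)))  # True
```

The shortcut is only valid for this particular loss/activation pair; at hidden layers (or with a different output activation) you still need the general dA * g'(Z) form.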
