I don't know the difference between dZL = AL - Y and dZL = dAL .* g'(ZL)

paulinpaloalto · February 8, 2022, 4:50pm

The second formula is the general case, which works at any layer. The first is what you get when you apply the second formula to the output layer specifically: the derivative of the cross entropy loss and the derivative of the sigmoid activation combine and simplify. That derivation is shown in this popular thread from Eddy.
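For reference, here is a short sketch of how that simplification goes. It assumes binary cross entropy loss and a sigmoid output, and looks at a single sample; the lowercase symbols a, y, z stand for one entry of A^{[L]}, Y and Z^{[L]} and are introduced here only for illustration:

$$
\mathcal{L}(a, y) = -\bigl(y \log a + (1 - y)\log(1 - a)\bigr), \qquad a = \sigma(z)
$$

$$
da = \frac{\partial \mathcal{L}}{\partial a} = -\frac{y}{a} + \frac{1 - y}{1 - a} = \frac{a - y}{a(1 - a)}, \qquad \sigma'(z) = a(1 - a)
$$

$$
dz = da \cdot \sigma'(z) = \frac{a - y}{a(1 - a)} \cdot a(1 - a) = a - y
$$

The a(1 - a) factors cancel, which is why the output layer formula has no visible g' term. With a different output activation or a different loss, that cancellation is not guaranteed and the general formula is the one to use.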

A better way to write them makes clear that the layer index in the general formula is any l, not just L for the last layer:

dZ^{[l]} = dA^{[l]} * g^{[l]'}(Z^{[l]})
dZ^{[L]} = A^{[L]} - Y
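
If it helps, the equivalence at the output layer can be checked numerically. This is a minimal sketch, assuming a sigmoid output layer and binary cross entropy loss; the variable names, the (1, 5) shape and the random data are illustrative assumptions, not taken from the course assignment:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative output-layer pre-activations and binary labels for 5 samples.
ZL = rng.normal(size=(1, 5))
Y = rng.integers(0, 2, size=(1, 5)).astype(float)

# Forward pass through the sigmoid output layer.
AL = 1.0 / (1.0 + np.exp(-ZL))

# General formula: dZ = dA * g'(Z), with * meaning elementwise product.
dAL = -(Y / AL) + (1.0 - Y) / (1.0 - AL)  # derivative of cross entropy w.r.t. AL
g_prime = AL * (1.0 - AL)                 # derivative of sigmoid at ZL
dZL_general = dAL * g_prime

# Output-layer shortcut: dZ = A - Y.
dZL_shortcut = AL - Y

# Both routes should agree up to floating point rounding.
match = np.allclose(dZL_general, dZL_shortcut)
print(match)
```

At a hidden layer there is no such shortcut: the activation there is typically ReLU or tanh and dA comes from the layer above rather than directly from the loss, so only the general formula applies.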
