Hi @Marcia_Ma @Muhammad_John_Abbas @balaji.ambresh

Welcome @Marcia_Ma to the Community!

dA^{[L]} has to be computed in order to calculate dZ^{[l]}, dW^{[l]} and db^{[l]}, unless we use a shortcut formula (derived from the chain rule) that gives dZ^{[l]}, dW^{[l]} and db^{[l]} directly.

The equations in this image

are valid only for the last layer, where they give dZ^{[L]}, dW^{[L]} and db^{[L]} directly, and they assume the activation function there is softmax. Because they are a shortcut, they have to be derived again if the activation is not softmax.

The full equations for dZ^{[l]}, dW^{[l]} and db^{[l]}, which show where the short ones come from, are

dZ^{[l]} = \frac{\partial \mathcal{L}}{\partial A^{[l]}} * \frac{\partial A^{[l]}}{\partial Z^{[l]}}

dW^{[l]} = \frac{\partial \mathcal{L}}{\partial A^{[l]}} * \frac{\partial A^{[l]}}{\partial Z^{[l]}} * \frac{\partial Z^{[l]}}{\partial W^{[l]}}

db^{[l]} = \frac{\partial \mathcal{L}}{\partial A^{[l]}} * \frac{\partial A^{[l]}}{\partial Z^{[l]}} * \frac{\partial Z^{[l]}}{\partial b^{[l]}}

This is the chain rule for derivatives. The short equations merge these factors into one expression, which is why the separate step of calculating dA^{[l]} disappears there.
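To make the three chain-rule factors concrete, here is a minimal NumPy sketch of the backward step for a single layer. It assumes the usual course convention (Z = W A_prev + b, examples stored as columns, cost averaged over the m examples). The helper name `layer_backward`, its arguments, and the random values are hypothetical and only for illustration.

```python
import numpy as np

def layer_backward(dA, Z, A_prev, g_prime):
    """Hypothetical helper: one layer's backward step via the chain rule.

    Assumes Z = W @ A_prev + b, A = g(Z), examples stored as columns,
    and a cost averaged over the m examples.
    """
    m = A_prev.shape[1]
    dZ = dA * g_prime(Z)                          # dL/dA * dA/dZ (elementwise)
    dW = dZ @ A_prev.T / m                        # ... * dZ/dW
    db = np.sum(dZ, axis=1, keepdims=True) / m    # ... * dZ/db
    return dZ, dW, db

# Tiny example: a ReLU layer with 2 units, 3 inputs, 4 examples.
rng = np.random.default_rng(0)
A_prev = rng.standard_normal((3, 4))
W = rng.standard_normal((2, 3))
b = np.zeros((2, 1))
Z = W @ A_prev + b
dA = rng.standard_normal((2, 4))   # stands in for the gradient coming from the next layer

def relu_prime(z):
    return (z > 0).astype(float)

dZ, dW, db = layer_backward(dA, Z, A_prev, relu_prime)
print(dZ.shape, dW.shape, db.shape)   # (2, 4) (2, 3) (2, 1)
```

Note that the function cannot start without dA: it is the first factor in all three products.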

**But** if the last layer **isn’t softmax**, we have to derive these equations again for the activation function that is actually used. For example, if the last layer's activation function is **sigmoid**, we should first calculate, for each of the m examples,

dA^{[L]} = \frac{\partial \mathcal{L}}{\partial A^{[L]}} = \left[ -\frac{y^{(1)}}{a^{(1)}} + \frac{1-y^{(1)}}{1-a^{(1)}}, \ \dots, \ -\frac{y^{(m)}}{a^{(m)}} + \frac{1-y^{(m)}}{1-a^{(m)}} \right]

and then calculate dZ^{[L]} = \frac{\partial \mathcal{L}}{\partial Z^{[L]}} according to ( dZ^{[L]} = \frac{\partial \mathcal{L}}{\partial A^{[L]}} * \frac{\partial A^{[L]}}{\partial Z^{[L]}} ). For the sigmoid function the second factor is \frac{\partial A^{[L]}}{\partial Z^{[L]}} = \left[ a^{(1)}(1-a^{(1)}), \ \dots, \ a^{(m)}(1-a^{(m)}) \right], so dZ^{[L]} = dA^{[L]} * A^{[L]}(1-A^{[L]}), multiplied element by element.
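The sigmoid case can be checked numerically. The sketch below assumes a single output unit with the binary cross-entropy loss (which is what the dA^{[L]} expression above corresponds to); the Z and Y values are made up. For this particular loss and activation pair, the explicit product dA^{[L]} * A^{[L]}(1-A^{[L]}) works out to A^{[L]} - Y.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Made-up values: 1 output unit, m = 4 examples stored as columns.
ZL = np.array([[0.5, -1.2, 2.0, 0.1]])
Y = np.array([[1.0, 0.0, 1.0, 0.0]])
AL = sigmoid(ZL)

# dL/dA for the binary cross-entropy loss, one entry per example.
dAL = -(Y / AL) + (1 - Y) / (1 - AL)

# dA/dZ for sigmoid, one entry per example.
dA_dZ = AL * (1 - AL)

# Chain rule: dZ = dL/dA * dA/dZ (elementwise).
dZL = dAL * dA_dZ

# For this loss/activation pair the product collapses to AL - Y.
print(np.allclose(dZL, AL - Y))   # True
```

With a different loss or a different output activation, the same two steps apply, but the product generally does not simplify this way.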

So, concretely, we need dA^{[l]} for every layer except layer 0 in order to get all of ( dZ^{[l]}, dW^{[l]}, db^{[l]} ) by the chain rule.
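As an illustration of dA being passed from layer to layer, here is a small sketch of a full backward pass. Everything in it is an assumption made for the example: a 2-layer network (tanh hidden layer, sigmoid output), binary cross-entropy cost averaged over m examples, and hypothetical function names. A finite-difference check on one weight is included to confirm the chain-rule gradients.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(params, X):
    W1, b1, W2, b2 = params
    A1 = np.tanh(W1 @ X + b1)
    A2 = sigmoid(W2 @ A1 + b2)
    return A1, A2

def cost(params, X, Y):
    _, A2 = forward(params, X)
    return float(np.mean(-(Y * np.log(A2) + (1 - Y) * np.log(1 - A2))))

def backward(params, X, Y):
    W1, b1, W2, b2 = params
    m = X.shape[1]
    A1, A2 = forward(params, X)

    # Layer 2 (output): start from dA2 = dL/dA2.
    dA2 = -(Y / A2) + (1 - Y) / (1 - A2)
    dZ2 = dA2 * A2 * (1 - A2)              # sigmoid'(Z2) = A2 * (1 - A2)
    dW2 = dZ2 @ A1.T / m
    db2 = np.sum(dZ2, axis=1, keepdims=True) / m

    # Layer 1 (hidden): dA1 is passed back from layer 2.
    dA1 = W2.T @ dZ2
    dZ1 = dA1 * (1 - A1 ** 2)              # tanh'(Z1) = 1 - A1^2
    dW1 = dZ1 @ X.T / m
    db1 = np.sum(dZ1, axis=1, keepdims=True) / m

    # No dA0 is needed: layer 0 is the input X and has no parameters.
    return dW1, db1, dW2, db2

rng = np.random.default_rng(1)
X = rng.standard_normal((3, 5))
Y = (rng.random((1, 5)) > 0.5).astype(float)
params = [rng.standard_normal((4, 3)) * 0.5, np.zeros((4, 1)),
          rng.standard_normal((1, 4)) * 0.5, np.zeros((1, 1))]

dW1, db1, dW2, db2 = backward(params, X, Y)

# Finite-difference check of a single entry, W1[0, 0].
eps = 1e-6
plus = [p.copy() for p in params]
minus = [p.copy() for p in params]
plus[0][0, 0] += eps
minus[0][0, 0] -= eps
numeric = (cost(plus, X, Y) - cost(minus, X, Y)) / (2 * eps)
grad_matches = abs(numeric - dW1[0, 0]) < 1e-6
print(grad_matches)
```

The hidden layer's gradients depend on dA1, which only exists because the output layer computed dZ2 first; that is the sense in which every layer from 1 to L needs its own dA.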

Cheers,

Abdelrahman