Please teach me!!!
In Week 3 (neural nets with one hidden layer), I learned dZ2 = A2 - Y. This is the output layer.
But in Week 4, I learned that in the output layer we calculate dZ^{[L]} = dA^{[L]} .* g^{[L]'}(Z^{[L]}).
Why has it changed?
The second formula is the general case: it works at any layer. The first version is what you get when you apply the general formula to the specific case of the output layer and substitute the derivatives of the cross entropy loss function and the sigmoid activation function. That derivation is shown in this popular thread from Eddy.
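A minimal sketch of that derivation, assuming a sigmoid output a = \sigma(z) and the binary cross entropy loss:

$$
\mathcal{L}(a, y) = -\big(y \log a + (1-y)\log(1-a)\big), \qquad a = \sigma(z)
$$
$$
\frac{\partial \mathcal{L}}{\partial a} = -\frac{y}{a} + \frac{1-y}{1-a}, \qquad \sigma'(z) = a(1-a)
$$
$$
\frac{\partial \mathcal{L}}{\partial z} = \frac{\partial \mathcal{L}}{\partial a}\,\sigma'(z) = -y(1-a) + (1-y)a = a - y
$$

The a(1-a) factor from the sigmoid derivative cancels the denominators from the loss derivative, which is why the output-layer case collapses to the simple A - Y form.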
A better way to write them is to make clear that the first formula holds at any layer l, while the second is specific to the last layer L:
dZ^{[l]} = dA^{[l]} * g^{[l]'}(Z^{[l]})
dZ^{[L]} = A^{[L]} - Y
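As a quick numerical check (a hypothetical toy example, not from the course assignments), you can verify in numpy that the general formula dA .* g'(Z) and the shortcut A - Y give the same result when g is the sigmoid and the loss is binary cross entropy:

```python
import numpy as np

# Toy output layer: 1 unit, 5 examples (values are arbitrary).
np.random.seed(0)
Z = np.random.randn(1, 5)          # pre-activations Z^{[L]}
Y = np.array([[0, 1, 1, 0, 1]])    # binary labels

A = 1 / (1 + np.exp(-Z))           # sigmoid activation A^{[L]}
dA = -(Y / A) + (1 - Y) / (1 - A)  # d(cross entropy)/dA
g_prime = A * (1 - A)              # sigmoid derivative g'(Z)

dZ_general = dA * g_prime          # general formula: dA .* g'(Z)
dZ_shortcut = A - Y                # Week 3 shortcut

print(np.allclose(dZ_general, dZ_shortcut))  # prints True
```

The a(1-a) term in the sigmoid derivative cancels the 1/a and 1/(1-a) factors in dA, which is exactly why the shortcut works.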
Okay! I got it!!! Thank you!!!
Any suggestions on how this can be derived?
That is beyond the scope of these courses, but there are lots of other webpages that cover the calculus behind all this. Here's a thread with pointers to get you started on that. I got that link from one of the topics on the DLS FAQ thread, which is worth a look just on general principles if you haven't already seen it.