In this video, the professor mentions that we need to cache z[l]. But in the equations shown at timestamp 3:54, I cannot find any use of z[l] in back propagation, only a[l]. Of course, one can easily be obtained from the other, but I wanted to clear this up.
The issue is that the cache needs to handle the fully general case. The backprop formula to consider is:
dZ^{[l]} = dA^{[l]} * g^{[l]'}(Z^{[l]})
It turns out that in all of the cases we see in this course, g^{[l]'}(Z^{[l]}) can be more simply expressed as a formula based on A^{[l]} once you actually compute the derivative of the activation function. But the issue is that you can’t assume that will always be true in the general case with any possible activation function that could be used. That’s why you need Z^{[l]}: to compute g^{[l]'}(Z^{[l]}).
Of course, as you pointed out, it’s easy to compute A^{[l]} from the cached Z^{[l]} if you prefer that form, for instance when you’ve already worked out the derivative and it involves only A^{[l]}.
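To make the point above concrete, here is a minimal NumPy sketch. It is not code from the course: the function names are made up for illustration, and swish (z * sigmoid(z)) is just one activation I picked as an assumption because it is not monotonic, so A^{[l]} alone does not pin down Z^{[l]}. For sigmoid, the backward step gives the same answer whether you start from A or from Z; for swish, the sketch computes the derivative from Z.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def swish(z):
    # Illustrative activation (an assumption, not used in the course).
    return z * sigmoid(z)

def swish_grad(z):
    # d/dz [z * s(z)] = s(z) + z * s(z) * (1 - s(z))
    s = sigmoid(z)
    return s + z * s * (1.0 - s)

def sigmoid_backward_from_a(dA, A):
    # Sigmoid's derivative can be written purely in terms of A.
    return dA * A * (1.0 - A)

def sigmoid_backward_from_z(dA, Z):
    # General form: dZ = dA * g'(Z).
    s = sigmoid(Z)
    return dA * s * (1.0 - s)

def swish_backward(dA, Z):
    # Here the derivative is evaluated from the cached Z.
    return dA * swish_grad(Z)

Z = np.array([[-2.0, -1.25, -1.0, 0.0, 1.5]])
dA = np.ones_like(Z)

# Both routes agree for sigmoid.
sigmoid_routes_agree = np.allclose(
    sigmoid_backward_from_a(dA, sigmoid(Z)),
    sigmoid_backward_from_z(dA, Z),
)

# Swish dips and comes back up, so equal outputs can come from different inputs.
swish_is_non_monotonic = (swish(-2.0) > swish(-1.25)) and (swish(-1.0) > swish(-1.25))

dZ_swish = swish_backward(dA, Z)
```

So caching Z^{[l]} covers both situations: you can always recompute A^{[l]} = g^{[l]}(Z^{[l]}) from it, but you cannot always go the other way.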