Don't we need to cache a[l] instead of z[l] in forward propagation?

bandr · June 21, 2025, 5:21pm

In this video, the professor mentions that we need to cache z[l]. But in the equations shown at timestamp 3:54, I cannot find any use of z[l] in back propagation, only of a[l]. Of course, one can easily be obtained from the other, but I wanted to clear this up.

paulinpaloalto · June 21, 2025, 8:55pm

The issue is that the cache needs to handle the fully general case. The backprop formula to consider is:

dZ^{[l]} = dA^{[l]} * g^{[l]'}(Z^{[l]})

It turns out that in all of the cases we see in this course, g^{[l]'}(Z^{[l]}) can be more simply expressed as a formula based on A^{[l]} once you actually compute the derivative of the activation function. But the issue is that you can’t assume that will always be true in the general case with any possible activation function that could be used. That’s why you need Z^{[l]}: to compute g^{[l]'}(Z^{[l]}).
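As a worked illustration, assuming the activations in question are sigmoid, tanh, and ReLU (the thread does not list them), the standard derivatives can each be written in terms of A^{[l]} = g^{[l]}(Z^{[l]}), with products taken elementwise:

```latex
\begin{aligned}
\text{sigmoid:}\quad & g^{[l]'}(Z^{[l]}) = A^{[l]} * (1 - A^{[l]}) \\
\text{tanh:}\quad    & g^{[l]'}(Z^{[l]}) = 1 - A^{[l]} * A^{[l]} \\
\text{ReLU:}\quad    & g^{[l]'}(Z^{[l]}) = \mathbf{1}[Z^{[l]} > 0] = \mathbf{1}[A^{[l]} > 0]
\end{aligned}
```

In each of these cases A^{[l]} carries enough information to recover the derivative, which is why Z^{[l]} appears to be unnecessary.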

Of course, as you pointed out, it’s easy to get from Z^{[l]} to A^{[l]} if that is what you prefer because you’ve already worked out the derivative and it only involves A^{[l]}.
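To make the point concrete, here is a minimal NumPy sketch. It is not the course's code: the function names are hypothetical, and SiLU (z * sigmoid(z)) is my own choice of an activation that is not one-to-one, so it is only an illustration of the "general case". For sigmoid, the backward step gives the same result from Z or from A. For SiLU, two different Z values produce the same A but have different derivatives, so A alone is not enough.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_backward_from_z(dA, Z):
    # General form: dZ = dA * g'(Z)
    s = sigmoid(Z)
    return dA * s * (1.0 - s)

def sigmoid_backward_from_a(dA, A):
    # Shortcut that works only because sigmoid'(Z) = A * (1 - A)
    return dA * A * (1.0 - A)

def silu(z):
    # Illustrative non-monotonic activation (an assumption, not from the course)
    return z * sigmoid(z)

def silu_backward(dA, Z):
    # d/dz [z * s(z)] = s(z) + z * s(z) * (1 - s(z))
    s = sigmoid(Z)
    return dA * (s + Z * s * (1.0 - s))

# Sigmoid: caching Z or A leads to the same dZ.
Z = np.array([[-2.0, -0.5, 0.0, 1.5]])
dA = np.ones_like(Z)
same = np.allclose(sigmoid_backward_from_z(dA, Z),
                   sigmoid_backward_from_a(dA, sigmoid(Z)))

# SiLU: find a second input in [-1, 0] with the same output as z_left = -2.
# SiLU is increasing on [-1, 0], so plain bisection is enough.
z_left = -2.0
target = silu(z_left)
lo, hi = -1.0, 0.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if silu(mid) < target:
        lo = mid
    else:
        hi = mid
z_right = 0.5 * (lo + hi)

grad_left = silu_backward(1.0, z_left)
grad_right = silu_backward(1.0, z_right)
print(same, silu(z_left), silu(z_right), grad_left, grad_right)
```

The two SiLU inputs share one activation value but their derivatives have opposite signs, which is the situation a cache holding only A^{[l]} could not handle.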

paulinpaloalto · June 21, 2025, 9:11pm

Here’s the screenshot of that lecture at 3:54 with the z^{[l]} highlighted with the green rectangle that I added:
