I am having trouble understanding why there was a need to store A[1], A[2], …, A[L] in the cache. In the lecture, Andrew said we store Z[1], Z[2], … so the values can be reused when computing derivatives during backward propagation. However, in the programming assignment it is also not clear to me where the cached Z[l] is used.
For example, for a 2-layer network-
A1, cache1 = …
A2, cache2 = …
cost = …
dA1, dW2, db2 = …
dA0, dW1, db1 = …
Here, cache1 contains [A0, W1, b1, Z1]. Why do we need to save all of them, and where are they used?
Hi @namratasri01, and welcome to the DL Specialization. First, some ground rules: you are not allowed to post your proposed solutions from the notebooks; that is a violation of the Honor Code. Please refrain from doing so henceforth.
To use gradient descent to arrive at a solution, you need the derivatives of the cost function that you are trying to optimize. These derivatives are evaluated at the values that you computed during forward prop: the A's, which are functions of the Z's. In elementary terms, to find the slope of a function at a point, you must know the value of the function at that point. So you need your A's (functions of the Z's), the input X, and the labels Y. In terms of the algorithm, it may be that you only need the A's, but those were themselves computed from the Z's.
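To make this concrete, here is a minimal sketch (not the notebook's code) of the backward step for a single sigmoid layer, assuming the cache is the tuple (A_prev, W, b, Z) saved during forward prop. The names layer_backward and sigmoid here are just illustrative:

import numpy as np

def sigmoid(Z):
    return 1 / (1 + np.exp(-Z))

def layer_backward(dA, cache):
    """Illustrative backward pass for one sigmoid layer.

    cache = (A_prev, W, b, Z), saved during forward prop.
    """
    A_prev, W, b, Z = cache
    m = A_prev.shape[1]

    # Z is needed to evaluate g'(Z):  dZ = dA * g'(Z)
    s = sigmoid(Z)
    dZ = dA * s * (1 - s)

    # A_prev is needed for dW; W is needed to pass the gradient back
    dW = (1 / m) * dZ @ A_prev.T
    db = (1 / m) * np.sum(dZ, axis=1, keepdims=True)
    dA_prev = W.T @ dZ

    return dA_prev, dW, db

Notice where each cached item gets used: Z appears in g'(Z), A_prev appears in dW, and W appears in dA_prev, which is then fed to the previous layer's backward step. In this sketch b is carried along in the cache for convenience but is not actually needed in the backward formulas, since db is computed from dZ.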
Thank you for answering my queries. Now I understand how the cache is used in the code.
Further, I apologise for adding the code snippet. I was already aware of the Honor Code, and for that reason I included only small bits of my proposed code. But, as you advised, in future I will refrain from posting my code on the forum. (I have removed my solution from the post.)