I am having trouble understanding why there was a need to store A[1], A[2], …, A[L] in the cache. In the lecture, Andrew said we store Z[1], Z[2], … so the values can be reused when computing derivatives during backward propagation. However, in the programming assignment it is also not clear to me where the cached Z[l] is used.
For example, for a 2-layer network-
A1, cache1 = …
A2, cache2 = …
cost = …
dA1, dW2, db2 = …
dA0, dW1, db1 = …
Here, cache1 contains [A0, W1, b1, Z1]. Why do we need to save all of them, and where are they used?
Hi @namratasri01, and welcome to the DL Specialization. First, some ground rules: you are not allowed to post your proposed solutions from the notebooks; that is a violation of the Honor Code. Please refrain from doing so henceforth.
To use gradient descent to arrive at a solution, you need the derivatives of the cost function that you are trying to optimize. These derivatives are evaluated at the values that you computed during forward prop: the A's, which are functions of the Z's. In elementary terms, to find the slope of a function at a point, you must know the value of the function at that point. So you need your A's (functions of the Z's), the input X, and the labels Y. In terms of the algorithm, it may be that you only need the A's, but those were themselves computed from the Z's.
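To make this concrete, here is a minimal sketch (not the notebook's code) of the backward step for a single sigmoid layer, assuming the cache is the tuple (A_prev, W, b, Z) saved during forward prop. The names layer_backward and sigmoid here are just illustrative:

import numpy as np

def sigmoid(Z):
    return 1 / (1 + np.exp(-Z))

def layer_backward(dA, cache):
    """Illustrative backward pass for one sigmoid layer.

    cache = (A_prev, W, b, Z), saved during forward prop.
    """
    A_prev, W, b, Z = cache
    m = A_prev.shape[1]

    # Z is needed to evaluate g'(Z):  dZ = dA * g'(Z)
    s = sigmoid(Z)
    dZ = dA * s * (1 - s)

    # A_prev is needed for dW; W is needed to pass the gradient back
    dW = (1 / m) * dZ @ A_prev.T
    db = (1 / m) * np.sum(dZ, axis=1, keepdims=True)
    dA_prev = W.T @ dZ

    return dA_prev, dW, db

Notice where each cached item gets used: Z appears in g'(Z), A_prev appears in dW, and W appears in dA_prev, which is then fed to the previous layer's backward step. In this sketch b is carried along in the cache for convenience but is not actually needed in the backward formulas, since db is computed from dZ.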
Thank you for answering my queries. Now I understand how the cache is used in the code.
Further, I apologise for adding the code snippet. I was already aware of the Honor Code, and for that reason I included only small bits of my proposed code. But, as you advised, in future I will refrain from posting my code on the forum. (I have removed my solution from the post.)