In the final assignment part 1, I am unable to figure out why we need dA0 in the `L_model_backward(AL, Y, caches)` function.

In the given test case, it looks like we have 2 layers and the final output AL comes from layer 2. dAL would be dA2, and then we should only need to calculate dA1. But the given code goes on to print dA0. Where did A0 come from, since A0 is essentially the initial input? From what I understand, we don’t calculate the gradient of the input layer.

Yes, we don’t *need* dA0; the fact that we end up calculating it is just an “artifact” of the way backprop works. At each layer, we take dA^{[l]} as input and compute dW^{[l]}, db^{[l]}, and dA^{[l-1]}. Of course we need the dW and db values for layer 1, so we get dA0 as a side effect. On the other thread you linked, there was some speculation about whether there is actually any information you could glean from the gradients of the inputs, but it is just that (speculation).
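To make the “side effect” concrete, here is a minimal sketch of the backward loop (not the course’s exact code — it uses identity activations and a hypothetical `linear_backward` helper just to show the loop shape):

```python
import numpy as np

def linear_backward(dZ, cache):
    # cache holds (A_prev, W, b) saved during the forward pass
    A_prev, W, b = cache
    m = A_prev.shape[1]
    dW = dZ @ A_prev.T / m
    db = np.sum(dZ, axis=1, keepdims=True) / m
    dA_prev = W.T @ dZ  # gradient w.r.t. this layer's *input*
    return dA_prev, dW, db

# Toy 2-layer net with identity activations (illustrative only)
np.random.seed(0)
m = 4
A0 = np.random.randn(3, m)                      # the network input
W1, b1 = np.random.randn(5, 3), np.zeros((5, 1))
W2, b2 = np.random.randn(1, 5), np.zeros((1, 1))
A1 = W1 @ A0 + b1
AL = W2 @ A1 + b2

caches = [(A0, W1, b1), (A1, W2, b2)]
dAL = np.ones_like(AL)                          # stand-in for dL/dAL
grads = {}
dA = dAL
for l in reversed(range(2)):                    # layer 2, then layer 1
    dA, grads[f"dW{l + 1}"], grads[f"db{l + 1}"] = linear_backward(dA, caches[l])

# After processing layer 1, dA is dA0: same shape as the input A0.
# It was computed on the way to dW1/db1, even though we never use it.
print(dA.shape)  # → (3, 4)
```

Notice that layer 1’s call returns dW1 and db1 (which we need) *and* dA0 (which we don’t) from the same matrix products, so dA0 comes essentially for free.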