Backpropagation box 1


Hello,
At the backprop part of the diagram, we first compute the derivative of the loss function (dL/dA). Then, at the first backprop box (box 1), we calculate dZ[2] = dA[2] * g'(Z[2]) and send it to the next box.
At the next box we calculate dW[2] = dZ[2] * A[1]^T, db[2] = np.sum(dZ[2]), and dA[1] = W[2]^T dZ[2], and so on. So my question is: why is the output of box 1 dL/dA[2]? Shouldn't it be dL/dZ[2]? Also, on the programming side we pass the variables that way: we send dZ[2] to the next function.

I understand your concern! You’re right to question this, and it’s a great observation. The output of the first backprop box should indeed be dL/dZ[2], not dL/dA[2]. The derivative of the loss function with respect to the activation A[2] is not what we need to pass on to the next layer. Instead, we need to compute the derivative with respect to the weighted sum Z[2], which is the input to the activation function.
Think of it like this: we’re trying to measure how much each parameter contributes to the final error. At each layer, we need to compute the error gradient with respect to the inputs of that layer, not the outputs. So, in this case, we need dL/dZ[2] to compute the error gradients for the weights and biases of the second layer. By passing dL/dZ[2] to the next box, we can then compute dW[2] and db[2] correctly.
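To spell out that last step: since Z^{[2]} = W^{[2]}A^{[1]} + b^{[2]}, once you have \frac{dL}{dZ^{[2]}} the layer-2 parameter gradients follow directly from the chain rule (for a batch, the bias gradient is summed over the examples, which is the np.sum in your code):

\frac{dL}{dW^{[2]}} = \frac{dL}{dZ^{[2]}} A^{[1]T}, \quad \frac{dL}{db^{[2]}} = \frac{dL}{dZ^{[2]}}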
I hope this clears up any confusion, and please let me know if you have further questions!

Best Regards,
Muhammad John Abbas

Isn’t it like that? Sorry for the very bad drawing :slight_smile: Also, thank you for your reply.

The output of the box you shared is correct. Regarding your question:

So my question is: why is the output of box 1 dL/dA[2]? Shouldn’t it be dL/dZ[2]?

By the chain rule:

\frac{dL}{dZ^{[2]}} = \frac{dL}{dA^{[2]}} \times \frac{dA^{[2]}}{dZ^{[2]}}

So, instead of passing \frac{dL}{dA^{[2]}} and \frac{dA^{[2]}}{dZ^{[2]}} separately, we just pass \frac{dL}{dZ^{[2]}}, which is \frac{dL}{dA^{[2]}} \times \frac{dA^{[2]}}{dZ^{[2]}}, so it accounts for the effect of both.
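If it helps, you can also verify this numerically (a small sketch, assuming a sigmoid g; the toy loss L(Z) = sum(dA2 * sigmoid(Z)) is chosen so that its dL/dA^{[2]} is exactly dA2):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
Z2 = rng.normal(size=(3, 5))    # pre-activations of layer 2
dA2 = rng.normal(size=(3, 5))   # stand-in for the upstream gradient dL/dA[2]

# Chain rule: dL/dZ[2] = dL/dA[2] * g'(Z[2])
A2 = sigmoid(Z2)
dZ2 = dA2 * A2 * (1.0 - A2)

# Central finite differences on the elementwise toy loss L(Z) = sum(dA2 * sigmoid(Z))
h = 1e-6
dZ2_numeric = (dA2 * sigmoid(Z2 + h) - dA2 * sigmoid(Z2 - h)) / (2 * h)

print(np.max(np.abs(dZ2 - dZ2_numeric)))  # tiny (~1e-10): both routes agree
```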
