Week 4: Backpropagation


Can anyone explain why dW and db denote gradients of the cost function J, whereas dA is the gradient of the loss function L w.r.t. A? Why is dJ/dA not calculated?

[image: slide with the backprop gradient formulas]
dAL here is just dL/dAL. If it were dJ/dAL, then where is the 1/m?

Your neural network's parameters are W and b. A is produced in the intermediate layers by the network once W, b, and x are known. The gradient with respect to A is only needed when we apply the chain rule; otherwise, the training process updates only W and b, which in turn update A.

1 Like

Yes, that's true, but I am asking why the gradient dAL is defined as dL/dAL (the derivative of the loss function with respect to AL), while dW is dJ/dW (the derivative of the cost function w.r.t. W).

\frac{\partial L}{\partial W^{[l-1]}} = \frac{\partial L}{\partial A^{[l-1]}}\frac{\partial A^{[l-1]}}{\partial W^{[l-1]}}

\frac{\partial A^{[l-1]}}{\partial W^{[l-1]}} is easy to compute, which is why we calculate it for backprop in multi-layer neural networks.
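To make the chain-rule factorization concrete, here is a minimal sketch (not from the course materials; the single sigmoid unit, squared-error loss, and all numbers are illustrative assumptions) that computes dL/dw as a product of the three factors and checks it against a finite-difference estimate:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Tiny single-unit "layer": a = sigmoid(w*x + b), with loss L = (a - y)^2
x, y = 0.7, 1.0
w, b = 0.3, -0.1

z = w * x + b
a = sigmoid(z)

# Chain rule: dL/dw = (dL/da) * (da/dz) * (dz/dw)
dL_da = 2.0 * (a - y)
da_dz = a * (1.0 - a)   # sigmoid'(z) in terms of a
dz_dw = x
dL_dw = dL_da * da_dz * dz_dw

# Finite-difference check of the same derivative
eps = 1e-6
L = lambda w_: (sigmoid(w_ * x + b) - y) ** 2
dL_dw_fd = (L(w + eps) - L(w - eps)) / (2 * eps)

print(dL_dw, dL_dw_fd)  # the two estimates agree closely
```

The middle factor da/dz is the "easy to compute" piece the formula above refers to: it only depends on the layer's own activation.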

1 Like

Thanks for taking the time… I think you may have misunderstood my question… nonetheless, I made some deductions, and anyone can correct me on this:


As dW = dJ/dW = (1/m) dZ A.T, the dot product here indeed performs the summation over training examples, and that is why dA is computed from the loss of each example and then substituted into dZ, which appears in dJ/dW. (I am bad at conveying this.)

It’s a good question that has come up before. Here’s an earlier thread that discusses the same points.

The point is that most of the formulas Prof Ng shows are for “layer” level Chain Rule factors and the \frac {1}{m} only comes in when you finally put all the Chain Rule factors together to compute the actual gradients of the weight or bias values. You could have structured things differently, but you need to make sure you don’t end up with multiple factors of \frac {1}{m}.

Of course computing that last factor \displaystyle \frac {\partial J}{\partial L} is easy: the gradient of the average is the average of the gradients. Think about it for a second and that should make sense.
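The "gradient of the average is the average of the gradients" point can be verified numerically. This is a small sketch (a made-up squared-error per-example loss; the data are arbitrary assumptions) showing that dJ/da_i is just dL_i/da_i scaled by 1/m:

```python
import numpy as np

np.random.seed(1)
m = 5
a = np.random.rand(m)                      # predictions for m examples
y = np.random.randint(0, 2, m).astype(float)

# Per-example loss L_i = (a_i - y_i)^2, cost J = (1/m) * sum(L_i)
dL_da = 2.0 * (a - y)                      # per-example gradients dL_i/da_i
dJ_da = dL_da / m                          # the single factor of 1/m enters here

# Finite-difference check on one component of dJ/da
eps = 1e-6
J = lambda a_: np.mean((a_ - y) ** 2)
a_plus, a_minus = a.copy(), a.copy()
a_plus[0] += eps
a_minus[0] -= eps
fd = (J(a_plus) - J(a_minus)) / (2 * eps)

print(dJ_da[0], fd)  # the analytic and numeric values agree closely
```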

Thanks for the explanation. Can you correct me if I am wrong?
So dW = dJ/dW = (1/m) dZ A.T involves a summation over all training examples, and in it dZ^{[l]} = dA^{[l]} * g'(Z^{[l]}) involves the term dA. We basically compute dA = dL/dA for each training example, which is then substituted into dJ/dW, where the summation over all training examples happens (so we don't end up with multiple 1/m factors)…

I had one more doubt… while building a neural network, do we first have to derive the gradients of the model's (vectorized) parameters by hand, based on the cost function and activations we choose? In this course we mostly used cross-entropy loss and derived the corresponding gradients. If we used another loss function, the whole set of gradient formulas would change, right?

  1. Initially, the model parameters are not optimized to predict the outcome accurately, so we measure a loss to quantify the difference between the actual and predicted values of y. To (for lack of better words) "correct the incorrect parameters W and b" and make the predictions closer to the actual y, we update the model parameters using the gradients.
  2. Yes, the gradient depends on the explicit form of the loss. E.g., if we used hinge loss instead of cross-entropy loss, the mathematical formula of the gradient would be different.
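To illustrate the second point, here is a small sketch (the prediction values and scores are made-up; the squared-error comparison is my addition, not from the thread) showing how the formula for the first backprop term changes with the loss:

```python
import numpy as np

AL = np.array([0.9, 0.2, 0.6])   # sigmoid outputs in (0, 1)
Y  = np.array([1.0, 0.0, 1.0])   # binary labels

# Cross-entropy loss (used in the course): dL/dAL = -(Y/AL - (1-Y)/(1-AL))
dAL_ce = -(Y / AL - (1 - Y) / (1 - AL))

# Squared-error loss L = (AL - Y)^2 gives a different starting gradient
dAL_mse = 2.0 * (AL - Y)

# Hinge loss (labels in {-1,+1}, applied to a raw score s):
# L = max(0, 1 - y*s), so dL/ds = -y where the margin is violated, else 0
s    = np.array([1.5, -0.3, 0.2])
y_pm = np.array([1.0, -1.0, 1.0])
dL_ds_hinge = np.where(1 - y_pm * s > 0, -y_pm, 0.0)

print(dAL_ce)
print(dAL_mse)
print(dL_ds_hinge)
```

Everything downstream of this first term (dZ, dW, db) reuses the same chain-rule machinery; only this loss-specific factor changes.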
1 Like