Week 3,4: Why isn't 1/m part of dz^[L]?

paulinpaloalto · October 25, 2022, 3:00pm

Sorry, but this is not a typo. It looks like one because Prof Ng’s notation is slightly ambiguous: you need to keep track of what the “numerator” is on each partial derivative term. Note that:

dA = \displaystyle \frac {\partial L}{\partial A}

But for dW and db the derivatives are of the scalar cost J:

dW = \displaystyle \frac {\partial J}{\partial W}

Of course J is the average of the vector quantity L over the m samples, so that’s where the factor of \displaystyle \frac {1}{m} comes in:

J = \displaystyle \frac {1}{m} \sum_{i=1}^{m} L^{(i)}

The way Prof Ng structures everything here, only the “final” gradients that we are actually going to apply are derivatives of J. All the rest are just Chain Rule factors, which are derivatives of L. The only “final” gradients are those of W^{[l]} and b^{[l]}.
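If it helps to see this numerically, here is a minimal sketch for the simplest case: a single sigmoid output layer with cross-entropy loss. It is not the assignment code. The function names, the toy data and the use of plain Python lists instead of numpy are all my own assumptions for illustration. The point is that dZ = A - Y carries no 1/m, while dW only matches a finite-difference estimate of the derivative of J once the 1/m is applied.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(w, b, X):
    # X is a list of samples, each sample a list of features (hypothetical layout).
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) for x in X]

def cost(w, b, X, Y):
    # J is the average of the per-sample cross-entropy losses L.
    A = forward(w, b, X)
    losses = [-(y * math.log(a) + (1 - y) * math.log(1 - a)) for a, y in zip(A, Y)]
    return sum(losses) / len(Y)

def backward(w, b, X, Y):
    m = len(Y)
    A = forward(w, b, X)
    # Chain Rule factor: per-sample derivative of L w.r.t. z, so no 1/m here.
    dZ = [a - y for a, y in zip(A, Y)]
    # "Final" gradients: derivatives of J, so the average over samples appears.
    dW = [sum(dz * x[j] for dz, x in zip(dZ, X)) / m for j in range(len(w))]
    db = sum(dZ) / m
    return dZ, dW, db

def numeric_dW(w, b, X, Y, eps=1e-6):
    # Central finite-difference estimate of the derivative of J w.r.t. each w[j].
    grads = []
    for j in range(len(w)):
        w_plus = list(w)
        w_minus = list(w)
        w_plus[j] += eps
        w_minus[j] -= eps
        grads.append((cost(w_plus, b, X, Y) - cost(w_minus, b, X, Y)) / (2 * eps))
    return grads

# Toy data, made up for this check only.
X = [[0.5, -1.2], [1.5, 0.3], [-0.7, 2.0], [0.1, 0.4]]
Y = [1, 0, 1, 0]
w = [0.2, -0.4]
b = 0.1

dZ, dW, db = backward(w, b, X, Y)
dW_numeric = numeric_dW(w, b, X, Y)
max_diff = max(abs(a - n) for a, n in zip(dW, dW_numeric))
print("max |dW - numeric dW| =", max_diff)
```

If you drop the division by m in dW, the difference against the numeric gradient stops being tiny, which is a quick way to convince yourself where the 1/m belongs.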

Here’s another thread that discusses these issues.
