The confusion in the Backpropagation Intuition video (Neural Networks and Deep Learning)

wanghai673 · August 15, 2025, 6:00am

In gradient descent over multiple samples, I think all of the formulas should have a factor of 1/m, not only dW and db.

Is there a problem here?

paulinpaloalto · August 15, 2025, 2:34pm

It’s a good question, but that is not a mistake. The point is that you have to be careful to keep track of which of the gradient values are derivatives of L, the vector loss, and which are derivatives of J, the scalar cost which is the average of L over the samples. In the way that Professor Ng formulates this, the only gradients that are derivatives of J are the dW and db gradients. All the others are of L. So it is only the dW and db values that have the factor of 1/m.
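To make this bookkeeping concrete, here is a minimal pure-Python sketch of the backward pass for a 2-layer network. It assumes a tanh hidden layer, a single sigmoid output unit (so W^{[2]} is a plain list and b^{[2]} a scalar), and a cross-entropy cost. The helpers `forward`, `cost`, `backward`, and `num_grad` are hypothetical names chosen for illustration. Only the dW and db lines divide by m, while `dZ2` and `dZ1` hold per-sample derivatives of L. A central-difference check of J is included to confirm that this placement of 1/m gives the correct gradients.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(X, W1, b1, W2, b2):
    """X: n_x rows by m columns; W1: n_h x n_x; b1: n_h entries; W2: n_h entries; b2: scalar."""
    n_x, m, n_h = len(X), len(X[0]), len(W1)
    Z1 = [[sum(W1[i][k] * X[k][j] for k in range(n_x)) + b1[i] for j in range(m)]
          for i in range(n_h)]
    A1 = [[math.tanh(z) for z in row] for row in Z1]
    A2 = [sigmoid(sum(W2[i] * A1[i][j] for i in range(n_h)) + b2) for j in range(m)]
    return A1, A2

def cost(X, Y, W1, b1, W2, b2):
    """J: the cross-entropy loss L averaged over the m samples."""
    _, A2 = forward(X, W1, b1, W2, b2)
    return -sum(y * math.log(a) + (1 - y) * math.log(1 - a) for a, y in zip(A2, Y)) / len(Y)

def backward(X, Y, W1, b1, W2, b2):
    n_x, m, n_h = len(X), len(Y), len(W1)
    A1, A2 = forward(X, W1, b1, W2, b2)
    # dZ2[j] is dL/dz2 for sample j alone: no 1/m
    dZ2 = [A2[j] - Y[j] for j in range(m)]
    # dW2 and db2 are derivatives of J (the average over samples): 1/m appears here
    dW2 = [sum(dZ2[j] * A1[i][j] for j in range(m)) / m for i in range(n_h)]
    db2 = sum(dZ2) / m
    # dZ1 = (W2^T dZ2) * (1 - A1^2): still per-sample derivatives of L, no 1/m
    dZ1 = [[W2[i] * dZ2[j] * (1 - A1[i][j] ** 2) for j in range(m)] for i in range(n_h)]
    # dW1 and db1 are derivatives of J again: 1/m appears here
    dW1 = [[sum(dZ1[i][j] * X[k][j] for j in range(m)) / m for k in range(n_x)]
           for i in range(n_h)]
    db1 = [sum(dZ1[i]) / m for i in range(n_h)]
    return dW1, db1, dW2, db2

# Small random problem: n_x = 2 features, n_h = 3 hidden units, m = 4 samples
random.seed(0)
n_x, n_h, m = 2, 3, 4
X = [[random.uniform(-1, 1) for _ in range(m)] for _ in range(n_x)]
Y = [0, 1, 1, 0]
W1 = [[random.uniform(-1, 1) for _ in range(n_x)] for _ in range(n_h)]
b1 = [0.1, -0.2, 0.05]
W2 = [random.uniform(-1, 1) for _ in range(n_h)]
b2 = 0.3

dW1, db1, dW2, db2 = backward(X, Y, W1, b1, W2, b2)

def num_grad(shift, eps=1e-6):
    """Central difference of J; shift(d) returns (W1, b1, W2, b2) with one scalar moved by d."""
    return (cost(X, Y, *shift(eps)) - cost(X, Y, *shift(-eps))) / (2 * eps)

def shift_W1_10(d):
    W1s = [row[:] for row in W1]
    W1s[1][0] += d
    return W1s, b1, W2, b2

def shift_W2_0(d):
    W2s = W2[:]
    W2s[0] += d
    return W1, b1, W2s, b2

# Each pair should agree closely: analytic gradient vs. finite difference of J
print(dW1[1][0], num_grad(shift_W1_10))
print(dW2[0], num_grad(shift_W2_0))
print(db2, num_grad(lambda d: (W1, b1, W2, b2 + d)))
```

Putting 1/m only in the parameter gradients is exactly what makes the analytic values match the derivative of the averaged cost J.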

This question has come up a number of times before. Here’s a thread that links to multiple earlier discussions on this point.

rmwkwok · August 16, 2025, 1:23am

Hello @wanghai673

Note that if you give dZ^{[2]} a factor of 1/m, then dW^{[2]} will end up with two factors of 1/m, which is a problem.
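A small numerical sketch of this point, assuming the hidden activations A^{[1]} are already fixed (the values below are hypothetical) so that only the sigmoid output layer matters. It computes dW^{[2]} with the course convention and with an extra 1/m folded into dZ^{[2]}, then compares both against a finite-difference derivative of J.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical hidden activations A1: n_h = 2 rows, m = 3 samples (columns)
A1 = [[0.5, -0.2, 0.1],
      [0.3, 0.8, -0.6]]
Y = [1, 0, 1]
W2 = [0.4, -0.7]
b2 = 0.1
m, n_h = len(Y), len(A1)

def output(W2):
    """A2 for each sample, given output weights W2."""
    return [sigmoid(sum(W2[i] * A1[i][j] for i in range(n_h)) + b2) for j in range(m)]

def J(W2):
    """Cost: cross-entropy loss averaged over the m samples."""
    return -sum(y * math.log(a) + (1 - y) * math.log(1 - a)
                for a, y in zip(output(W2), Y)) / m

A2 = output(W2)
dZ2 = [A2[j] - Y[j] for j in range(m)]   # course convention: per-sample, no 1/m
dZ2_scaled = [d / m for d in dZ2]        # proposed change: 1/m inside dZ2 as well

# dW2 = (1/m) dZ2 A1^T in both cases, so the second version divides by m twice
dW2 = [sum(dZ2[j] * A1[i][j] for j in range(m)) / m for i in range(n_h)]
dW2_double = [sum(dZ2_scaled[j] * A1[i][j] for j in range(m)) / m for i in range(n_h)]

# Central-difference derivative of J with respect to each entry of W2
eps = 1e-6
numeric = []
for i in range(n_h):
    up, down = W2[:], W2[:]
    up[i] += eps
    down[i] -= eps
    numeric.append((J(up) - J(down)) / (2 * eps))

print(dW2)         # agrees with numeric
print(dW2_double)  # equals dW2 / m, i.e. too small by a factor of m
print(numeric)
```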

I would just like to extend Paul's excellent answer a bit. Paul emphasizes the difference between J and L because, by our definition, J is the cost over all samples whereas L is the loss of a single sample. Therefore, when a gradient is a derivative of L, such as

dZ^{[2]} = A^{[2]} - Y

we don't need 1/m, because each element of the matrix (or array) dZ^{[2]} concerns only one sample.
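Written out per sample, as a sketch in the course's 2-layer notation (sample i stored in column i, sigmoid output with cross-entropy loss), the 1/m enters only through the average that defines J:

```latex
\begin{aligned}
J &= \frac{1}{m}\sum_{i=1}^{m} L^{(i)},
  \qquad L^{(i)} = -\bigl(y^{(i)}\log a^{[2](i)} + (1-y^{(i)})\log(1-a^{[2](i)})\bigr) \\
dz^{[2](i)} &= \frac{\partial L^{(i)}}{\partial z^{[2](i)}} = a^{[2](i)} - y^{(i)}
  \qquad \text{(one sample, no } 1/m\text{)} \\
dW^{[2]} &= \frac{\partial J}{\partial W^{[2]}}
  = \frac{1}{m}\sum_{i=1}^{m} dz^{[2](i)}\, a^{[1](i)T}
  = \frac{1}{m}\, dZ^{[2]} A^{[1]T} \\
db^{[2]} &= \frac{\partial J}{\partial b^{[2]}} = \frac{1}{m}\sum_{i=1}^{m} dz^{[2](i)}
\end{aligned}
```

Each column of dZ^{[2]} is one dz^{[2](i)}, so the matrix itself carries no 1/m; the averaging happens only when the columns are summed into dW^{[2]} and db^{[2]}.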

Cheers,
Raymond

wanghai673 · August 16, 2025, 3:27am

OK, I see, thanks.

I think it’s a very good design because it can ensure the consistency of the formula structure for updating all parameters.

Another approach is to divide only dZ^{[L]} (the last layer's dZ) by m, so that none of the other formulas need 1/m, but that is not as consistent.
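Both conventions produce the same parameter gradients, because every dW and db is linear in the last layer's dZ, so it does not matter where the single 1/m is applied. A minimal pure-Python sketch, assuming a 2-layer network with a tanh hidden layer and one sigmoid output unit; `backward` and its `scale_last` flag are hypothetical names used only to switch between the two conventions.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def backward(X, Y, W1, b1, W2, b2, scale_last):
    """Gradients of a 2-layer net (tanh hidden, sigmoid output, cross-entropy cost).
    scale_last=False: course convention, 1/m appears in each dW and db formula.
    scale_last=True: 1/m is folded into dZ2 once and appears nowhere else."""
    n_x, m, n_h = len(X), len(Y), len(W1)
    A1 = [[math.tanh(sum(W1[i][k] * X[k][j] for k in range(n_x)) + b1[i]) for j in range(m)]
          for i in range(n_h)]
    A2 = [sigmoid(sum(W2[i] * A1[i][j] for i in range(n_h)) + b2) for j in range(m)]
    last_scale = 1.0 / m if scale_last else 1.0   # applied once, to dZ2
    param_scale = 1.0 if scale_last else 1.0 / m  # applied in every dW/db formula
    dZ2 = [(A2[j] - Y[j]) * last_scale for j in range(m)]
    dW2 = [param_scale * sum(dZ2[j] * A1[i][j] for j in range(m)) for i in range(n_h)]
    db2 = param_scale * sum(dZ2)
    # dZ1 inherits whatever scaling dZ2 carries
    dZ1 = [[W2[i] * dZ2[j] * (1 - A1[i][j] ** 2) for j in range(m)] for i in range(n_h)]
    dW1 = [[param_scale * sum(dZ1[i][j] * X[k][j] for j in range(m)) for k in range(n_x)]
           for i in range(n_h)]
    db1 = [param_scale * sum(dZ1[i]) for i in range(n_h)]
    return dW1, db1, dW2, db2

def flatten(grads):
    """All gradient entries in one list, for easy comparison."""
    dW1, db1, dW2, db2 = grads
    return [x for row in dW1 for x in row] + db1 + dW2 + [db2]

random.seed(1)
n_x, n_h, m = 2, 3, 5
X = [[random.uniform(-1, 1) for _ in range(m)] for _ in range(n_x)]
Y = [1, 0, 0, 1, 1]
W1 = [[random.uniform(-1, 1) for _ in range(n_x)] for _ in range(n_h)]
b1 = [0.0] * n_h
W2 = [random.uniform(-1, 1) for _ in range(n_h)]
b2 = 0.0

course = backward(X, Y, W1, b1, W2, b2, scale_last=False)
alternative = backward(X, Y, W1, b1, W2, b2, scale_last=True)

# Largest difference between the two conventions: zero up to floating-point rounding
print(max(abs(a - b) for a, b in zip(flatten(course), flatten(alternative))))
```

The trade-off is readability rather than correctness: in the alternative, every dZ becomes a derivative of J instead of the per-sample loss L, and the 1/m is hidden in a single line instead of appearing in each parameter update formula.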
