Vectorizing Logistic Regression's Gradient Output - why no 1/m?

Fizay-Noah_Lee · July 18, 2023, 6:50pm

Hello, my question is about vectorizing backpropagation for, say, logistic regression.
We are trying to minimize the cost function J, which is the average of the losses over all m examples, so the cost function has a factor of 1/m. However, when we compute, say, dz^(1) as in the picture below, why isn't there a factor of 1/m? That is, why isn't dz^(1) equal to (1/m)(a^(1) - y^(1))?
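For reference, here is a sketch of the chain rule for one example, assuming (as in the lecture notation) that z^(i) = w^T x^(i) + b, a^(i) = sigmoid(z^(i)), and that dz^(i) denotes the derivative of the single-example loss L, not of the cost J. Under that convention the 1/m is applied once, when the per-example terms are averaged into dw and db. The expression in the question, (1/m)(a^(1) - y^(1)), is the derivative of J with respect to z^(1); it leads to the same dw and db, as long as the 1/m is not applied a second time.

```latex
J(w,b) = \frac{1}{m}\sum_{i=1}^{m} \mathcal{L}\big(a^{(i)}, y^{(i)}\big),
\qquad a^{(i)} = \sigma\big(z^{(i)}\big),
\qquad z^{(i)} = w^{T}x^{(i)} + b

dz^{(i)} \equiv \frac{\partial \mathcal{L}}{\partial z^{(i)}} = a^{(i)} - y^{(i)},
\qquad
\frac{\partial J}{\partial z^{(i)}} = \frac{1}{m}\big(a^{(i)} - y^{(i)}\big)

dw = \frac{\partial J}{\partial w} = \frac{1}{m}\sum_{i=1}^{m} x^{(i)}\,dz^{(i)} = \frac{1}{m} X\,dZ^{T},
\qquad
db = \frac{\partial J}{\partial b} = \frac{1}{m}\sum_{i=1}^{m} dz^{(i)}
```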

TMosh · July 18, 2023, 10:59pm

In general, since m is a constant, it doesn't really matter whether you include the 1/m in the gradients. Including or omitting it only rescales the gradients by a constant factor, and that can be compensated for by using a different learning rate.
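A minimal NumPy sketch of that point, on made-up random data: full-batch gradient descent using the averaged gradients with learning rate alpha follows the same path as gradient descent using the summed gradients with learning rate alpha/m. The helper names `grads` and `gradient_descent` and the `average` flag are hypothetical, chosen only for this illustration; the shapes (X is n_x by m, Y is 1 by m) are an assumption.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grads(w, b, X, Y, average=True):
    """Logistic-regression gradients (hypothetical helper).
    X: (n_x, m), Y: (1, m). average=True divides by m; False returns the sums."""
    m = X.shape[1]
    A = sigmoid(w.T @ X + b)   # activations, shape (1, m)
    dZ = A - Y                 # per-example dL/dz, no 1/m here
    dw = X @ dZ.T              # summed over examples, shape (n_x, 1)
    db = np.sum(dZ)
    if average:
        dw, db = dw / m, db / m
    return dw, db

def gradient_descent(X, Y, alpha, average, iters=100):
    w = np.zeros((X.shape[0], 1))
    b = 0.0
    for _ in range(iters):
        dw, db = grads(w, b, X, Y, average)
        w = w - alpha * dw
        b = b - alpha * db
    return w, b

# Made-up data, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 50))
Y = (rng.random((1, 50)) > 0.5).astype(float)
m = X.shape[1]

w1, b1 = gradient_descent(X, Y, alpha=0.1, average=True)       # with 1/m
w2, b2 = gradient_descent(X, Y, alpha=0.1 / m, average=False)  # without 1/m, smaller step
print(np.allclose(w1, w2), np.isclose(b1, b2))  # prints: True True
```

This equivalence relies on m being fixed. With mini-batches of varying size, the scaling factor would change from step to step, which is one practical reason to keep the 1/m.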

rmwkwok · July 18, 2023, 11:12pm

Hello @Fizay-Noah_Lee,

The 1/m is on the right-hand side of the slide:

and it continues to show up in the next slide:

Note that dz is not the final gradient; it is only an intermediate variable that we use to compute the final dw and db. dw and db are the gradients we need for gradient descent, and the 1/m is always included in them.
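To make this concrete, here is a small NumPy sketch, assuming X has shape (n_x, m), w has shape (n_x, 1) and Y has shape (1, m). It computes dZ = A - Y with no 1/m, applies the 1/m only in dw and db, and checks the result against central finite differences of the averaged cost J. The function names `cost`, `backward` and `numerical_grad` and the random data are hypothetical, used only for this illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(w, b, X, Y):
    """J = average cross-entropy loss over the m examples (this is where the 1/m lives)."""
    m = X.shape[1]
    A = sigmoid(w.T @ X + b)
    return -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m

def backward(w, b, X, Y):
    m = X.shape[1]
    A = sigmoid(w.T @ X + b)   # forward pass, shape (1, m)
    dZ = A - Y                 # per-example dL/dz: intermediate, no 1/m yet
    dw = (X @ dZ.T) / m        # the 1/m is applied here ...
    db = np.sum(dZ) / m        # ... and here
    return dw, db

def numerical_grad(w, b, X, Y, eps=1e-6):
    """Central finite differences of J, used only as a check."""
    dw = np.zeros_like(w)
    for i in range(w.shape[0]):
        e = np.zeros_like(w)
        e[i, 0] = eps
        dw[i, 0] = (cost(w + e, b, X, Y) - cost(w - e, b, X, Y)) / (2 * eps)
    db = (cost(w, b + eps, X, Y) - cost(w, b - eps, X, Y)) / (2 * eps)
    return dw, db

# Made-up data, for illustration only.
rng = np.random.default_rng(1)
X = rng.normal(size=(4, 20))
Y = (rng.random((1, 20)) > 0.5).astype(float)
w = rng.normal(size=(4, 1)) * 0.1
b = 0.05

dw, db = backward(w, b, X, Y)
dw_num, db_num = numerical_grad(w, b, X, Y)
print(np.allclose(dw, dw_num, atol=1e-7), np.isclose(db, db_num, atol=1e-7))  # prints: True True
```

Dropping the `/ m` in `backward` would make dw and db exactly m times larger than the finite-difference gradient of J.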

Cheers,
Raymond
