Question about derivative formula

yuyang187 · September 22, 2024, 10:09am

Can I ask why the vectorized implementation of dw = dz * a.T has a (1/m) factor? The formula on the left-hand side does not have 1/m, but the vectorized version does. Is there anything I missed? The two sides do not seem equivalent.

saifkhanengr · September 22, 2024, 10:17am

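Both are correct. The formula without 1/m is the gradient for a single training example, while the vectorized version covers all m examples at once. There, 1/m is the normalizing factor: the summed gradient is divided by the total number of samples, because the cost is the average of the per-sample losses.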
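As a rough sketch of where the factor comes from, assuming the usual course notation (cost J as the mean of the per-example losses, dz^(i) and a^(i) as the per-example quantities, and dZ, A as the matrices that stack them as columns; these symbols are assumptions, not taken from the slide itself):

```latex
J = \frac{1}{m}\sum_{i=1}^{m} \mathcal{L}\left(\hat{y}^{(i)}, y^{(i)}\right)
\qquad\Longrightarrow\qquad
dW = \frac{\partial J}{\partial W}
   = \frac{1}{m}\sum_{i=1}^{m} dz^{(i)}\, a^{(i)T}
   = \frac{1}{m}\, dZ\, A^{T}
```

With m = 1 the sum has a single term and the expression reduces to dw = dz * a.T, so the two formulas agree once you account for how many examples each one covers.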

yuyang187 · September 22, 2024, 10:19am

Thanks for the reply. So does that mean both are “usable” as the derivative for backpropagation, even though they are not exactly the same?

saifkhanengr · September 22, 2024, 10:35am

Yes.

But the main point of using 1/m is to keep the size of the gradient independent of the number of samples. You can try without that averaging term, but then the gradient grows with m: more samples → larger gradient → larger update steps, so the learning rate has to be scaled down to compensate.
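To make the effect of the 1/m term concrete, here is a minimal NumPy sketch. The layer sizes, the random data, and the variable names (`A_prev`, `dZ`, `dW_loop`, `dW_vec`, `dW_sum`) are all made up for illustration and are not from the course notebook. It compares the per-example formula averaged in a loop, the vectorized formula with 1/m, and the plain sum without 1/m.

```python
import numpy as np

rng = np.random.default_rng(0)

m = 5             # number of training examples (hypothetical)
n_prev, n = 3, 2  # sizes of the previous and current layer (hypothetical)

# One column per training example, as in the vectorized notation.
A_prev = rng.standard_normal((n_prev, m))  # activations feeding this layer
dZ = rng.standard_normal((n, m))           # per-example dz values

# Per-example formula dw = dz * a.T, computed one example at a time,
# then averaged over the m examples.
per_example = [np.outer(dZ[:, i], A_prev[:, i]) for i in range(m)]
dW_loop = sum(per_example) / m

# Vectorized formula with the 1/m factor.
dW_vec = (dZ @ A_prev.T) / m

# Same product without 1/m: the sum of the per-example gradients.
dW_sum = dZ @ A_prev.T

print(np.allclose(dW_loop, dW_vec))     # averaged loop matches vectorized form
print(np.allclose(dW_sum, m * dW_vec))  # dropping 1/m scales the gradient by m
```

Since the two versions differ only by the constant factor m, gradient descent with the summed gradient and learning rate alpha / m takes the same steps as the averaged gradient with learning rate alpha.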
