Why use `average` when vectorizing the backpropagation calculations (C1_W4, page 17)

Haoting_Wang · August 17, 2023, 3:37am

Hi, in the C1_W4 lecture notes, page 17: I don’t understand why an average is used to calculate dW and db for m examples. Could someone explain? Thanks!

paulinpaloalto · August 17, 2023, 3:41am

Because dW and db are the gradients (partial derivatives) of the cost J with respect to W and b, and J is defined as the average of the loss values L across all the samples. The per-sample losses L form a vector, while J is a scalar.

Of course the other thing to remember is that the derivative of the average is the average of the derivatives. Think about it for a second and that should make sense.
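
To spell that step out, here is a short derivation. It assumes the usual course definition of the cost as the mean of the per-sample losses, with $L^{(i)}$ written for the loss on sample $i$ (that superscript notation is an assumption here, not quoted from page 17). Differentiation is linear, so the constant $\frac{1}{m}$ and the sum both pass through the derivative:

$$
J = \frac{1}{m} \sum_{i=1}^{m} L^{(i)}
\quad\Longrightarrow\quad
\frac{\partial J}{\partial W} = \frac{1}{m} \sum_{i=1}^{m} \frac{\partial L^{(i)}}{\partial W},
\qquad
\frac{\partial J}{\partial b} = \frac{1}{m} \sum_{i=1}^{m} \frac{\partial L^{(i)}}{\partial b}
$$

So the $\frac{1}{m}$ in dW and db is the same $\frac{1}{m}$ that appears in the definition of J.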

paulinpaloalto · August 17, 2023, 3:43am

And if the next question is, ok, then why are the other gradients not averages? It is because everything in those formulas other than dW and db is just a “chain rule” factor used to compute dW and db, so those terms aren’t derivatives of J, but derivatives of other things.
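
A small numerical check may make this concrete. The sketch below is not the course's code: it assumes a single sigmoid unit with cross-entropy loss (a simplification of the deeper network in C1_W4), and the names `X`, `Y`, `W`, `b`, `cost`, `avg_dW`, and `num_dW` are made up for illustration. It shows that `dZ` is kept per sample with no 1/m, while `dW` and `db` carry the 1/m and match both the average of the per-sample gradients and a finite-difference gradient of J.

```python
import numpy as np

# Hypothetical toy setup: one sigmoid unit, n_x features, m samples.
rng = np.random.default_rng(0)
n_x, m = 3, 5
X = rng.normal(size=(n_x, m))                      # inputs, one column per sample
Y = rng.integers(0, 2, size=(1, m)).astype(float)  # binary labels
W = rng.normal(size=(1, n_x))
b = np.zeros((1, 1))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(W, b):
    """J = average of the per-sample cross-entropy losses L."""
    A = sigmoid(W @ X + b)
    losses = -(Y * np.log(A) + (1 - Y) * np.log(1 - A))  # shape (1, m)
    return losses.mean()

# Vectorized backward pass.
A = sigmoid(W @ X + b)
dZ = A - Y                              # per-sample chain-rule factor, no 1/m
dW = (dZ @ X.T) / m                     # gradient of J: the sum over samples, divided by m
db = dZ.sum(axis=1, keepdims=True) / m

# The same dW, computed as an explicit average of per-sample gradients.
per_sample_dW = np.stack([dZ[:, i:i + 1] @ X[:, i:i + 1].T for i in range(m)])
avg_dW = per_sample_dW.mean(axis=0)

# Finite-difference gradient of J with respect to W, as an independent check.
eps = 1e-6
num_dW = np.zeros_like(W)
for j in range(n_x):
    W_plus, W_minus = W.copy(), W.copy()
    W_plus[0, j] += eps
    W_minus[0, j] -= eps
    num_dW[0, j] = (cost(W_plus, b) - cost(W_minus, b)) / (2 * eps)

print(np.allclose(dW, avg_dW), np.allclose(dW, num_dW, atol=1e-6))
```

The matrix product `dZ @ X.T` already sums over the m samples, so dividing by m turns that sum into the average. `dZ` still has one column per sample, which is why it has no 1/m of its own.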

Haoting_Wang · August 17, 2023, 3:44am

Thanks! Got it now!
