Prof Ng’s notation for the gradients is a little ambiguous. It turns out that only the final gradients that we actually apply, namely dW and db, are true gradients of the cost J. All the others are either gradients of L (the per-sample loss) or simply Chain Rule factors used in computing dW and db.

Of course we know that by definition:

J = \displaystyle \frac {1}{m}\sum_{j = 1}^m L(y^{(j)},\hat{y}^{(j)})

Meaning that J is the average of the L values across the samples in the batch. Since differentiation is linear, the derivative of the average is the average of the derivatives. So the factor of \frac {1}{m} only appears in the final gradients dW and db.
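Here is a small numerical sketch of that point (my own example, not taken from the course materials), using logistic regression on a tiny batch. The intermediate Chain Rule factor dZ carries no \frac{1}{m}; the factor only enters when we form dW and db, and averaging the m per-sample gradients of L gives the same answer:

```python
import numpy as np

np.random.seed(0)
n, m = 3, 5                        # n features, m samples (made-up sizes)
X = np.random.randn(n, m)
Y = (np.random.rand(1, m) > 0.5).astype(float)
W = np.random.randn(1, n)
b = 0.1

Z = W @ X + b
A = 1 / (1 + np.exp(-Z))           # predictions y_hat (sigmoid)

# Chain Rule factor for cross-entropy + sigmoid: per-sample, no 1/m yet
dZ = A - Y                         # shape (1, m)

# Final gradients of J: this is the only place the 1/m appears
dW = (1 / m) * dZ @ X.T
db = (1 / m) * np.sum(dZ)

# Check: the average of the m per-sample gradients of L equals dW
per_sample_dW = np.stack([dZ[:, j:j+1] @ X[:, j:j+1].T for j in range(m)])
print(np.allclose(dW, per_sample_dW.mean(axis=0)))   # True
```

The same check works for db, which is just the mean of dZ over the batch.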

Here’s another thread which discusses this in more detail.

Here’s another thread about this, and here’s yet another.

Oh, so it’s a derivative with respect to L. Got it, that makes a lot of sense. All my derivations are working now. Thank you so much!