Dividing by "m" in back propagation using vectorized implementation

paulinpaloalto · February 19, 2024, 5:32pm

Right! The key point is that Prof Ng’s notation for the gradients is a bit ambiguous. You need to remember what the “numerator” is on the partial derivative. E.g.:

dW^{[1]} = \displaystyle \frac {\partial J}{\partial W^{[1]}}

Since J is the mean of L, the vector of per-sample losses across the m samples, that gradient of course includes the factor of \frac {1}{m}.
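
To spell that step out, here is a short sketch. It assumes the usual definition of the cost from the course, with L^{(i)} denoting the loss on sample i (that superscript notation is my addition here):

J = \displaystyle \frac {1}{m} \sum_{i=1}^{m} L^{(i)}

\displaystyle \frac {\partial J}{\partial W^{[1]}} = \frac {1}{m} \sum_{i=1}^{m} \frac {\partial L^{(i)}}{\partial W^{[1]}}

The \frac {1}{m} comes from the averaging step and nowhere else, so it shows up exactly once, in the gradients that are taken of J.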

But all the gradients other than the dW and db values are not partial derivatives of J, but of something else. They are just Chain Rule factors that we need to compute in order to get the final dW and db gradients, which are the ones we really care about because they actually get used to update the parameters.

For example, in the case you mentioned, dZ^{[1]} is:

dZ^{[1]} = \displaystyle \frac {\partial L}{\partial Z^{[1]}}

As I mentioned just above, L is a vector quantity with one element for each of the m samples. We haven’t yet taken the average when computing dZ^{[1]}, so there is no factor of \frac {1}{m}.
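
If you want to convince yourself numerically, here is a minimal NumPy sketch. It is not the assignment code: it assumes a 2-layer network with a tanh hidden layer, a sigmoid output and cross-entropy loss, and all the function names and sizes are made up for illustration. It computes dZ with no \frac {1}{m}, applies the \frac {1}{m} only when forming dW and db, and then compares dW1 against a finite-difference estimate of the gradient of J.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(params, X):
    W1, b1, W2, b2 = params
    A1 = np.tanh(W1 @ X + b1)       # hidden activations, shape (n_h, m)
    A2 = sigmoid(W2 @ A1 + b2)      # predictions, shape (1, m)
    return A1, A2

def cost(params, X, Y):
    _, A2 = forward(params, X)
    L = -(Y * np.log(A2) + (1 - Y) * np.log(1 - A2))  # per-sample loss vector, shape (1, m)
    return L.mean()                 # J is the mean of L: this is where 1/m enters

def backward(params, X, Y):
    W1, b1, W2, b2 = params
    m = X.shape[1]
    A1, A2 = forward(params, X)
    dZ2 = A2 - Y                                   # column i is dL(i)/dz2(i): no 1/m
    dW2 = (dZ2 @ A1.T) / m                         # gradient of J: 1/m applied here
    db2 = dZ2.sum(axis=1, keepdims=True) / m
    dZ1 = (W2.T @ dZ2) * (1 - A1 ** 2)             # still per-sample: no 1/m
    dW1 = (dZ1 @ X.T) / m                          # gradient of J: 1/m applied here
    db1 = dZ1.sum(axis=1, keepdims=True) / m
    return dW1, db1, dW2, db2

def numerical_dW1(params, X, Y, eps=1e-6):
    """Central-difference estimate of dJ/dW1, one entry at a time."""
    W1, b1, W2, b2 = params
    grad = np.zeros_like(W1)
    for idx in np.ndindex(*W1.shape):
        W_plus, W_minus = W1.copy(), W1.copy()
        W_plus[idx] += eps
        W_minus[idx] -= eps
        grad[idx] = (cost((W_plus, b1, W2, b2), X, Y)
                     - cost((W_minus, b1, W2, b2), X, Y)) / (2 * eps)
    return grad

# Arbitrary small problem, chosen only for the check.
rng = np.random.default_rng(0)
n_x, n_h, m = 3, 4, 5
X = rng.normal(size=(n_x, m))
Y = rng.integers(0, 2, size=(1, m)).astype(float)
params = (rng.normal(size=(n_h, n_x)) * 0.5, np.zeros((n_h, 1)),
          rng.normal(size=(1, n_h)) * 0.5, np.zeros((1, 1)))

dW1, db1, dW2, db2 = backward(params, X, Y)
dW1_numeric = numerical_dW1(params, X, Y)
max_diff = np.abs(dW1 - dW1_numeric).max()
print("max |dW1 - numerical dW1| =", max_diff)
```

If you also divided dZ1 or dZ2 by m, the factor would get applied twice by the time you reach dW1, and the comparison against the numerical gradient would fail.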

This topic has come up quite a few times before. Here’s an earlier thread about it.

The other high-level point here is that this course is specifically designed not to require calculus as a prerequisite. That’s the good news, but there is accompanying bad news: it means you just have to accept the formulas as Prof Ng gives them to us. Showing the derivations requires that you know multivariate and vector calculus. Here’s a thread with links to more material on this if you have the math background and really want to understand how all the formulas are derived.
