Hello everyone,
I would like to ask: in this lecture at 05:05, shouldn't the dw
computation be
dw = \frac{1}{m} dz^T X to be in line with the computation on the next line? Or should it be dw = \frac{1}{m} X^T dz^T?
If either of the above is correct, could you please explain the computation?
To understand this, you need to carefully track the shapes of the various objects, which are shown by the way the matrices and vectors are drawn out on that slide. The first thing to be careful about is that the left side of the whiteboard handles one sample at a time (it uses lower case z), while the right side handles all the samples at once (vectorized, which is the whole point here) and uses capital Z.
Here are the dimensions:
X is n_x x m, where n_x is the number of features and m is the number of samples.
dZ is the gradient of the cost with respect to Z, so it is 1 x m.
dw is the gradient of the cost with respect to w, so it has the same dimensions as w, which are n_x x 1.
The gradient formula as Prof Ng gives it is this:
dw = \displaystyle \frac {1}{m} X \cdot dZ^T
Note that I added the “dot product” operator there just to be clear. The notation Prof Ng uses is that he just writes the operands adjacent with no explicit operator when he means “dot product” style multiply. When he wants to write “elementwise” multiply, he uses “*” to indicate that.
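If it helps, here is a quick numpy sketch of the two kinds of multiplication. The array values and shapes are just made up for illustration; they are not from the lecture:

```python
import numpy as np

A = np.array([[1., 2., 3.],
              [4., 5., 6.]])      # shape (2, 3)
B = np.array([[1.], [2.], [3.]])  # shape (3, 1)

# "Adjacent operands" in the lecture notation = matrix (dot) product
dot_result = np.dot(A, B)         # shape (2, 1)

# "*" in the lecture notation = elementwise product (shapes must match or broadcast)
elementwise = A * A               # shape (2, 3), each entry squared

print(dot_result.shape, elementwise.shape)
```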
So dZ^T will be m x 1, and the dimensional analysis of Prof Ng's formula is:
n_x x m dotted with m x 1 gives an n_x x 1 result, which is the correct shape for dw.
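Here is a small numpy sketch of that dimensional analysis, using made-up sizes (n_x = 4, m = 10) rather than anything from the lecture:

```python
import numpy as np

n_x, m = 4, 10                 # toy sizes, just to check the shapes

X = np.random.randn(n_x, m)    # features stacked column-wise: (n_x, m)
dZ = np.random.randn(1, m)     # gradient of the cost w.r.t. Z: (1, m)

dw = (1 / m) * np.dot(X, dZ.T)  # (n_x, m) . (m, 1) -> (n_x, 1)
print(dw.shape)                 # (4, 1), the same shape as w
```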
If you try your versions, the dimensions do not match for a dot product:
dZ^T \cdot X would be m x 1 dotted with n_x x m. That doesn’t work.
X^T \cdot dZ^T would be m x n_x dotted with m x 1 and that doesn’t work either.
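And a sketch showing that numpy rejects both of the proposed orderings with the same toy shapes:

```python
import numpy as np

n_x, m = 4, 10
X = np.random.randn(n_x, m)
dZ = np.random.randn(1, m)

# dZ.T is (m, 1) and X is (n_x, m): inner dimensions 1 and n_x don't match
try:
    np.dot(dZ.T, X)
except ValueError as e:
    print("dZ^T . X fails:", e)

# X.T is (m, n_x) and dZ.T is (m, 1): inner dimensions n_x and m don't match
try:
    np.dot(X.T, dZ.T)
except ValueError as e:
    print("X^T . dZ^T fails:", e)
```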
Thank you for your reply, sir. I agree with everything you have written. I worked it out by hand and reached the same conclusion.