W2 A2 | Possible inaccuracy while doing LR gradient descent implementation

Davide_Cividino · October 9, 2022, 11:07am

Hi! I just want to point out a possible inaccuracy in the explanation of the implementation of the logistic regression gradient descent algorithm. The step I've highlighted in red in the image, which calculates the vector dw, should be a matrix multiplication between the matrix X (n,m) and the column vector dZ.T (m,1). Hence, to be consistent with the notation used in the video, it should be replaced by
dw = 1/m * np.dot(X, dZ.T).
Otherwise, if you implement the plain elementwise product X * dZ.T in Python, NumPy tries to broadcast the (m,1) column vector against the (n,m) matrix. That raises a shape error unless n == m (and even then it produces an (m,m) matrix) instead of giving the (n,1) column vector dw.

I think the notation can cause problems, particularly when moving to a Python implementation, where np.dot() and * give very different results.

Thanks a lot in advance!
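A minimal NumPy sketch of the shape difference described in this post, assuming the course's conventions (X has shape (n, m) with one example per column, dZ has shape (1, m)). The sizes n = 3, m = 5 and the random data are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 5                       # n features, m examples (arbitrary sizes)
X = rng.standard_normal((n, m))   # one training example per column
dZ = rng.standard_normal((1, m))  # dZ = A - Y, one entry per example

# Matrix product: (n, m) times (m, 1) gives (n, 1), the same shape as w
dw = np.dot(X, dZ.T) / m
print(dw.shape)

# Elementwise product with dZ broadcasts across rows: (n, m) * (1, m) gives (n, m)
print((X * dZ).shape)

# Elementwise product with dZ.T: (n, m) * (m, 1) cannot broadcast when n != m
try:
    X * dZ.T
    broadcast_ok = True
except ValueError as err:
    broadcast_ok = False
    print("ValueError:", err)
```

Only the np.dot version yields a gradient with the same shape as w, which is what the update w = w - alpha * dw needs.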

Kic · October 9, 2022, 11:19am

Hi @Davide_Cividino,

Thank you for highlighting this. In the mathematical notation, matrix multiplication is the dot-product operation (np.dot in Python). dw = 1/m X dZ^T is written as a mathematical expression, not as Python code.
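To make the correspondence concrete, here is a small hand-checkable NumPy sketch. The values of X and dZ are made up for illustration; the shapes follow the course convention (X: (n, m), dZ: (1, m)):

```python
import numpy as np

# Tiny example: n = 2 features, m = 3 examples (arbitrary values)
X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])    # shape (n, m)
dZ = np.array([[1.0, 0.0, -1.0]])  # shape (1, m)
m = X.shape[1]

# Math: dw = 1/m X dZ^T, where juxtaposition means matrix product
dw = np.dot(X, dZ.T) / m           # shape (2, 1)
dw_at = (X @ dZ.T) / m             # the @ operator computes the same matrix product

# Row 0: (1*1 + 2*0 + 3*(-1)) / 3 = -2/3
# Row 1: (4*1 + 5*0 + 6*(-1)) / 3 = -2/3
print(dw)
```

For 2-D arrays, np.dot and @ are interchangeable; either is a faithful translation of the juxtaposed X dZ^T on the slide.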

juansoliscas · October 9, 2022, 11:19am

Hi Davide!!
Welcome to our community.
Thanks for sharing your comments with us.

Davide_Cividino · October 9, 2022, 11:35am

Thanks for the quick reply! Yes, I just think it is a bit misleading with respect to a couple of lines above, where the pseudocode notation (np.dot()) is used instead. If I use the slide as a pseudocode reference to implement the code, I could get tripped up by the two different notations on the same slide. I imagine changing this is a lot of work; I just wanted to point it out.

juansoliscas · October 9, 2022, 11:54am

Hi @Davide_Cividino, regarding your comment: dw = 1/m X dZ^T in the right column is, as you said, a vector; its shape is (n,1). In the left column, dw += x^(i) dz^(i) sits inside a for loop over the examples; since dz^(i) is a scalar, it is just a simple multiplication whose result is added to the dw1 sum.

Regards!
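A minimal sketch checking that the two columns agree, assuming X of shape (n, m) and dZ of shape (1, m). The helper names dw_loop and dw_vectorized are hypothetical, and the random data is arbitrary:

```python
import numpy as np

def dw_loop(X, dZ):
    """Left-column style (hypothetical helper): accumulate x^(i) * dz^(i), then divide by m."""
    n, m = X.shape
    dw = np.zeros((n, 1))
    for i in range(m):
        x_i = X[:, i:i + 1]   # column vector x^(i), shape (n, 1)
        dz_i = dZ[0, i]       # scalar dz^(i)
        dw += x_i * dz_i      # scalar times vector, so plain "*" is correct here
    return dw / m

def dw_vectorized(X, dZ):
    """Right-column style (hypothetical helper): dw = 1/m X dZ^T as one matrix product."""
    m = X.shape[1]
    return np.dot(X, dZ.T) / m

rng = np.random.default_rng(1)
X = rng.standard_normal((4, 10))
dZ = rng.standard_normal((1, 10))
print(np.allclose(dw_loop(X, dZ), dw_vectorized(X, dZ)))  # True
```

The matrix product simply performs the loop's sum over examples in one call: row j of X dZ^T is the sum over i of X[j, i] * dz^(i).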

paulinpaloalto · October 9, 2022, 3:14pm

You just have to be clear about the notational conventions that Prof Ng uses. When he writes a mathematical expression, he always uses an explicit “*” when he means elementwise multiplication. If there is no explicit operator, he always means “dot product” (matrix multiplication). Here's a thread which discusses that in more detail.
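A small illustration of that convention in NumPy terms; the matrices A and B are arbitrary examples, not taken from the course:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

# Math "A * B" (explicit star): elementwise product
elementwise = A * B      # same as np.multiply(A, B)
# -> [[ 5, 12], [21, 32]]

# Math "AB" (no operator): matrix ("dot") product
matrix = np.dot(A, B)    # same as A @ B for 2-D arrays
# -> [[19, 22], [43, 50]]
```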

Of course you also have to be conscious of whether he’s writing math or python. The two are different in many ways. E.g. this:

s(1 - s)

means something completely different in math than it does in python. If you write that in python, it means that s is a function and you are invoking it with the argument 1 - s. That will not end well.
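A minimal sketch of that pitfall, assuming s is a NumPy array of sigmoid outputs (so that s(1 - s) is the sigmoid derivative); the sigmoid helper and the input values are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

s = sigmoid(np.array([0.0, 2.0]))

# Math "s(1 - s)" means s times (1 - s); in Python the "*" must be explicit
ds = s * (1 - s)         # elementwise; ds[0] is 0.5 * 0.5 = 0.25

# Writing it literally tries to call the array s as if it were a function
try:
    s(1 - s)
    call_failed = False
except TypeError as err:
    call_failed = True
    print("TypeError:", err)
```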

Notice that earlier in that column he writes the mathematical expression and the python expression for the linear activation and the former is:

Z = w^T X + b

Then you see the np.dot where he does you a favor and writes the same thing in python. For the later expression you point out, he writes only the math, not the python, as Kic pointed out earlier.
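For reference, a minimal sketch of one vectorized gradient-descent step written out in NumPy, with each math expression next to its translation. It assumes the course's shapes (w: (n, 1), X: (n, m), Y: (1, m), b a scalar); the function name propagate_step, the learning rate and the toy data are illustrative choices, not the assignment's code:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def propagate_step(w, b, X, Y, learning_rate=0.1):
    """One vectorized gradient-descent step for logistic regression (illustrative sketch)."""
    m = X.shape[1]
    Z = np.dot(w.T, X) + b      # math: Z = w^T X + b         -> shape (1, m)
    A = sigmoid(Z)              # elementwise                  -> shape (1, m)
    dZ = A - Y                  # elementwise                  -> shape (1, m)
    dw = np.dot(X, dZ.T) / m    # math: dw = 1/m X dZ^T        -> shape (n, 1)
    db = np.sum(dZ) / m         # math: db = 1/m sum_i dz^(i)  -> scalar
    w = w - learning_rate * dw  # shapes of w and dw match
    b = b - learning_rate * db
    return w, b, dw, db

rng = np.random.default_rng(2)
n, m = 3, 8
X = rng.standard_normal((n, m))
Y = (rng.random((1, m)) > 0.5).astype(float)
w, b = np.zeros((n, 1)), 0.0
w, b, dw, db = propagate_step(w, b, X, Y)
print(w.shape, dw.shape)
```

Every place the math juxtaposes two matrices becomes np.dot, while differences and the sigmoid stay elementwise.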
