W Transpose: inconsistent definition/notation results in much confusion!

It is confusing when the matrix W is first referred to as already transposed:


but later on it seems that the definition of the matrix W has changed, in the sense that W is no longer transposed and now needs to be transposed, hence the superscript "T" is used:

In the first slide, W has dimensions (4, 3), which means it is already transposed: a single example x has dimensions (3, 1), so every row vector of W must have 3 elements/parameters, one for each element of x.
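To make the shapes concrete (using the dimensions from that slide, 3 input features and 4 hidden units, if I read it correctly):

$$
z^{[1]} = W^{[1]} x + b^{[1]},
\qquad
\underbrace{(4,1)}_{z^{[1]}} = \underbrace{(4,3)}_{W^{[1]}}\,\underbrace{(3,1)}_{x} + \underbrace{(4,1)}_{b^{[1]}}
$$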

However, in the next video, the superscript “T” is used, which is confusing. Has the matrix W become (3,4) again, so that we need to transpose it?

It becomes even more confusing in the quiz, where it is not clear which version of W (transposed or not) is assumed. A note on the lecture page would help.


Yes, the notation is a bit confusing. I guess this confusion is just an example of the difference between mathematics and machine learning engineering. In the second video, Andrew mentioned that w_1^{[1]} is a vector, and that’s why he transposed it to take the dot product with x. But then later in the video (~5:40), he said that stacking the rows gives the 4 by 3 matrix w^{[1]}, so I guess w in the second video is still 4x3.
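In other words, assuming each w_i^{[1]} is a (3, 1) column vector (which is how I read the video), stacking the transposed vectors as rows would give:

$$
W^{[1]} =
\begin{bmatrix}
w_1^{[1]T} \\
w_2^{[1]T} \\
w_3^{[1]T} \\
w_4^{[1]T}
\end{bmatrix}
\in \mathbb{R}^{4 \times 3},
\qquad
z_i^{[1]} = w_i^{[1]T} x + b_i^{[1]}
$$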


So maybe the notation (w_1^{[1]}) does not actually mean one row of the matrix w^{[1]}? :thinking:


@Oleksandra_Sopova Thank you for writing out an explanation of the issue; this was a huge source of confusion for me while I was following along with the DLS C1Wk3 “Computing a Neural Network’s Output” video. But at the end of that video it all came together. IMO the order of the teaching just needs a small tweak.

In “Neural Network Representation”, the video which comes first, W (capitalized, a matrix) is used without definition, so I assumed it was a horizontal stack of all the w (lowercase, column) vectors. But that assumption was wrong!

In the next video, “Computing a Neural Network’s Output”, W is defined as the vertical stack of the w vectors transposed into row vectors, and that is the definition that holds for all the labs and quizzes. So W is, in fact, already transposed.
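Here is a minimal numpy sketch of that convention; the sizes (3 inputs, 4 hidden units) come from the video, and the variable names are just for illustration, not the lab’s exact code:

```python
import numpy as np

n_x, n_h = 3, 4                  # input features, hidden units (as in the video)

W1 = np.random.randn(n_h, n_x)   # already "transposed": each row is one w_i^{[1]T}
b1 = np.random.randn(n_h, 1)
x = np.random.randn(n_x, 1)      # a single example as a column vector

z1 = np.dot(W1, x) + b1          # no extra transpose needed here
a1 = 1 / (1 + np.exp(-z1))       # sigmoid activation

print(W1.shape, x.shape, z1.shape)   # (4, 3) (3, 1) (4, 1)
```

With W stored this way, the forward pass needs no transpose, which I think is why the labs never apply .T to the weight matrices.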