W Transpose: inconsistent definition/notation results in much confusion!

It is confusing when the matrix W is first referred to as already transposed:


but later on it seems that the definition of the matrix W has changed, in the sense that W is no longer transposed and now needs to be transposed, hence the superscript "T" is used:

In the first slide, W has dimensions (4, 3), which means it is already transposed: a single example x has dimensions (3, 1), so every row vector of W must have 3 elements/parameters, one for each element of x.
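To make the shapes concrete (using the dimensions from that slide, 3 input features and 4 hidden units, if I read it correctly):

$$
z^{[1]} = W^{[1]} x + b^{[1]},
\qquad
\underbrace{(4,1)}_{z^{[1]}} = \underbrace{(4,3)}_{W^{[1]}}\,\underbrace{(3,1)}_{x} + \underbrace{(4,1)}_{b^{[1]}}
$$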

However, in the next video, the superscript “T” is used, which is confusing. Has the matrix W become (3,4) again, so that we need to transpose it?

It becomes even more confusing in the quiz, where it is not clear which version of W (transposed or not) is assumed. A note on the lecture page would help.


Yes, the notation is a bit confusing. I guess this confusion is just an example of the difference between mathematics and machine learning engineering. In the second video, Andrew mentioned that w_1^{[1]} is a vector, and that’s why he transposed it to take the dot product with x. But then later in the video (~5:40), he said that stacking the rows gives the 4 by 3 matrix w^{[1]}, so I guess w in the second video is still 4x3.
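In other words, assuming each w_i^{[1]} is a (3, 1) column vector (which is how I read the video), stacking the transposed vectors as rows would give:

$$
W^{[1]} =
\begin{bmatrix}
w_1^{[1]T} \\
w_2^{[1]T} \\
w_3^{[1]T} \\
w_4^{[1]T}
\end{bmatrix}
\in \mathbb{R}^{4 \times 3},
\qquad
z_i^{[1]} = w_i^{[1]T} x + b_i^{[1]}
$$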


So maybe the notation (w_1^{[1]}) does not actually mean one row of the matrix w^{[1]}? :thinking:


@Oleksandra_Sopova Thank you for writing out an explanation of the issue; this was a huge source of confusion for me while I was following along with the DLS C1Wk3 “Computing a Neural Network’s Output” video. But at the end of that video it all came together. IMO the order of the teaching just needs a small tweak.

In “Neural Network Representation”, the video which comes first, W (capitalized, a matrix) is used without definition, so I assumed it was a horizontal stack of all the w (lowercase, column) vectors. But that assumption was wrong!

In the next video, “Computing a Neural Network’s Output”, W is defined as the vertical stack of the w vectors transposed into row vectors, and that is the definition that holds for all the labs and quizzes. So W is, in fact, already transposed.
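Here is a minimal numpy sketch of that convention; the sizes (3 inputs, 4 hidden units) come from the video, and the variable names are just for illustration, not the lab’s exact code:

```python
import numpy as np

n_x, n_h = 3, 4                  # input features, hidden units (as in the video)

W1 = np.random.randn(n_h, n_x)   # already "transposed": each row is one w_i^{[1]T}
b1 = np.random.randn(n_h, 1)
x = np.random.randn(n_x, 1)      # a single example as a column vector

z1 = np.dot(W1, x) + b1          # no extra transpose needed here
a1 = 1 / (1 + np.exp(-z1))       # sigmoid activation

print(W1.shape, x.shape, z1.shape)   # (4, 3) (3, 1) (4, 1)
```

With W stored this way, the forward pass needs no transpose, which I think is why the labs never apply .T to the weight matrices.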