W Transpose: inconsistent definition/notation causes a lot of confusion!

It is confusing that the matrix W is first presented as already transposed:


but later on, the definition of W seems to change: W is apparently no longer transposed and needs to be, so the superscript "T" is used:

On the first slide, W has dimensions (4, 3), which means it is already transposed: a single example X has dimensions (3, 1), so every row of W must have 3 elements (parameters), one for each element of X.
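The shape argument above can be checked quickly in NumPy. This is a minimal sketch, assuming 3 input features and 4 hidden units as on the slide; the random values and variable names are only for illustration.

```python
import numpy as np

# Assumed sizes for illustration: 3 input features, 4 hidden units.
n_x, n_h = 3, 4

rng = np.random.default_rng(0)
x = rng.standard_normal((n_x, 1))    # one example as a column vector, shape (3, 1)
W = rng.standard_normal((n_h, n_x))  # one row per hidden unit, shape (4, 3)
b = np.zeros((n_h, 1))               # one bias per hidden unit, shape (4, 1)

z = W @ x + b                        # works directly, no transpose needed
print(W.shape, x.shape, z.shape)     # (4, 3) (3, 1) (4, 1)

# Transposing this W gives shape (3, 4), which cannot multiply a (3, 1) vector.
try:
    W.T @ x
except ValueError as err:
    print("W.T @ x fails:", err)
```

With W stored as (4, 3), `W @ x` is the only product that lines up; adding a transpose on top of that breaks the shapes.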

However, in the next video the superscript "T" is used, which is confusing. Has the matrix W become (3, 4) again, so that we need to transpose it?
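One way to see why a "T" might appear: whether it is needed depends only on how W is stored. The sketch below compares the (4, 3) layout from the first slide with a hypothetical (3, 4) layout (one column per unit); the names `W_rows` and `W_cols` are made up for illustration, and both give the same result.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal((3, 1))       # one example, shape (3, 1)

# Layout A (as on the first slide): one row per unit, shape (4, 3).
W_rows = rng.standard_normal((4, 3))

# Layout B (hypothetical alternative): one column per unit, shape (3, 4).
W_cols = W_rows.T

z_a = W_rows @ x      # no transpose needed
z_b = W_cols.T @ x    # the "T" is needed only because of how W_cols is stored

print(np.allclose(z_a, z_b))  # True
```

So a "T" on a (3, 4) matrix and no "T" on a (4, 3) matrix describe the same computation; the confusion comes from not knowing which layout is meant.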

It becomes even more confusing in the quiz, where it is not clear which version of W (transposed or not) is assumed. A note on the lecture page would help.

Yes, the notation is a bit confusing. I guess this confusion is just an example of the difference between mathematics and machine learning engineering. In the second video, Andrew mentions that $w_1^{[1]}$ is a vector, which is why he transposes it to take the dot product with x. But later in the same video (~5:40), he says that stacking these rows gives the 4 by 3 matrix $W^{[1]}$, so I guess W in the second video is still 4 by 3.


So maybe the notation $w_1^{[1]}$ does not actually mean one row of the matrix $W^{[1]}$? :thinking:
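The stacking described in the reply can be checked numerically. This sketch assumes that each $w_i^{[1]}$ is a (3, 1) column vector, so that its transpose is a (1, 3) row; under that assumption, the rows of the (4, 3) matrix are the transposed vectors, and the per-unit and vectorized computations agree. Sizes and names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal((3, 1))       # one example, shape (3, 1)

# Assumption: each unit's weights w_i form a column vector of shape (3, 1).
w = [rng.standard_normal((3, 1)) for _ in range(4)]
b = rng.standard_normal((4, 1))       # one bias per unit

# Per-unit computation: z_i = w_i^T x + b_i, a (1, 1) result for each unit.
z_per_unit = np.vstack([w_i.T @ x + b[i] for i, w_i in enumerate(w)])

# Stacking the transposed vectors w_i^T as rows gives the (4, 3) matrix W.
W = np.vstack([w_i.T for w_i in w])
z_vectorized = W @ x + b

print(W.shape)                                # (4, 3)
print(np.allclose(z_per_unit, z_vectorized))  # True
```

Under this reading, $w_1^{[1]}$ is a column vector, and it is its transpose that appears as the first row of $W^{[1]}$.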
