Transpose of the weight matrix

In Week 3, in the lecture titled “Vectorizing across multiple examples”, I am confused about the calculation of Z[1]. Why is W[1] not transposed?

$Z^{[1]} = W^{[1]} X + b^{[1]}$

When we learnt about vectorizing logistic regression in Week 2, we transposed the weight matrix.

Hello @avinash567,

In week 2, we calculate \hat{y} = \sigma(\mathbf{w}^T \mathbf{x} + b), where the dimension of \mathbf{w} is n_x \times 1 and the dimension of \mathbf{x} is n_x \times 1; hence, you need to transpose \mathbf{w} to calculate the dot product between \mathbf{w} and \mathbf{x} using matrix multiplication.
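To make the shape argument concrete, here is a minimal numpy sketch of the week-2 case (the size `n_x = 3` is made up for illustration):

```python
import numpy as np

n_x = 3                      # hypothetical number of input features
w = np.random.randn(n_x, 1)  # weights as a column vector, per the course convention
x = np.random.randn(n_x, 1)  # one example, also a column vector

# w and x are both (n_x, 1), so w @ x is undefined;
# transposing w gives (1, n_x) @ (n_x, 1) -> a (1, 1) result, the dot product.
z = w.T @ x
print(z.shape)  # (1, 1)
```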

As an exercise, what are the dimensions for \mathbf{W}^{[1]} and \mathbf{X} in week 3? Given my explanation, why is it not necessary to transpose \mathbf{W}^{[1]}, given how it has been defined?
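For comparison, a sketch of the week-3 shapes (all sizes here are hypothetical, chosen only to make the dimensions visible):

```python
import numpy as np

n_x, n_1, m = 3, 4, 5            # hypothetical: 3 features, 4 hidden units, 5 examples

W1 = np.random.randn(n_1, n_x)   # each ROW is one hidden unit's weight vector
X = np.random.randn(n_x, m)      # each COLUMN is one training example
b1 = np.zeros((n_1, 1))          # broadcast across the m columns

# (n_1, n_x) @ (n_x, m) -> (n_1, m): the inner dimensions already match,
# so no transpose is needed.
Z1 = W1 @ X + b1
print(Z1.shape)  # (4, 5)
```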


I've also found the matrix representations only "loosely" consistent (i.e., confusing!). One slide explicitly stated that W (the 2-D version covering all the nodes) is built by stacking the TRANSPOSED 1-D w's, implying we can't rely on the capitalized and lower-case versions to keep the same dimension assignment.
While I'm here, I'll add that for people like me coming from MATLAB, it's hard to know whether "J.K" is supposed to mean an inner (dot) product or an element-wise product (after any implied broadcasting).


A very good exercise is to derive the formulas with pen and paper by yourself.

From my own derivations, given

$z_i^{[l]} = \sum_{j} w_{ij}^{[l]} a_j^{[l-1]} + b_i^{[l]},$

we define

$\left(\mathbf{W}^{[l]}\right)_{ij} = w_{ij}^{[l]},$

which leads to

$\mathbf{z}^{[l]} = \mathbf{W}^{[l]} \mathbf{a}^{[l-1]} + \mathbf{b}^{[l]}.$


I guess I'm wondering why it is derived from $w_{ij}$ instead of $w_{ji}$ (sorry, I don't know how to type subscripts here), so it doesn't need transposing to preserve the implied matrix representation?

You can certainly do that; however, then you have to change the formula I wrote above.

In your case, the weight matrix will have dimension $n \times m$, so you will need to transpose $\mathbf{a}^{[l - 1]}$ instead (since it is $n \times 1$), and you also need to change the order of the matrix multiplication, so it ends up as ${\mathbf{z}^{[l]}}^T = {\mathbf{a}^{[l - 1]}}^T \mathbf{W}^{[l]}$, a row vector.

Again, try it out with pen and paper, both versions. Which version do you prefer after performing both forward and back prop using pen and paper (and vectorization)?
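Alongside the pen-and-paper exercise, a quick numeric check (sizes are made up for illustration) that the two conventions describe the same computation:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 4                       # hypothetical: n units in layer l-1, m units in layer l

W = rng.standard_normal((m, n))   # course convention: z = W a + b
a = rng.standard_normal((n, 1))
b = rng.standard_normal((m, 1))

z1 = W @ a + b                    # (m, n) @ (n, 1) -> (m, 1) column vector

# Alternative convention from the question: store the transposed matrix (n x m),
# which forces a^T on the left and produces a (1, m) row vector instead.
W_alt = W.T
z2 = (a.T @ W_alt).T + b          # transpose back to a column to compare

print(np.allclose(z1, z2))       # True: same numbers, different bookkeeping
```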


@dave_merit: The definition of the w weight vector in Logistic Regression and the W weight matrices in Neural Networks is completely arbitrary. Prof Ng gets to define them however he wants. What he chooses to do is to use the convention that all standalone vectors are formatted as column vectors, so that w is a column vector. Given the way he chooses to lay out the X sample matrices as $n_x \times m$, you then require a transpose on w to do the linear combination.

In the case of the W matrices, he chooses to orient them differently and the transpose is no longer required.

To include LaTeX expressions, just bracket them with single dollar signs. This is covered on the FAQ Thread, q.v.

As to when to use elementwise versus dot product, notice that Prof Ng is consistent in using * to indicate elementwise when he is writing mathematical expressions. If he writes two operands adjacent with no explicit operator, he means real dot product style matrix multiplication. I think this latter choice is a bit unfortunate, but he’s the boss. E.g. I like to use the LaTeX \cdot operator, as in:

Z = w^T \cdot X + b

just to make things explicit.
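In numpy the distinction is explicit in the operators themselves, which may help when translating the slides into code (the matrices below are arbitrary examples):

```python
import numpy as np

A = np.array([[1., 2.], [3., 4.]])
B = np.array([[10., 20.], [30., 40.]])

print(A * B)         # elementwise (Hadamard) product -- Prof Ng's "*"
print(A @ B)         # matrix (dot-product-style) multiplication -- adjacency on the slides
print(np.dot(A, B))  # same as A @ B for 2-D arrays
```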

As to the annoyance of switching from MATLAB to python + numpy, I feel your pain. MATLAB is beautiful and the polymorphism with which vectors and arrays are handled is designed in from scratch. Python wasn’t originally designed to do vectorized calculations, so everything to do with numpy feels like a bag on the side of a kludge by comparison to the beauty of MATLAB. But the world of ML/DL/AI has made this decision for us and we just have to deal with it. Sorry!