Question 9 asks for various matrix dimensions in neural networks, the weight matrix among them.
Now if Z = Wᵀ · X + b
and the dimensions of X are (n, m),
then Wᵀ would have to be (*, n) (ignore the *, the important thing is the n),
which means that W is (n, *).
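To make those shapes concrete, here is a tiny NumPy check (the sizes n = 4, m = 10 and the number of units k = 3 are just made up for illustration):

```python
import numpy as np

n, m, k = 4, 10, 3            # n features, m examples, k units in the layer

X = np.random.randn(n, m)     # X is (n, m), as in the question
W = np.random.randn(n, k)     # W is (n, *), so W.T is (*, n)
b = np.random.randn(k, 1)     # bias, broadcast across the m examples

Z = W.T @ X + b               # (k, n) @ (n, m) -> (k, m)
print(Z.shape)                # (3, 10)
```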
In the lecture, as well as in Prof. Andrew’s explanation about dimensions, he refers to this matrix simply as W, but in the earlier lectures it is written as Wᵀ.
The transpose on w is only necessary in Logistic Regression (Week 2), where w is a vector instead of a matrix. Here in Week 3, the whole point is that we have graduated to real Neural Networks and now the weights are matrices, not vectors. Prof Ng chooses to define the matrices so that the transpose is no longer required. In the case of Logistic Regression, he could have chosen to make w a row vector and avoided the transpose, but he likes to use the convention that all standalone vectors are column vectors.
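Here is a small NumPy sketch of the two cases side by side; the sizes n_x = 4, m = 10, n_h = 3 are only for illustration:

```python
import numpy as np

n_x, m, n_h = 4, 10, 3            # input features, examples, hidden units
X = np.random.randn(n_x, m)       # Prof Ng's convention: one column per example

# Week 2 (Logistic Regression): w is a column vector, so the transpose is needed
w = np.random.randn(n_x, 1)
b = 0.0
z_lr = w.T @ X + b                # (1, n_x) @ (n_x, m) -> (1, m)

# Week 3 (Neural Network): W1 is defined as an (n_h, n_x) matrix, no transpose
W1 = np.random.randn(n_h, n_x)
b1 = np.zeros((n_h, 1))
Z1 = W1 @ X + b1                  # (n_h, n_x) @ (n_x, m) -> (n_h, m)

print(z_lr.shape, Z1.shape)       # (1, 10) (3, 10)
```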
Hello dear super mentor, I’d like to extend some thoughts on this topic. Is it right to say that, in practice, it does not really matter how we define the shape of the weight matrix (it’s just a matter of whether or not to transpose during the calculation)? As long as we correctly figure out the number of features (inputs) and the number of outputs, and follow the matrix multiplication rules, everything just works.
It’s just personal preference whether to define the weight matrix as column vectors stacked horizontally or as row vectors stacked vertically (the small sketch below illustrates this).
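For example (just an illustrative sketch with made-up sizes), both conventions give the same Z as long as the multiplication is written to match:

```python
import numpy as np

n_x, n_h, m = 4, 3, 10
X = np.random.randn(n_x, m)

# Convention A: columns of W_a are the per-unit weight vectors -> needs a transpose
W_a = np.random.randn(n_x, n_h)
Z_a = W_a.T @ X

# Convention B: rows of W_b are the per-unit weight vectors -> no transpose
W_b = W_a.T                       # same numbers, just stored the other way round
Z_b = W_b @ X

print(np.allclose(Z_a, Z_b))      # True
```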
As for the different deep learning frameworks in current use, each one’s convention for the shape of the weight matrix can be found in its documentation.
Is what I said above correct?
Yes, the point is that all these things are just choices that you have to make. What we are really doing here is translating mathematical formulas into linear algebra operations and then translating those into code. There are different ways to order the data, but as long as you are consistent you can make it work either way. So you could say that it is “a matter of taste”. You’re also correct that once we get to using TensorFlow in DLS Course 2 Week 3, you’ll find that TF uses a different orientation of the X input values, which causes the weights and other quantities to be arranged differently than Prof Ng is doing here in DLS C1.
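If I recall the Keras API correctly, a minimal sketch of that difference looks like this (the sizes are arbitrary; note that Dense expects one example per row, i.e. X has shape (m, n_x), the transpose of the orientation used here in C1):

```python
import numpy as np
import tensorflow as tf

m, n_x, n_h = 10, 4, 3
X_tf = np.random.randn(m, n_x).astype(np.float32)   # TF convention: one row per example

layer = tf.keras.layers.Dense(n_h)
A = layer(X_tf)                   # effectively X_tf @ kernel + bias

print(A.shape)                    # (10, 3): still (examples, units)
print(layer.kernel.shape)         # (4, 3): (n_x, n_h), so X @ W needs no transpose
```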