# What the heck is W? And how do you multiply it?

I understand that W is the matrix holding the weights, but how is W structured? Is each row a training example, with the columns holding the weight for each feature? (If so, there are only 4 features in the example in the screenshot, right?)

Lastly, when you multiply W and X, what does that represent mechanically? Are you taking the weight in the 0th row and 0th column and multiplying it by the first training example, then moving on to the weight at row index 1, column 0 of W and multiplying it by the first training example again?


Prof Ng explains that in the lecture from which you show that slide. The rows of W are the weights that produce the output for one neuron at that layer. The way they do that is as a "linear combination" (dot product) with each input sample vector from the previous layer. That's the way matrix multiplication works, right? Each element of the output is the dot product of one row of the first matrix with one column of the second. So the vector expression of the linear part of forward propagation is this:

Z^{[l]} = W^{[l]} \cdot A^{[l-1]} + b^{[l]}

If you look carefully at how the matrix multiplication between the W and A works there, what I said above should make sense.

It might help to state the shapes and structures of all the objects in that expression:

A^{[l-1]} is the activation output of the previous layer or X in the case of the very first hidden layer. Each column of A or X represents one sample, so the dimension is n^{[l-1]} x m, where n^{[l-1]} is the number of output neurons from layer l - 1 or the number of input "features" in the case of X.

W^{[l]} has dimensions n^{[l]} x n^{[l-1]}, because it takes input from layer l - 1 and produces the output for layer l.

b^{[l]} is the bias vector for layer l. It is a column vector of dimension n^{[l]} x 1.

So you can see the dimensions on the dot product there:

n^{[l]} x n^{[l-1]} dotted with n^{[l-1]} x m will give output that is n^{[l]} x m, so that is the dimension of Z^{[l]} in the above formula.
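A quick numpy sketch may make the shapes concrete. The layer sizes here (n^{[l-1]} = 3, n^{[l]} = 4, m = 5) are made up purely for illustration:

```python
import numpy as np

n_prev, n_l, m = 3, 4, 5  # hypothetical: units in layer l-1, units in layer l, samples

A_prev = np.random.randn(n_prev, m)  # activations from layer l-1, one sample per column
W = np.random.randn(n_l, n_prev)     # one row of weights per neuron in layer l
b = np.random.randn(n_l, 1)          # bias column vector, broadcast across the m columns

Z = W @ A_prev + b                   # linear part of forward propagation
print(Z.shape)                       # (4, 5), i.e. n^{[l]} x m
```

Each column of Z is the linear output for one sample, which is why the sample count m carries straight through the product.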

If that was not clear to you from the lecture, you should probably watch it again carefully with the above in mind.

Also note that familiarity with basic Linear Algebra is a prerequisite here: if you are not already familiar with how "dot product" style matrix multiplication works, you should pause this course and spend some time reviewing Linear Algebra. There are lots of good resources on the web. This thread lists several at different levels of depth.


Thanks for the reply Paul! I understand the linear algebra part, but I think what I'm missing is the intuition for the weights: so in your claim that W is a matrix, where each row has dimensions n[1] x n[2], is each element of the matrix W a weight for a specific feature? If X was an image, then concretely, would each weight be the weight for each pixel?
I guess in summary, I'm still fuzzy about what the rows and columns represent in the W matrix.

If you understand matrix multiplication, then you should just work it out with a pencil and paper. Let's consider the special case of the input layer, since that makes things more concrete. Here is the equation in vector form:

Z = W \cdot X + b

Each column of X is one input sample vector x^{(i)}, right? So it has dimension n_x x 1 (just talking about one input sample) where n_x is the number of "features" or elements in each input sample.

The dimensions of W are n^{[1]} x n_x, where n^{[1]} is the number of output neurons in layer 1. Now work out this dot product with paper and pencil:

W \cdot x^{(i)}

The output will be another column vector of dimension n^{[1]} x 1. Work out what the first element of that vector will be.
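If you want to check your pencil-and-paper result, here is a small numpy sketch (the dimensions n_x = 3 and n^{[1]} = 4 are made up for illustration):

```python
import numpy as np

n_x, n1 = 3, 4                    # hypothetical: feature count and layer-1 neuron count
x = np.random.randn(n_x, 1)       # one input sample as a column vector
W = np.random.randn(n1, n_x)      # one row of weights per layer-1 neuron

z = W @ x                         # column vector of shape (n1, 1)

# The first element of z is row 0 of W dotted with the sample vector:
first = np.dot(W[0, :], x[:, 0])
print(np.isclose(z[0, 0], first))  # True
```

That is the point of the exercise: each element of the output is one neuron's row of weights dotted with the whole input sample, not one weight applied to one sample.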
