Can't understand a matrix

It sounds like you are asking about what is shown in this screenshot from the Week 3 lecture “Computing A Neural Network’s Output” at about 4:50 into that lecture:

What is happening there is that Prof Ng is starting with the individual equations for the output of each neuron, using the same format that he used in the Logistic Regression case. So for example he shows:

z_1^{[1]} = w_1^{[1]T} \cdot x + b_1^{[1]}

Note that I’ve added one little “extra” there by making the dot product operation explicit between the w vector and the x vector.

So in that formulation, the vector w_1^{[1]} is the weight vector for the first neuron in the first layer (that’s what the superscript [1] means everywhere there). He formats w_1^{[1]} as a column vector, just as he did in the Logistic Regression case. It has dimension n_x x 1, where n_x is the number of input features (elements in each input x vector). x is also an n_x x 1 column vector, so in order to get that dot product to work, we need to transpose the w vector. Dotting 1 x n_x with n_x x 1 then gives you a 1 x 1, or scalar, output.
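If it helps to see those shapes concretely, here is a minimal NumPy sketch of that single-neuron equation. The sizes and the numbers in `w_1`, `x` and `b_1` are made up purely for illustration; they are not from the lecture.

```python
import numpy as np

# Illustrative values only: n_x = 3 input features.
w_1 = np.array([[0.5], [-1.0], [2.0]])   # weight column vector, shape (3, 1)
x = np.array([[1.0], [2.0], [3.0]])      # input column vector, shape (3, 1)
b_1 = 0.25                               # scalar bias for this neuron

# Transposing w_1 gives shape (1, 3), so (1, 3) @ (3, 1) -> (1, 1).
z_1 = w_1.T @ x + b_1

print(z_1.shape)    # (1, 1)
print(z_1.item())   # 4.75  (0.5 - 2.0 + 6.0 + 0.25)
```

Without the transpose, `w_1 @ x` would raise a shape error, because the inner dimensions of (3, 1) and (3, 1) do not match.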

Then what he does is to put all the weight vectors for the output neurons of layer 1 together into a single matrix, so that we can compute the outputs all at once in a vectorized way. But he also wants to make it simpler, so that we don’t need any more transposes on the whole W^{[1]} weight matrix. So he uses the w vectors in the transposed form, so that they are now row vectors 1 x n_x. That means he can stack them up as the rows of the weight matrix W^{[1]}. That’s what he is showing in the lower left section of that diagram.
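Here is a small sketch of that stacking step in NumPy, assuming a layer with 3 inputs and 4 neurons. The names `w_1` through `w_4` and their values are hypothetical, chosen only so the rows are easy to recognize.

```python
import numpy as np

# Hypothetical per-neuron weight vectors, each a column of shape (3, 1).
w_1 = np.array([[0.1], [0.2], [0.3]])
w_2 = np.array([[0.4], [0.5], [0.6]])
w_3 = np.array([[0.7], [0.8], [0.9]])
w_4 = np.array([[1.0], [1.1], [1.2]])

# Transpose each column into a (1, 3) row and stack the rows.
W1 = np.vstack([w_1.T, w_2.T, w_3.T, w_4.T])
print(W1.shape)   # (4, 3)

# No transpose is needed on W1: row i of W1 @ x is the dot product w_i^T x.
x = np.array([[1.0], [2.0], [3.0]])
z = W1 @ x        # shape (4, 1)
assert np.allclose(z[0], w_1.T @ x)
assert np.allclose(z[3], w_4.T @ x)
```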

So you end up with W^{[1]} having the dimensions n^{[1]} x n_x, where n^{[1]} is the number of output neurons in layer 1. And because the w vectors from the upper right formulation are now the rows of W^{[1]}, the full vectorized forward propagation becomes:

Z^{[1]} = W^{[1]} \cdot X + b^{[1]}

Where X there is the full sample matrix with each column being one input vector. So if you have m samples, then Z^{[1]} is n^{[1]} x m. Then we apply the activation function “elementwise” to get A^{[1]} so it has the same dimensions as Z^{[1]}.
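To check the dimensions for yourself, something like the following sketch works. The sizes, the random values and the choice of sigmoid as the activation are all assumptions for the example, not something fixed by the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the run is repeatable
n_x, n_1, m = 3, 4, 5            # illustrative sizes only

W1 = rng.standard_normal((n_1, n_x))   # one row per neuron
b1 = rng.standard_normal((n_1, 1))     # one bias per neuron, as a column
X = rng.standard_normal((n_x, m))      # one column per sample

def sigmoid(z):
    """Elementwise logistic function (the activation choice is an assumption)."""
    return 1.0 / (1.0 + np.exp(-z))

Z1 = W1 @ X + b1     # (n_1, n_x) @ (n_x, m) -> (n_1, m)
A1 = sigmoid(Z1)     # applied elementwise, so the shape is unchanged

print(Z1.shape, A1.shape)   # (4, 5) (4, 5)

# Column j of Z1 is what you get by pushing sample j through on its own.
j = 2
x_j = X[:, [j]]      # keep it as a column, shape (n_x, 1)
assert np.allclose(Z1[:, [j]], W1 @ x_j + b1)
```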

I didn’t mention the bias values there, but there is one scalar b value for each output neuron. In the final vectorized form, you also “stack” those into a column vector of dimension n^{[1]} x 1. So when you add that vector, it is “broadcast” and adds to each column of the output to compute the final Z^{[1]}.
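The broadcasting step is easy to see with small numbers. In this sketch `WX` just stands in for the product of the weight matrix and the sample matrix, with made-up values for 2 neurons and 3 samples.

```python
import numpy as np

# Stand-in for the weights-times-inputs product, shape (2 neurons, 3 samples).
WX = np.array([[1.0, 2.0, 3.0],
               [4.0, 5.0, 6.0]])
b1 = np.array([[10.0],
               [20.0]])            # bias column vector, shape (2, 1)

# NumPy broadcasts b1 across the columns: each sample gets the same biases.
Z1 = WX + b1
print(Z1)
# [[11. 12. 13.]
#  [24. 25. 26.]]

# Same result as explicitly copying b1 once per sample.
Z1_explicit = WX + np.tile(b1, (1, WX.shape[1]))
assert np.array_equal(Z1, Z1_explicit)
```

Keeping the bias as a column of shape (2, 1) matters here: a 1-D array of shape (2,) would not broadcast against a (2, 3) array.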
