Hi,
I was just attempting the assignment for Week 3. In the exercise on forward propagation, when computing Z I initially tried np.dot(W1.T, X) + b and did not get the right result, but when I removed the transpose it worked. During the theory sessions it was mentioned that since both X and W are column vectors, W needs to be transposed in order to carry out an inner product. Now I am confused. Can anyone explain this, please?
There are two parts to this course so far - logistic regression & neural networks - and they use different conventions.
For logistic regression, both x and w are column vectors, so we need to turn the left operand into a row vector by transposing w before it can multiply the column vector on the right. That is why the formula there is z = w.T x + b.
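Here is a minimal NumPy sketch of that rule. The sizes and variable names are made up purely for illustration, not taken from the assignment:

```python
import numpy as np

# Made-up sizes, purely to illustrate the shape rule (not the assignment data)
n_x = 4                       # number of features
x = np.random.randn(n_x, 1)   # one sample as a column vector, shape (4, 1)
w = np.random.randn(n_x, 1)   # weights as a column vector, shape (4, 1)
b = 0.0

# (1, 4) dot (4, 1) -> (1, 1): transposing w makes the inner product work
z = np.dot(w.T, x) + b
print(z.shape)   # (1, 1)
```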
For neural networks, the convention is a bit different: neither X nor W is a column matrix anymore. Instead -
- W takes the shape (a, b), where a is the number of neurons in this layer and b is either (i) the number of neurons in the previous layer or (ii) the number of features if this is the first layer.
- X takes the shape (c, d), where c is the number of features and d is the number of samples.
You see, with this arrangement we can do the multiplication WX right away, without any transpose, because b = c (the inner dimensions already match), as in the sketch below.
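Again with made-up sizes, just to show the shape bookkeeping for the neural network convention:

```python
import numpy as np

# Made-up sizes, just to show the shape bookkeeping
n_x = 4                          # number of features      (c)
n_h = 3                          # neurons in this layer   (a)
m   = 5                          # number of samples       (d)

X  = np.random.randn(n_x, m)     # shape (c, d) = (4, 5)
W1 = np.random.randn(n_h, n_x)   # shape (a, b) = (3, 4)
b1 = np.zeros((n_h, 1))          # broadcast across the m samples

# (3, 4) dot (4, 5) -> (3, 5): no transpose needed because b = c
Z1 = np.dot(W1, X) + b1
print(Z1.shape)   # (3, 5)
```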
After the logistic regression part, we stick to the neural network convention for the rest of this course and the upcoming courses, because they are all about neural networks.