Feeling rather dumb here. I have two questions that I have been stuck on for the last 2 days.
What is T meant to be here? Does it mean Transpose?
- Parameter description reads
X – data of size (num_px * num_px * 3, number of examples)
I just can't picture this in my mind. Should this not be (number of examples, num_px * num_px * 3)? Or is this why we need to do a transpose?
Yes, the T used as an exponent, e.g. w^T, means transpose. As to the dimensions of X, they could have been defined either way, but Prof Ng gets to choose, and he chose features x samples as the orientation of X. So if the number of features is n_x and the number of samples is m, then the dimensions of X will be n_x x m.
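To make that orientation concrete, here is a minimal NumPy sketch. The sizes (5 examples of 4 x 4 RGB images) and the variable name `images` are made up for illustration and are not taken from the assignment:

```python
import numpy as np

# Hypothetical sizes: m = 5 examples, each a 4 x 4 image with 3 color channels.
m, num_px = 5, 4
images = np.random.rand(m, num_px, num_px, 3)

# Flatten each image into one row of length num_px * num_px * 3,
# then transpose so that each image becomes one COLUMN of X.
X = images.reshape(m, -1).T

print(X.shape)  # (48, 5), i.e. (num_px * num_px * 3, number of examples)
```

So each column of X is one flattened image, and the number of columns is the number of examples.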
The dimension of the weight vector w is n_x x 1. That is also a choice that Prof Ng has made: he uses the convention that any “standalone” vector is oriented as a column vector.
So with all that information, you can now see why the formula for Z is:
Z = w^T \cdot X + b
The operation between w^T and X is a matrix multiply, so the “inner dimensions” must agree and you can see that they do:
(1 x n_x) dot (n_x x m) gives us a result that is 1 x m.
Then when we apply the sigmoid, that is done “elementwise” so that A has the same shape as Z.
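If it helps to see the shapes, here is a small sketch of that computation in NumPy. The sizes and the zero initialization of w are assumptions chosen only to keep the example short, and the `sigmoid` helper is written out here rather than taken from the course code:

```python
import numpy as np

def sigmoid(z):
    # Elementwise logistic function; the output has the same shape as z.
    return 1.0 / (1.0 + np.exp(-z))

n_x, m = 48, 5                 # hypothetical: 48 features, 5 samples
rng = np.random.default_rng(0)
X = rng.random((n_x, m))       # features x samples
w = np.zeros((n_x, 1))         # column vector of weights
b = 0.0                        # scalar bias

Z = np.dot(w.T, X) + b         # (1, n_x) dot (n_x, m) -> (1, m)
A = sigmoid(Z)                 # same shape as Z

print(w.T.shape, X.shape, Z.shape, A.shape)  # (1, 48) (48, 5) (1, 5) (1, 5)
```

If X were oriented as samples x features instead, you would compute X dot w and get an m x 1 result, which is the "either way" mentioned above.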
Thank you for the detailed explanation.
“That is also a choice that Prof Ng has made: he uses the convention that any “standalone” vector is oriented as a column vector.”
I suppose this will be clear later. However, I am unblocked for now. Thanks a lot.
I’m just telling you that w is a column vector. It didn’t have to be that way, but that is the way that Prof Ng chooses to define it. Wait until next week, when the weights become a matrix; there the transpose will no longer be required because of the way Prof Ng defines the matrices.
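As a rough preview, and assuming the weight matrix is laid out with one row per output unit (the shape `(n_h, n_x)` and the name `n_h` are my assumption for illustration, not something stated above), the shapes would line up without a transpose:

```python
import numpy as np

n_x, n_h, m = 48, 3, 5         # hypothetical: 48 features, 3 units, 5 samples
rng = np.random.default_rng(1)
X = rng.random((n_x, m))       # features x samples, as before
W = rng.random((n_h, n_x))     # assumed layout: one row of weights per unit
b = np.zeros((n_h, 1))         # one bias per unit, as a column vector

Z = np.dot(W, X) + b           # (n_h, n_x) dot (n_x, m) -> (n_h, m); b broadcasts over columns

print(Z.shape)  # (3, 5)
```

Each row of W here plays the role that w^T played in the single-unit case.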