Hello everyone,
I recently started Neural Networks and Deep Learning, and I have a question about this specific equation:
z = w.T * X + b
In the first exercise we flattened X into a matrix with shape (features, examples) and initialized w as a matrix of zeros with dimensions (X.shape[0] (the number of features), 1).
Now my question is: why do we apply w.T? Weren't w and X already compatible for multiplication, since both had the same number of rows?
I hope my message gets through, as English is not my first language.
Have a great day!
Hello @Bruno_Catano_Arellan,
You said,
- X: (features, examples)
- w: (Number of features, 1)
Right? In this case, we need w.T because matrix multiplication is only defined as \mathbf{A}_{m \times n} \times \mathbf{B}_{n \times k}: the number of columns of the first matrix must equal the number of rows of the second, not the number of rows of both. Note how the two matrices share the same n.
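Here is a minimal numpy sketch of those shapes, with made-up sizes (4 features, 3 examples) just for illustration:

```python
import numpy as np

# Made-up sizes just for illustration: 4 features, 3 examples
n_x, m = 4, 3
X = np.random.randn(n_x, m)   # shape (4, 3)
w = np.zeros((n_x, 1))        # shape (4, 1)
b = 0.0

# np.dot(w, X) would raise an error: the inner dimensions are 1 and 4
z = np.dot(w.T, X) + b        # (1, 4) x (4, 3) -> (1, 3), one value per example
print(z.shape)                # (1, 3)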
Cheers,
Raymond
It is just a convention of Prof Ng's that all standalone vectors are column vectors, so he formats w as n_x x 1. That requires the transpose in order for the dot product to work with X, which is n_x x m, where m is the number of samples.
Note that when we get to Week 3 and full Neural Networks, the weights will become matrices and will be oriented such that the transpose will no longer be required.
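To make that concrete, here is a rough sketch with made-up layer sizes; the point is only that the weight matrix is already oriented with one row per unit, so no transpose is needed:

```python
import numpy as np

n_x, n_h, m = 4, 5, 3            # made-up sizes: features, hidden units, examples
X = np.random.randn(n_x, m)      # (4, 3)
W1 = np.random.randn(n_h, n_x)   # (5, 4): one row per hidden unit
b1 = np.zeros((n_h, 1))

Z1 = np.dot(W1, X) + b1          # (5, 4) x (4, 3) -> (5, 3), no transpose needed
print(Z1.shape)
```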
Also note that you use * in your formula, but in Prof Ng's convention that means "elementwise" multiplication. When he means "dot product", he writes it with no explicit operator:
Z = w^T X + b
Although I think it’s clearer to use the LaTeX “cdot” operator:
Z = w^T \cdot X + b
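For example, here is a quick numpy sketch (with made-up shapes) showing how different the two operations are:

```python
import numpy as np

n_x, m = 4, 3
X = np.random.randn(n_x, m)   # (4, 3)
w = np.random.randn(n_x, 1)   # (4, 1)

Z_dot  = np.dot(w.T, X)       # dot product: (1, 4) x (4, 3) -> (1, 3)
Z_elem = w * X                # elementwise: broadcasts (4, 1) with (4, 3) -> (4, 3)
print(Z_dot.shape, Z_elem.shape)
```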