Dear sir, in the lectures it was said that np.dot(w,x)+b would give us the regression model, but since these arrays are not compatible for matrix multiplication (1×3 and 1×3), how does it happen? Should it not be w*x, as that would perform element-wise multiplication? Does it have something to do with array broadcasting? Please help. Also, in the previous course, we were told that we would get the hypothesis function by multiplying the transpose of the parameter vector with the variable vector (theta(transpose)*x), which was mathematically correct.
Please help me on this.
Hi, @Gopesh_Yadav!
The basic math behind neural networks (or specifically, regression models in this case) is the product of the input (or previous layer output) and the layer weights plus a bias term.
y_i = w_i \cdot x_i + b_i
\hat{\mathbf{y}} = \mathbf{w}^\top \mathbf{x} + \mathbf{b}
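In NumPy this is exactly what np.dot(w, x) + b computes for a single example. Here is a minimal sketch with made-up numbers (the values of w, x, and b are illustrative only):

```python
import numpy as np

# Made-up numbers: 3 features, one example
w = np.array([0.5, -1.0, 2.0])   # weights, shape (3,)
x = np.array([1.0, 3.0, 0.5])    # features of one example, shape (3,)
b = 4.0                          # bias, a scalar

# w and x are both 1-D arrays, so np.dot computes their inner product
y_hat = np.dot(w, x) + b         # 0.5*1.0 + (-1.0)*3.0 + 2.0*0.5 + 4.0
print(y_hat)                     # 2.5
```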
Hello @Gopesh_Yadav, thank you for the question!
Let’s focus on the maths and begin with some definitions:
\vec{w} = \begin{bmatrix} w_1 & w_2 \end{bmatrix} is a row vector of 2 weights
X = \begin{bmatrix} \vec{x}^{(1)} \\ \vec{x}^{(2)} \\ \vec{x}^{(3)} \end{bmatrix} is a matrix of 3 samples, where each sample looks like, for example,
\vec{x}^{(1)} = \begin{bmatrix} x_1^{(1)} & x_2^{(1)} \end{bmatrix}, which is a row vector of 2 features.
When you dot 2 vectors, it is not matrix multiplication, so it is valid for us to write \vec{w} \cdot \vec{x}^{(1)} = w_1x_1^{(1)} + w_2x_2^{(1)}.
For multiplication between a matrix and a vector, it is matrix multiplication: the vector is treated as a matrix, and the shapes need to match as you said, so we need X \vec{w}^T. In this case, we get
X \vec{w}^T = \begin{bmatrix} \vec{x}^{(1)} \\ \vec{x}^{(2)} \\ \vec{x}^{(3)} \end{bmatrix} \vec{w}^T = \begin{bmatrix} \vec{x}^{(1)} \vec{w}^T \\ \vec{x}^{(2)} \vec{w}^T\\ \vec{x}^{(3)} \vec{w}^T\end{bmatrix} = \begin{bmatrix} w_1x_1^{(1)} + w_2x_2^{(1)} \\ w_1x_1^{(2)} + w_2x_2^{(2)} \\ w_1x_1^{(3)} + w_2x_2^{(3)} \end{bmatrix}
Here, for example, \vec{x}^{(1)} and \vec{w}^T are, respectively, identified as a row matrix and the transpose of another row matrix, so \vec{x}^{(1)} \vec{w}^T is a matrix multiplication of two matrices.
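To tie this back to NumPy, here is a small sketch with made-up numbers; note that if w is stored as a 1-D array, no explicit transpose is needed:

```python
import numpy as np

# Made-up numbers: 3 samples, 2 features each
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])       # shape (3, 2)
w = np.array([0.1, 0.2])         # weights stored as a 1-D array, shape (2,)

# Matrix-vector product: one prediction per sample
y = np.matmul(X, w)              # same as X @ w, result shape (3,)
print(y)                         # [0.5 1.1 1.7]
```

If w were instead stored as a 2-D row matrix of shape (1, 2), you would indeed need the explicit transpose, e.g. X @ w.T.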
If you want to ask a further question about a specific part of a lecture video, please include the name of the video and the timestamp.
Cheers!
Raymond
Thanks for replying, Raymond.
Actually, that is what my doubt was: I do understand that, in mathematical terms, w · x would give us the dot product of the two row vectors, but in Python np.dot(w,x) is used for matrix multiplication, isn’t it? I read in one of the quizzes that for element-wise multiplication of two matrices (or vectors, per se) we use the ‘*’ operator.
Please help me with this.
I am referring to the vectorization video of the second week in course 1 of the specialization.
Thanks
Hello Gopesh,
Let me quote the doc for np.dot; it distinguishes a dot product (inner product) from a matrix multiplication by examining the shapes of the input arrays:
- If both a and b are 1-D arrays, it is inner product of vectors (without complex conjugation).
- If both a and b are 2-D arrays, it is matrix multiplication, but using matmul or a @ b is preferred.
- If either a or b is 0-D (scalar), it is equivalent to multiply and using numpy.multiply(a, b) or a * b is preferred.
- If a is an N-D array and b is a 1-D array, it is a sum product over the last axis of a and b.
- If a is an N-D array and b is an M-D array (where M >= 2), …
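Here is a quick sketch with toy arrays to illustrate the first two rules:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])            # 1-D array, shape (3,)
b = np.array([4.0, 5.0, 6.0])            # 1-D array, shape (3,)
print(np.dot(a, b))                      # inner product of vectors -> 32.0

A = np.array([[1.0, 2.0], [3.0, 4.0]])   # 2-D array, shape (2, 2)
B = np.array([[5.0, 6.0], [7.0, 8.0]])   # 2-D array, shape (2, 2)
print(np.dot(A, B))                      # matrix multiplication
print(A @ B)                             # same result, the preferred spelling
```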
Yes, and according to the doc for numpy.multiply, it says
Multiply arguments element-wise.
and
Equivalent to x1 * x2 in terms of array broadcasting.
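As a tiny sketch with toy numbers, you can see that np.dot on two vectors is just the sum of the element-wise products:

```python
import numpy as np

w = np.array([1.0, 2.0, 3.0])
x = np.array([4.0, 5.0, 6.0])

print(w * x)          # element-wise product -> [ 4. 10. 18.]
print(np.dot(w, x))   # inner product, i.e. the sum of those products -> 32.0
```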
Let me know if you still have doubts.
Raymond
Oh! Thanks for your explanation.
Cheers!
Gopesh, you’re welcome!