Dot product vs element-wise multiplication of arrays

Hi,

In one of the lectures (Vectorizing Logistic Regression’s Gradient Output), we compute Z = w^T * X + b. Andrew then says that we vectorise this in code using np.dot(w.T, X) + b.

My question is: why do we take the dot product of w^T and X rather than performing element-wise multiplication (i.e. (w.T * X) + b)?

Hello @Laine_Wishart,
Go through the post below. It answers your query.


Hi Jenitta, thank you for providing that resource. This helped me understand that matrix multiplication and the dot product are two different operations, and when one would apply a given operation according to Prof Ng’s notation. However, I’m still confused as to why we use the dot product instead of element-wise matrix multiplication. Are you able to provide any clarification on this?

Efficiency:
The dot product automatically computes the sum of the products of the elements.
If you use an element-wise product, then you have to also compute the sum separately.
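
For example, here is a tiny numpy sketch (the numbers are made up) showing that the dot product combines the multiply and the sum into one operation:

```python
import numpy as np

# Toy example with made-up values.
w = np.array([0.5, -1.0, 2.0])
x = np.array([1.0, 2.0, 3.0])

# np.dot multiplies corresponding elements and sums them in one operation.
z_dot = np.dot(w, x)            # 0.5*1 + (-1.0)*2 + 2.0*3 = 4.5

# With the element-wise product you need a separate sum afterwards.
z_two_step = np.sum(w * x)      # also 4.5, but two operations

print(z_dot, z_two_step)
```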

Thank you. But why do we need to compute the sum in this equation specifically? There is no sum in the equation Z = w^T * X. My understanding is that we only want to multiply w^T_i and x_i, for each value of i, in an efficient manner. Why not just do element-wise multiplication?

The point is that elementwise multiplication is not the same operation. Dot product involves a multiplication followed by a sum as one unified operation. In the particular instance of w^T \cdot X, you could get the same effect by doing w * X followed by summing the columns of the result. But that only works because w is a column vector in the case of logistic regression. In the next week we will graduate to real Neural Networks and there the weights are matrices, not vectors, so that approach no longer works. Fundamentally this all goes back to the mathematical expressions that we are trying to compute: you have to understand what those mean first and then translate those into vector operations and then translate those into numpy/python code. But the math always comes first and determines what we need to do.
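
To make that concrete, here is a quick numpy sketch of the equivalence described above (shapes and values are just illustrative):

```python
import numpy as np

np.random.seed(0)
n_x, m = 4, 5
w = np.random.randn(n_x, 1)    # logistic regression weights: a column vector
X = np.random.randn(n_x, m)    # m training examples stacked as columns
b = 0.1

# One unified operation: multiply and sum via the dot product.
Z_dot = np.dot(w.T, X) + b                         # shape (1, m)

# Two operations: broadcast element-wise product, then sum down each column.
Z_elem = np.sum(w * X, axis=0, keepdims=True) + b  # also shape (1, m)

print(np.allclose(Z_dot, Z_elem))   # True -- but only because w is a vector;
                                    # with a weight matrix W this trick no
                                    # longer reproduces np.dot(W, X)
```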

Yes, there is, in the general case where w and x are both vectors.

@paulinpaloalto Thank you, this helps me understand better. I was confused why we weren’t using w*X since this also seemed to work, but understand now that we could also apply this operation in this specific case.

@TMosh Could you please provide more explanation of your point here? It’s not immediately obvious to me how w and x both being vectors would result in there being a sum in the equation Z = w^T * X.

The equation you’re quoting is actually an implementation of how to compute the f_wb value.
If w and x are vectors, it means your data set has more than one feature.

See the Week 2 lecture on multiple features, specifically around time mark 8:05.
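
For reference, with multiple features the f_wb computation for a single example looks something like this (a minimal sketch with made-up numbers):

```python
import numpy as np

# One training example with n = 3 features (values are made up).
w = np.array([0.2, -0.5, 1.0])   # one weight per feature
x = np.array([3.0, 2.0, 4.0])    # feature values for this example
b = 1.5

# f_wb(x) = w . x + b : the dot product sums over all the features.
f_wb = np.dot(w, x) + b          # 0.6 - 1.0 + 4.0 + 1.5 = 5.1
print(f_wb)
```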

It sounds like you are saying that you don’t understand what a “dot product” is. Basic knowledge of linear algebra is a pre-requisite for this course. If you don’t understand how normal matrix multiply works (dot product style), you should go take one of the good online Linear Algebra courses to learn that. Here’s a thread that discusses this and gives some links.

Let’s forget python for a moment and just talk about the underlying math. If I have two vectors v and w with n elements, then this is what the dot product of v and w means:

v \cdot w = \displaystyle \sum_{i = 1}^n v_i * w_i

So you can see that it involves first the “elementwise product” of each of the corresponding elements of the two vectors, followed by the sum of those products. That’s what Tom and I mean about dot product involving a sum. It’s both: a product and a sum. That’s why it is more efficient, especially if you have a CPU with vector instructions (as essentially any modern CPU does).

Of course that is the simplest case of two vectors. Then when you do full matrix multiply, each output position in the resulting matrix is the dot product of one row of the first operand with one column of the second operand. So if A is n x k and B is k x m, then the product A \cdot B has dimensions n x m.
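
As a quick shape check (a hedged sketch, dimensions chosen arbitrarily):

```python
import numpy as np

n, k, m = 2, 3, 4
A = np.random.randn(n, k)   # n x k
B = np.random.randn(k, m)   # k x m

C = np.dot(A, B)            # each entry C[i, j] is the dot product of
print(C.shape)              # row i of A with column j of B -> (2, 4), i.e. n x m
```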

In the particular case of the linear activation calculation for logistic regression:

Z = w^T \cdot X + b

The dot product there is actually a matrix multiply with the first operand being a 1 x n_x vector (because of the transpose) and X having dimensions n_x x m. So the result has dimensions 1 x m.
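
In numpy terms, the shapes work out like this (a small sketch assuming n_x features and m examples):

```python
import numpy as np

n_x, m = 3, 5
w = np.random.randn(n_x, 1)   # weights: (n_x, 1)
X = np.random.randn(n_x, m)   # data matrix: (n_x, m)
b = 0.2

Z = np.dot(w.T, X) + b        # (1, n_x) dot (n_x, m) -> (1, m)
print(Z.shape)                # (1, 5): one z value per training example
```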