In one of the lectures (Vectorizing Logistic Regression’s Gradient Output), we compute Z = w^T * X + b. Andrew then says that we vectorise this in code using np.dot(w.T, X) + b.

My question is, why do we take the dot product of w^T and X rather than performing element-wise multiplication, i.e. (w.T * X) + b?

Hi Jenitta, thank you for providing that resource. This helped me understand that matrix multiplication and the dot product are two different operations, and when one would apply a given operation according to Prof Ng’s notation. However, I’m still confused as to why we use the dot product instead of element-wise matrix multiplication. Are you able to provide any clarification on this?

Efficiency:
The dot product automatically computes the sum of the products of the elements.
If you use an element-wise product, then you have to also compute the sum separately.

Thank you. But why do we need to compute the sum in this equation specifically? There is no sum in the equation Z = w^T * X. My understanding is we only want to multiply w^T_i and x_i, for each value of i, in an efficient manner. Why not just do element-wise multiplication?

The point is that elementwise multiplication is not the same operation. Dot product involves a multiplication followed by a sum as one unified operation. In the particular instance of w^T \cdot X, you could get the same effect by doing w * X (which broadcasts w across the columns of X) followed by summing down each column of the result. But that only works because w is a column vector in the case of logistic regression. In the next week we will graduate to real Neural Networks and there the weights are matrices, not vectors, so that approach no longer works. Fundamentally this all goes back to the mathematical expressions that we are trying to compute: you have to understand what those mean first and then translate those into vector operations and then translate those into numpy/python code. But the math always comes first and determines what we need to do.
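To make that concrete, here is a small sketch comparing the two approaches. The sizes n_x = 3, m = 5, n_h = 4 and the random values are just assumptions for illustration, not anything from the course assignments.

```python
import numpy as np

rng = np.random.default_rng(0)
n_x, m = 3, 5                                   # assumed sizes, for illustration only
w = rng.standard_normal((n_x, 1))               # column vector of weights
X = rng.standard_normal((n_x, m))               # one column per training example

Z_dot = np.dot(w.T, X)                          # shape (1, m)
# w broadcasts across the m columns of X; then sum down each column
Z_elem = np.sum(w * X, axis=0, keepdims=True)   # shape (1, m)
same = np.allclose(Z_dot, Z_elem)
print("same result for a column vector w:", same)

# With a weight matrix (hypothetical layer with n_h = 4 units) the shortcut breaks
n_h = 4
W = rng.standard_normal((n_h, n_x))
Z_layer = np.dot(W, X)                          # shape (n_h, m), well defined
elementwise_failed = False
try:
    W * X                                       # shapes (4, 3) and (3, 5) do not broadcast
except ValueError:
    elementwise_failed = True
print("elementwise product fails for a matrix W:", elementwise_failed)
```

The elementwise route only lines up because w has a single column that numpy can broadcast against X. Once W has several rows there is no elementwise product to take.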

@paulinpaloalto Thank you, this helps me understand better. I was confused why we weren’t using w*X since this also seemed to work, but understand now that we could also apply this operation in this specific case.

@TMosh Could you please provide more explanation of your point here? It’s not immediately obvious to me how w and x both being vectors would result in there being a sum in the equation Z = w^T * X.

The equation you’re quoting is actually an implementation of how to compute the f_wb value.
If w and x are vectors, it means your data set has more than one feature.

See the Week 2 lecture on multiple features, specifically around time mark 8:05:

It sounds like you are saying that you don’t understand what a “dot product” is. Basic knowledge of linear algebra is a pre-requisite for this course. If you don’t understand how normal matrix multiply works (dot product style), you should go take one of the good online Linear Algebra courses to learn that. Here’s a thread that discusses this and gives some links.

Let’s forget python for a moment and just talk about the underlying math. If I have two vectors v and w with n elements, then this is what the dot product of v and w means:

v \cdot w = \displaystyle \sum_{i = 1}^n v_i * w_i

So you can see that it involves first the “elementwise product” of each of the corresponding elements of the two vectors, followed by the sum of those products. That’s what Tom and I mean about dot product involving a sum. It’s both: a product and a sum. That’s why it is more efficient, especially if you have a CPU with vector instructions (as essentially any modern CPU does).
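If it helps to see that formula in numpy, here is a tiny sketch with two made-up 3-element vectors (the values are arbitrary):

```python
import numpy as np

v = np.array([1.0, 2.0, 3.0])
w = np.array([4.0, 5.0, 6.0])

products = v * w           # elementwise step: [4., 10., 18.]
manual = products.sum()    # the sum step: 4 + 10 + 18 = 32
builtin = np.dot(v, w)     # both steps in one call

print(products, manual, builtin)
```

The elementwise product on its own stops at the three-element array. The dot product is that array summed to a single number.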

Of course that is the simplest case of two vectors. Then when you do full matrix multiply, each output position in the resulting matrix is the dot product of one row of the first operand with one column of the second operand. So if A is n x k and B is k x m, then the product A \cdot B has dimensions n x m.
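Here is a rough sketch of that row-times-column rule, building the product one entry at a time and comparing it with np.dot. The sizes n = 2, k = 3, m = 4 and the matrix entries are assumptions picked only to keep the example small.

```python
import numpy as np

n, k, m = 2, 3, 4                                # assumed sizes, for illustration only
A = np.arange(n * k, dtype=float).reshape(n, k)  # n x k
B = np.arange(k * m, dtype=float).reshape(k, m)  # k x m

C = np.dot(A, B)                                 # n x m

# Each output entry is the dot product of one row of A with one column of B
C_manual = np.empty((n, m))
for i in range(n):
    for j in range(m):
        C_manual[i, j] = np.dot(A[i, :], B[:, j])

print(C.shape, np.allclose(C, C_manual))
```

The explicit loops are only there to show the definition. In real code you would call np.dot (or use the @ operator) and let numpy do the work.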

In the particular case of the linear activation calculation for logistic regression:

Z = w^T \cdot X + b

The dot product there is actually a matrix multiply with the first operand being a 1 x n_x vector (because of the transpose) and X having dimensions n_x x m. So the result has dimensions 1 x m.
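A quick shape check of that statement, as a sketch with assumed sizes n_x = 3 and m = 5, random data, and an arbitrary scalar bias b = 0.5:

```python
import numpy as np

rng = np.random.default_rng(1)
n_x, m = 3, 5                       # assumed sizes, for illustration only
w = rng.standard_normal((n_x, 1))   # n_x x 1 column vector
X = rng.standard_normal((n_x, m))   # n_x x m, one column per example
b = 0.5                             # scalar bias, broadcast to every column

Z = np.dot(w.T, X) + b              # (1 x n_x) times (n_x x m) gives 1 x m

print(w.T.shape, X.shape, Z.shape)

# Entry j of Z is the dot product of w with example j, plus b
j = 2
z_j = np.dot(w[:, 0], X[:, j]) + b
print(np.isclose(Z[0, j], z_j))
```

So each of the m entries of Z is exactly the sum over features that the single-example formula describes, computed for all examples in one call.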