While calculating Z = WX + b, my code works if I use np.dot but fails if I simply write
Z = W * X (let’s omit b for simplicity).
The error says: operands could not be broadcast together with shapes (1,3) (3,2).
How do I clear up this confusion between the different kinds of product?
Hi @Mansi_Jain1,
When it comes to matrix multiplication in NumPy, the choice between np.dot and element-wise multiplication depends on the dimensions and the operation you intend to perform.
In your case, you’re dealing with matrices of dimensions (1,3) and (3,2). For matrix multiplication, the inner dimensions (the second dimension of the first matrix and the first dimension of the second matrix) must match, as they do here (3 and 3). Therefore, np.dot is the correct choice to perform matrix multiplication.
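The shape argument can be checked directly. This is a minimal sketch: the values in W and X are made up for illustration, since the thread only gives the shapes (1,3) and (3,2).

```python
import numpy as np

# Illustrative values only; the thread specifies just the shapes.
W = np.array([[1.0, 2.0, 3.0]])   # shape (1, 3)
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])        # shape (3, 2)

# Matrix product: inner dimensions (3 and 3) match, result is (1, 2).
Z = np.dot(W, X)
print(Z.shape)   # (1, 2)
print(Z)         # [[22. 28.]]

# Element-wise product: (1, 3) and (3, 2) cannot be broadcast together.
try:
    W * X
    broadcast_error = None
except ValueError as err:
    broadcast_error = str(err)
print(broadcast_error)
```

The message captured in broadcast_error is the same kind of broadcasting error quoted in the question.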
Element-wise multiplication (using * or np.multiply) is used when the matrices are of the same shape or are broadcastable to the same shape. This type of operation multiplies corresponding elements in the matrices.
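As a small sketch of both cases (same shape, and broadcastable shapes), with arrays A, B and M invented for the example:

```python
import numpy as np

# Illustrative arrays, not taken from the course notebook.
A = np.array([[1, 2, 3]])        # shape (1, 3)
B = np.array([[4, 5, 6]])        # shape (1, 3)
M = np.array([[1, 2, 3],
              [4, 5, 6]])        # shape (2, 3)

# Same shape: corresponding elements are multiplied.
same_shape = A * B
print(same_shape)                # [[ 4 10 18]]

# * and np.multiply are the same operation.
print(np.array_equal(np.multiply(A, B), A * B))   # True

# Broadcastable shapes: the single row of A is reused for every row of M.
broadcast = M * A
print(broadcast.shape)           # (2, 3)
print(broadcast)                 # [[ 1  4  9]
                                 #  [ 4 10 18]]
```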
You mentioned WX, which typically represents matrix multiplication in neural networks and other mathematical applications. This operation is indeed a case for np.dot.
It’s worth noting that you’ll often need to make a choice between these two types of multiplication in scenarios like backpropagation in neural networks, where understanding the dimensions and the specific operation required is crucial.
Remember, np.dot is for matrix multiplication (where dimensions must align appropriately), and element-wise multiplication is for operations involving matrices of the same size or broadcastable sizes.
Here’s another thread which says a lot of what lukmanaj just said in different words. It also makes one additional point about the notational conventions that Prof Ng uses, which is definitely worth a look.
Hi @lukmanaj
So, if I have matrices like (1,3) and (1,3) - then np.multiply will work?
Yes it will work. Same size.
Yes, as lukmanaj says, but you can also use transpose on either operand and then apply the dot product. Here’s a thread which explores that w.r.t. computing the cost function.
Please make sure that you also followed the link in that post that I gave you to understand why it matters which operand you transpose. 1 x 3 dot 3 x 1 is totally different than 3 x 1 dot 1 x 3.
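To see why the choice of operand matters, here is a minimal sketch with two made-up (1,3) row vectors a and b:

```python
import numpy as np

# Illustrative row vectors, both of shape (1, 3).
a = np.array([[1, 2, 3]])
b = np.array([[4, 5, 6]])

# 1 x 3 dot 3 x 1: a single number, held in a (1, 1) array.
inner = np.dot(a, b.T)
print(inner.shape)   # (1, 1)
print(inner)         # [[32]]

# 3 x 1 dot 1 x 3: a full 3 x 3 matrix.
outer = np.dot(a.T, b)
print(outer.shape)   # (3, 3)
print(outer)         # [[ 4  5  6]
                     #  [ 8 10 12]
                     #  [12 15 18]]
```

Note that NumPy returns the first result as a (1, 1) array rather than a plain Python scalar when both operands are 2-D.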
Yes, sir.
So, now I understand: with np.dot I can get either a scalar (1 x 3 dot 3 x 1) or a matrix (3 x 1 dot 1 x 3), depending on which operand I transpose, and it behaves like matrix multiplication. With np.multiply(), however, I get an output of the same size as the inputs, and if I give it shapes (m,n) and (1,n), numpy will broadcast the second to match the first. It is different from matrix multiplication because there is no addition involved.
Am I missing anything in this?
Yes, that sounds right. The point of elementwise is that the two operands need to end up the same shape: either they already match exactly, or one of them can be broadcast to match the other.
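The "no addition involved" observation can be made concrete. In this sketch (values made up), summing the elementwise product of two row vectors gives the same number as the 1 x 3 dot 3 x 1 product:

```python
import numpy as np

# Illustrative row vectors, both of shape (1, 3).
a = np.array([[1.0, 2.0, 3.0]])
b = np.array([[4.0, 5.0, 6.0]])

# Elementwise: three separate products, nothing is added up.
elementwise = a * b
print(elementwise)               # [[ 4. 10. 18.]]

# Adding those products by hand reproduces the dot product.
summed = np.sum(elementwise)
dotted = np.dot(a, b.T)
print(summed)                    # 32.0
print(dotted)                    # [[32.]]
print(summed == dotted[0, 0])    # True
```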
But the critical high level point is that the two types of multiplication are fundamentally different mathematical operations. Everything we are doing here is translating mathematical formulas first into linear algebra operations and then into numpy/python code. So it is essential that you start by understanding the meaning of the mathematical formulas and how the linear algebra operations work.