Week 2: How to decide between "*" and "np.dot()" when implementing a formula

This really confuses me. Every time I implement a formula in the Week 2 assignment, I have to try both ways and see whether an error message comes up in order to validate my guess, but that approach doesn't help me understand the actual difference between the two operations.

Thanks for the help in advance!


I’ll try to answer that at a couple of different levels, since I’m not sure which level you’re asking about.

High level first: the point is that what we are doing here is converting mathematical formulas into Linear Algebra operations and then converting those into python/numpy code. So the first step is that you have to understand what the math is telling you to do. Elementwise multiply (* or np.multiply) and dot product style matrix multiplication (np.dot) are completely different operations. I hope you are not asking what the difference is there: you need to be familiar with basic Linear Algebra as a prerequisite here. If you don’t know how dot product matrix multiply works, you should put this course on hold and have a look at some of the excellent Linear Algebra courses out there on the web. E.g. the Khan Academy one is a great place to start.
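To make the distinction concrete, here is a minimal sketch with two small hand-picked matrices (not anything from the assignment) showing that the two operations give completely different results on the same inputs:

A = np.array([[1., 2.],
              [3., 4.]])
B = np.array([[10., 20.],
              [30., 40.]])

# Elementwise multiply: each entry of A times the corresponding entry of B,
# so the shapes must match (or be broadcastable).
print(A * B)
# [[ 10.  40.]
#  [ 90. 160.]]

# Dot-product-style matrix multiply: row i of A dotted with column j of B.
print(np.dot(A, B))
# [[ 70. 100.]
#  [150. 220.]]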

Medium level: One really helpful thing is to recognize the notational conventions that Prof Ng uses. If he means "elementwise" multiply, he will always explicitly use "*" as the operator in the mathematical expression. When he means "dot product" style, he just writes the matrices or vectors adjacent to each other with no explicit operator. So in this expression:

Z = w^T X + b

The operation between w^T and X is "real" matrix multiply (dot product style). If it were up to me, I would use the equivalent of the LaTeX "\cdot" operator, which makes the intent a bit clearer (IMHO):

Z = w^T \cdot X + b

But Prof Ng didn’t ask my opinion and he’s the boss, so we just have to understand his notation. :nerd_face:
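As a sketch of how that formula maps to code, assuming the Week 2 shape conventions (w of shape (n_x, 1), X of shape (n_x, m), b a scalar), the adjacency of w^T and X becomes np.dot, not *:

import numpy as np

np.random.seed(1)
n_x, m = 4, 3                  # example dimensions, just for illustration
w = np.random.randn(n_x, 1)    # weights, shape (n_x, 1)
X = np.random.randn(n_x, m)    # m examples stacked as columns
b = 1.5                        # scalar bias

# The adjacency w^T X in the math is a dot product, not an elementwise multiply:
Z = np.dot(w.T, X) + b         # (1, n_x) dot (n_x, m) -> (1, m)
print(Z.shape)                 # (1, 3)

# An elementwise attempt fails here, because (1, n_x) and (n_x, m)
# are not broadcastable with these dimensions:
# w.T * X   ->   ValueError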

Low level: The other clue is just to look at the dimensions of the objects. For "dot product" multiply, the "inner" dimensions need to agree: dotting m x k with k x n gives you a result that is m x n. For elementwise multiplication (or any other elementwise operation like +, - or /), the two objects need to have exactly the same shape or be "broadcastable" to the same shape. What numpy means by "broadcasting" is that if one operand is a vector whose length matches the row or column dimension of the other operand, numpy (conceptually) duplicates that vector along the missing dimension so that both operands end up the same shape. Here's an example:

import numpy as np

np.random.seed(42)
A = np.random.rand(3,4)       # 3 x 4 matrix of random values
print("A = " + str(A))
b = np.ones((3,1)) * 10       # 3 x 1 column vector of 10s
print("b.shape = " + str(b.shape))
print("b = " + str(b))
C = A * b                     # b is broadcast to 3 x 4 before the elementwise multiply
print("C = " + str(C))

Running that gives this:

A = [[0.37454012 0.95071431 0.73199394 0.59865848]
 [0.15601864 0.15599452 0.05808361 0.86617615]
 [0.60111501 0.70807258 0.02058449 0.96990985]]
b.shape = (3, 1)
b = [[10.]
 [10.]
 [10.]]
C = [[3.74540119 9.50714306 7.31993942 5.98658484]
 [1.5601864  1.5599452  0.58083612 8.66176146]
 [6.01115012 7.08072578 0.20584494 9.69909852]]

So that is an example of “broadcasting” in action: numpy expanded b to be a 3 x 4 matrix before the elementwise multiply operation.
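If it helps, here is a small sketch of that dimension check in action, using throwaway random matrices; the exact wording of the error messages may vary between numpy versions:

import numpy as np

M = np.random.rand(3, 4)    # 3 x 4
N = np.random.rand(4, 2)    # 4 x 2

# Inner dimensions agree (4 and 4), so the dot product works: (3,4) dot (4,2) -> (3,2)
print(np.dot(M, N).shape)   # (3, 2)

# Elementwise multiply fails: (3,4) and (4,2) are neither equal nor broadcastable
try:
    M * N
except ValueError as e:
    print("elementwise failed:", e)

# And the reverse: the dot product fails when the inner dimensions disagree
try:
    np.dot(M, M)            # (3,4) dot (3,4): inner dims 4 and 3 don't match
except ValueError as e:
    print("dot failed:", e)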


Thank you so much. Your explanation totally solved my confusion!
