Week 2 How to decide to use "*" or "np.dot( )" when calculating formula

I’ll try to answer that at a couple of different levels, since I’m not sure which level you’re asking about.

High level first: the point is that what we are doing here is converting mathematical formulas into Linear Algebra operations and then converting those into python/numpy code. So the first step is that you have to understand what the math is telling you to do. Elementwise multiply (* or np.multiply) and dot product style matrix multiplication (np.dot) are completely different operations. I hope you are not asking what the difference is there: you need to be familiar with basic Linear Algebra as a prerequisite here. If you don’t know how dot product matrix multiply works, you should put this course on hold and have a look at some of the excellent Linear Algebra courses out there on the web. E.g. the Khan Academy one is a great place to start.
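In case a concrete picture helps, here is a minimal sketch of how the two operations give different answers on the same inputs. The matrices A and B are made-up values purely for illustration:

```python
import numpy as np

# Two small made-up matrices, chosen only so the arithmetic is easy to check by hand
A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

elementwise = A * B       # same as np.multiply(A, B): multiply matching entries
matmul = np.dot(A, B)     # rows of A dotted with columns of B

print("A * B =\n" + str(elementwise))
print("np.dot(A, B) =\n" + str(matmul))
```

The elementwise result is [[5, 12], [21, 32]], while the dot product result is [[19, 22], [43, 50]], so swapping one for the other silently gives you the wrong numbers whenever the shapes happen to allow both.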

Medium level: One really helpful thing is to recognize the notational conventions that Prof Ng uses. If he means “elementwise” multiply, he will always explicitly use “*” as the operator in the mathematical expression. When he means “dot product” style, he just writes the matrices or vectors adjacent to each other with no explicit operator. So in this expression:

Z = w^TX + b

The operation between w^T and X is “real” matrix multiply (dot product style). If it were up to me, I would use the equivalent of the LaTeX “cdot” operator like this, which makes it a bit clearer (IMHO):

Z = w^T \cdot X + b

But Prof Ng didn’t ask my opinion and he’s the boss, so we just have to understand his notation. 🤓
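Here is a rough sketch of how that formula could translate into numpy. The sizes and values are hypothetical, and I’m assuming w is a column vector of shape (n_x, 1), X has one column per sample with shape (n_x, m), and b is a scalar:

```python
import numpy as np

n_x, m = 3, 5                                  # hypothetical sizes: 3 features, 5 samples
w = np.arange(1.0, n_x + 1).reshape(n_x, 1)    # column vector [[1.], [2.], [3.]], shape (3, 1)
X = np.ones((n_x, m))                          # placeholder data, shape (3, 5)
b = 0.5                                        # scalar bias

# No operator between w^T and X in the formula, so this is a dot product:
# (1, 3) dotted with (3, 5) gives (1, 5). The scalar b is then added to every entry.
Z = np.dot(w.T, X) + b

print("Z.shape = " + str(Z.shape))
print("Z = " + str(Z))
```

With these placeholder values every entry of Z is 1 + 2 + 3 + 0.5 = 6.5, and the result has one entry per sample.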

Low level: The other clue is just to look at the dimensions of the objects. For “dot product” multiply, the “inner” dimensions need to be the same. Dotting m x k with k x n gives you a result that is m x n. For elementwise multiplication (or any other elementwise operation like +, - or /), the two objects need to have exactly the same shape or be “broadcastable” to the same shape. What numpy means by “broadcasting” is that if one of the operands has size 1 along a dimension where the other does not (e.g. a column vector with the same number of rows as a matrix, or a row vector with the same number of columns), it will duplicate that operand along that dimension so that it matches the shape of the other one. Here’s an example:

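import numpy as np

np.random.seed(42)  # fixed seed so the random values are reproducible
A = np.random.rand(3,4)
print("A = " + str(A))
b = np.ones((3,1)) * 10  # column vector of 10s, shape (3, 1)
print("b.shape = " + str(b.shape))
print("b = " + str(b))
C = A * b  # b is broadcast across the 4 columns of A
print("C = " + str(C))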

Running that gives this:

A = [[0.37454012 0.95071431 0.73199394 0.59865848]
 [0.15601864 0.15599452 0.05808361 0.86617615]
 [0.60111501 0.70807258 0.02058449 0.96990985]]
b.shape = (3, 1)
b = [[10.]
 [10.]
 [10.]]
C = [[3.74540119 9.50714306 7.31993942 5.98658484]
 [1.5601864  1.5599452  0.58083612 8.66176146]
 [6.01115012 7.08072578 0.20584494 9.69909852]]

So that is an example of “broadcasting” in action: numpy expanded b to be a 3 x 4 matrix before the elementwise multiply operation.
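If you want to convince yourself of the shape rules, here is a small sketch that checks which combinations work and which ones numpy rejects. The shapes are arbitrary ones I picked for illustration:

```python
import numpy as np

A = np.ones((3, 4))
B = np.ones((4, 2))
row = np.arange(4).reshape(1, 4)   # row vector, shape (1, 4)

# Dot product: inner dimensions 4 and 4 match, so (3, 4) dot (4, 2) gives (3, 2)
print("np.dot(A, B).shape = " + str(np.dot(A, B).shape))

# Elementwise: row is broadcast down the 3 rows of A, so the result is (3, 4)
print("(A * row).shape = " + str((A * row).shape))

# Elementwise on (3, 4) and (4, 2): not the same shape and not broadcastable
try:
    A * B
    elementwise_failed = False
except ValueError as e:
    elementwise_failed = True
    print("A * B failed: " + str(e))

# Dot product on (3, 4) and (1, 4): inner dimensions 4 and 1 do not match
try:
    np.dot(A, row)
    dot_failed = False
except ValueError as e:
    dot_failed = True
    print("np.dot(A, row) failed: " + str(e))
```

Printing the shapes like this is usually the quickest way to figure out which operation a formula needs when you hit a dimension error in the assignments.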