Programming assignment 1 completed - but still a problem w/ np.dot() in propagate

So I was able to complete this assignment successfully, which I found quite interesting.

However, I'm still having a problem with the propagate(w, b, X, Y) function.

I've been able to get my cost function working correctly if I just use the cost formula, J, with plain multiplication ('*').

But the instructions specify that we should perform the multiplication with np.dot() for the sake of vectorization, yet it is not working for me when I do this.

The error I am getting is:

ValueError: shapes (1,3) and (1,3) not aligned: 3 (dim 1) != 1 (dim 0)
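
For reference, here's a minimal snippet (with made-up values) that reproduces the error:

```python
import numpy as np

# Both operands are (1, 3) row vectors -- hypothetical values
A = np.array([[0.8, 0.9, 0.4]])   # activations
Y = np.array([[1, 1, 0]])         # labels

np.dot(Y, np.log(A))  # ValueError: shapes (1,3) and (1,3) not aligned
```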

Perhaps I have to do some sort of .reshape()? Though I'm really not sure why: the error itself says the shapes of the two terms are the same. Nor is there any transpose in the cost formula.

I’m not sure what I’m missing.

The cost formula is just a math formula. Now you need to do two steps: translate that into vector operations and then express those vector operations in numpy code. There are basically two approaches:

  1. Elementwise multiply, followed by summing the products.
  2. You can do it in one step with a dot product.

But if you choose option 2, then you need to understand how dot products work: the "inner" dimensions need to agree. So if both your vectors are 1 x 3, then one of them must be transposed. But even then, there are two ways to do it, and only one of them does what you want in this case. Here's a thread which shows a concrete example of what I mean by that.
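
To make that concrete, here is a minimal sketch of both approaches. The variable names and values are mine, not the assignment's; I'm assuming the usual cross-entropy cost, with the activations in A and the labels in Y, both of shape (1, m):

```python
import numpy as np

A = np.array([[0.8, 0.9, 0.4]])  # activations, shape (1, 3)
Y = np.array([[1, 1, 0]])        # labels, shape (1, 3)
m = Y.shape[1]

# Approach 1: elementwise multiply, then sum the products
cost1 = -(1 / m) * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A))

# Approach 2: dot products; the second operand of each is transposed
# to (3, 1) so the inner dimensions agree: (1,3) x (3,1) -> (1,1)
cost2 = -(1 / m) * (np.dot(Y, np.log(A).T) + np.dot(1 - Y, np.log(1 - A).T))

print(cost1)              # a scalar
print(np.squeeze(cost2))  # squeeze the (1, 1) result down to a scalar
```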

Dear Paulin,

Thanks for explaining; I now have both methods working.

I originally had method 1 working (without for loops, as described; does that still count as vectorization?).

With your advice I now have the dot product method working as well.

Question though:

I'm still trying to work my way through the linear algebra material from sources outside the course. However, those math-oriented texts seem to say that the result of a dot product is always a scalar. Yet, as you mention, with NumPy, if you transpose the wrong vector you'll end up with another matrix instead. So:

  1. Why does NumPy allow the second result (i.e., the end product is another matrix) rather than just throwing an error?

  2. How do you know which is the 'right' vector to transpose (without just guessing and then checking the result)? At first glance, with np.dot(A, B) it seems you would always transpose B. However, when the dot product is used to apply the linear formula wX + b, we instead transpose the first term (i.e. np.dot(w.T, X)). So how do you know which way is 'right'?

Best,
-Anthony

Hi @Nevermnd,

  1. Here is the reference page for np.dot(); it explains the input arguments and the output.
  2. The transpose of a matrix is needed in order to satisfy the rule for matrix multiplication: the number of columns in the first matrix must equal the number of rows in the second matrix.
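
As a small illustration of that rule, here is a sketch with shapes matching the (1, 3) example above (the values are placeholders):

```python
import numpy as np

A = np.ones((1, 3))
B = np.ones((1, 3))

# np.dot(A, B) fails: the inner dimensions are 3 and 1
print(np.dot(A, B.T).shape)  # (1, 1) -- (1,3) x (3,1): the result you want here
print(np.dot(A.T, B).shape)  # (3, 3) -- (3,1) x (1,3): valid, but not a scalar

# The same rule explains np.dot(w.T, X) in the linear step:
w = np.ones((3, 1))          # weights, shape (n, 1), with n = 3
X = np.ones((3, 5))          # data, shape (n, m), with m = 5
print(np.dot(w.T, X).shape)  # (1, 5) -- (1,3) x (3,5)
```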

Well, this is just a terminology question. In math, when they say "dot product", they only mean an operation between two vectors. And in math the vectors don't have an "orientation": they are not row vectors or column vectors, just "vectors" with a given number of elements. If they have the same number of elements, then v · w will always be a scalar.

Then NumPy gets a little sloppy: np.dot really implements full matrix multiplication, not just dot products, but they call it "dot". The key mathematical point is that the "atomic" operation of a matrix multiply is a dot product between one row of the first operand and one column of the second operand. So a dot product is what is going on there, but if the operands are matrices instead of vectors, then there are lots of individual dot products being computed, and they form the scalar elements of the output matrix or vector.
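
Here's a quick sketch of that distinction: with 1-D arrays, np.dot behaves like the mathematical dot product and returns a scalar; with 2-D arrays, it is a matrix multiply and returns a matrix:

```python
import numpy as np

v = np.array([1.0, 2.0, 3.0])   # 1-D array: a "math" vector, shape (3,)
w = np.array([4.0, 5.0, 6.0])

print(np.dot(v, w))             # 32.0 -- a true scalar, as in the math definition

row = v.reshape(1, 3)           # 2-D row vector
col = w.reshape(3, 1)           # 2-D column vector
print(np.dot(row, col))         # [[32.]] -- a 1x1 matrix, not a scalar
print(np.dot(col, row).shape)   # (3, 3) -- nine one-element dot products
```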

You have to understand what the math formula you are trying to implement means, and you have to understand what operations like transpose and dot product do. Here's a thread that addresses a slightly different question (how do I know whether to use elementwise multiply or dot product multiply) that is worth a look just for the conceptual point I'm making here.
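
To underline that conceptual point, here is a tiny comparison of the two kinds of multiply, with made-up values:

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0]])
B = np.array([[4.0, 5.0, 6.0]])

print(A * B)           # elementwise: [[ 4. 10. 18.]], same shape as the inputs
print(np.sum(A * B))   # 32.0 -- summing the products gives the dot product value
print(np.dot(A, B.T))  # [[32.]] -- the same number, computed as a matrix product
```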