Cost function in Week 2 Exercise 5

My issue concerns the propagate function, more specifically computing the cost. To vectorize it, I initially used:
cost = -1 * np.sum(np.dot(Y, np.log(A)) + np.dot(1 - Y, np.log(1 - A))) / m
But this gave me the following error:
ValueError: shapes (1,3) and (1,3) not aligned: 3 (dim 1) != 1 (dim 0)
After looking around the forum, I found that the log terms had to be transposed, that is:
cost = -1 * np.sum(np.dot(Y, np.log(A).T) + np.dot(1 - Y, np.log(1 - A).T)) / m
I looked at more posts, but I still do not understand why the log terms need to be transposed. I am aware of the matrix multiplication rule that the number of columns of the first matrix must equal the number of rows of the second matrix. Is that the reason for transposing, or am I missing something else? Thank you. For reference, here is a minimal standalone sketch of both attempts (shapes and values made up for illustration):
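
import numpy as np

# Toy data: 3 examples; labels and activations are (1, 3) row vectors
Y = np.array([[1, 0, 1]])           # true labels, shape (1, 3)
A = np.array([[0.9, 0.2, 0.7]])     # predicted probabilities, shape (1, 3)
m = Y.shape[1]

# This raises ValueError: shapes (1,3) and (1,3) not aligned
# cost = -1 * np.sum(np.dot(Y, np.log(A)) + np.dot(1 - Y, np.log(1 - A))) / m

# Transposing the second operand gives (1,3) dot (3,1) -> a (1,1) result
cost = -1 * np.sum(np.dot(Y, np.log(A).T) + np.dot(1 - Y, np.log(1 - A).T)) / m
print(cost)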


Exactly, you need to transpose it; otherwise, you cannot multiply those terms.


Does this apply to all matrix operations? That is, whenever a dot product is not possible, does transposing one of the matrices make it possible?

Here’s how dot is defined in the NumPy docs:

numpy.dot(a, b, out=None)

Dot product of two arrays. Specifically,

  • If both a and b are 1-D arrays, it is inner product of vectors (without complex conjugation).
  • If both a and b are 2-D arrays, it is matrix multiplication, but using matmul or a @ b is preferred.
  • If either a or b is 0-D (scalar), it is equivalent to multiply and using numpy.multiply(a, b) or a * b is preferred.
  • If a is an N-D array and b is a 1-D array, it is a sum product over the last axis of a and b.
  • If a is an N-D array and b is an M-D array (where M >= 2), it is a sum product over the last axis of a and the second-to-last axis of b.

Since we’re not using “rank 1 arrays,” per Andrew Ng’s recommendation, the dot in our case is a matrix-matrix multiplication, so the inner dimensions of the two matrices need to match: the number of columns of the first must equal the number of rows of the second. Since the first matrix is a row vector, the second one needs to be a column vector, which is why you transpose it. Note that if the first one were a column vector and the second one a row vector, the dot function would give us their outer product.
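
To make this concrete, here is a small sketch (shapes and values are arbitrary) contrasting the two orderings:

import numpy as np

row = np.array([[1.0, 2.0, 3.0]])   # row vector, shape (1, 3)
col = row.T                         # column vector, shape (3, 1)

inner = np.dot(row, col)   # (1,3) dot (3,1) -> (1,1), a single sum product
outer = np.dot(col, row)   # (3,1) dot (1,3) -> (3,3), the outer product

print(inner)          # [[14.]]
print(outer.shape)    # (3, 3)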


Thank you, this was very helpful.

Taking the transpose of np.log(A) is not necessary, as Y and A are already the same shape (1 x 3). Did taking the transpose somehow work?

The transpose is not necessary if you do an elementwise multiply, but it does not work to do a dot product between two 1 x 3 vectors, right? Try it and watch what happens.
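
As a quick sketch (shapes and values invented for illustration), here are the two versions side by side; they produce the same cost:

import numpy as np

Y = np.array([[1, 0, 1]])           # labels, shape (1, 3)
A = np.array([[0.9, 0.2, 0.7]])     # activations, shape (1, 3)
m = Y.shape[1]

# Elementwise multiply + sum: shapes already match, no transpose needed
cost_elementwise = -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m

# Dot product: transpose the second operand so (1,3) dot (3,1) is defined
cost_dot = -np.sum(np.dot(Y, np.log(A).T) + np.dot(1 - Y, np.log(1 - A).T)) / m

print(np.isclose(cost_elementwise, cost_dot))   # True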
