Shouldn't we transpose w when taking the dot product with a_in, to respect matrix multiplication rules?

Hi, the code for the my_dense function is:

    import numpy as np

    def my_dense(a_in, W, b):
        units = W.shape[1]              # number of units = number of columns of W
        a_out = np.zeros(units)
        for j in range(units):
            w = W[:, j]                 # weights for unit j
            z = np.dot(w, a_in) + b[j]
            a_out[j] = g(z)             # g is the activation (sigmoid) defined earlier in the lab
        return a_out

Matrix multiplication requires the number of columns in the first matrix to equal the number of rows in the second. Shouldn't the w in np.dot(w, a_in) be transposed for this product to be valid? Or is this handled in the code somehow?

Please help!

Hi @sishita ,
Regarding this, which course is this from?
In any case, transposing may not be required if the matrix is arranged in a way that it fits the rules.

@lukmanaj yes, that is what I was thinking. It depends on how you stack the weights (horizontally or vertically?), and not every teacher or book does it the same way.

In the courses here we don't have to, because the shapes already line up.


This is from optional lab #2 of week 1 in course #2, “Advanced Learning Algorithms”. The shape of W is (m, n) and a_in is a NumPy array of size m. Extracting column j from W yields a column vector with m rows, which, per my understanding, should be transposed before taking the dot product with a_in of size m. Any explanation would be much appreciated!

@lukmanaj I don’t have access to this lab-- maybe you do?

In that case, I have adjusted the details using the pencil icon to reflect the course and the week number.

Also, to explain the code properly: it is arranged so that you only ever multiply vectors.

  • W is a weight matrix with shape (input_units, output_units)
  • W[:, j] selects the j-th column of W, which is a vector with shape (input_units,)
  • a_in is an input vector with shape (input_units,)
So when we do np.dot(w, a_in), we actually compute the sum of the element-wise multiplications of the two vectors. The result is a scalar (a single value), which is the intended behavior for each unit in the output. This value is then assigned with a_out[j] = g(z). In the end, we compute this for every unit and get the complete a_out, which was initialized as an array of zeros.

In this case we do not need to transpose, because we never perform a matrix multiplication, only vector dot products.

The code may not be optimized, but it does the job. A quick numeric sketch is below.

Hope this clears up your concern.
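
To make the shapes concrete, here is a minimal sketch of one loop iteration. The numbers, the (3, 2) weight shape, and the sigmoid standing in for g are illustrative assumptions, not the lab's actual values:

```python
import numpy as np

def g(z):
    # Illustrative activation: a sigmoid, standing in for the g defined in the lab
    return 1 / (1 + np.exp(-z))

# Toy example: 3 inputs, 2 units (arbitrary values)
W = np.array([[1.0, -2.0],
              [0.5,  3.0],
              [2.0,  0.1]])        # shape (3, 2) = (input_units, output_units)
b = np.array([0.1, -0.3])          # shape (2,)
a_in = np.array([0.2, 0.4, 0.6])   # shape (3,)

w = W[:, 0]                        # 1-D array of shape (3,), not a (3, 1) column matrix
z = np.dot(w, a_in) + b[0]         # dot of two 1-D arrays -> a single scalar
z_check = np.sum(w * a_in) + b[0]  # same value, written as element-wise product then sum

print(w.shape, z, z_check)         # (3,) and two identical scalars
print(g(z))                        # activation of unit 0
```

Because both w and a_in are 1-D, there is no row/column orientation to worry about, which is why no transpose is needed.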

Nope, I don't have access either. But I saved my notes from when I took the course.

Thank you so much! I didn't realize that, for vectors, np.dot is just the sum of element-wise multiplications. I was mistaking it for matrix multiplication. Really appreciate it!

You’re welcome! I’m glad the explanation helped clarify things.

To add a bit more context:

  • np.dot can indeed perform different operations depending on the dimensions of its inputs (see the short sketch after this list):
    • For 1D arrays (vectors): It computes the dot product (i.e., the sum of element-wise multiplications), resulting in a scalar.
    • For 2D arrays (matrices): It performs standard matrix multiplication.
    • For higher-dimensional arrays: It generalizes to tensor dot products.
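
Here is the short sketch, using arbitrary toy arrays (nothing from the lab), showing the first two cases side by side:

```python
import numpy as np

# 1-D x 1-D: np.dot returns a scalar (sum of element-wise products)
v1 = np.array([1.0, 2.0, 3.0])
v2 = np.array([4.0, 5.0, 6.0])
print(np.dot(v1, v2))         # 1*4 + 2*5 + 3*6 = 32.0

# 2-D x 2-D: np.dot performs standard matrix multiplication
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])     # shape (2, 2)
B = np.array([[5.0, 6.0],
              [7.0, 8.0]])     # shape (2, 2)
print(np.dot(A, B))            # shape (2, 2): [[19. 22.] [43. 50.]]
```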

So, you were partially correct in thinking it could be matrix multiplication; it just depends on the dimensions of the input arrays! Thanks for the nuanced question, and feel free to ask if you have more questions. :blush:

Ah I see! For the 2nd & 3rd bullets where it is doing matrix multiplication, is there a difference between using np.dot vs np.matmul?

For the 2nd and 3rd bullets, we may need to transpose if the inner dimensions don't match (the number of columns of the first matrix must equal the number of rows of the second). If they already match, there is no need to transpose.
np.matmul is specifically for matrix multiplication, while np.dot is more general purpose and behaves according to the dimensionality of the arrays passed to it, as explained above. For ordinary 2-D matrix multiplication there is not much difference between them that I know of.

Makes sense, thank you for the explanation! :slight_smile:
