General implementation of forward propagation - shape of W

Hi Machine Learning Specialization community,
It’s not clear to me why W is intentionally stacked as a (2,3) shape and not (3,2). I thought it was due to matrix multiplication, where the number of columns of the first matrix must equal the number of rows of the second matrix. However, w is pulled out from W as a 1-D vector in the dense function, so it seems like the (2,3) shape for W is not necessary(?). I’m not sure if I’m thinking about this correctly. Any thoughts or insights would be appreciated!

Which assignment or lecture are you looking at? Please be specific.

Hi TMosh,

Thank you for the quick response. The specific lecture I am referring to is the “General implementation of forward propagation” lecture (path to lecture: Advanced Learning Algorithms > Week 1 > General implementation of forward propagation). Please let me know if I can clarify further.

What is the time mark within that lecture video?

I’m assuming you mean around time mark 5:40.

There are a few things going on here.

First, in this example, Andrew has assigned the size of W as:
rows: the number of inputs to the layer
columns: the number of outputs from the layer

So, in this example, for W1 there are two input units, and three hidden layer units, so the size is (2 x 3).
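As a sketch of that convention (the weight values here are illustrative), stacking one weight vector per unit as columns produces the (2 x 3) matrix, and each column can be pulled back out as a 1-D vector:

```python
import numpy as np

# Illustrative weight vectors, one per hidden unit (one weight per input feature).
w1_1 = np.array([1.0, 2.0])
w1_2 = np.array([-3.0, 4.0])
w1_3 = np.array([5.0, -6.0])

# Stack them as columns: rows = inputs (2), columns = units (3).
W1 = np.column_stack([w1_1, w1_2, w1_3])
print(W1.shape)      # (2, 3)

# Column j gives back the weight vector of unit j as a 1-D array.
second_unit = W1[:, 1]
print(second_unit)
```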

Note that this convention for the format of a weight matrix is not universal or consistent. You can just as easily reverse the two. When it comes to matrices, transpositions are your friend.
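To illustrate the point about transposes, here is a small NumPy sketch with made-up values. It assumes the input a is a 1-D array and shows that the (inputs, units) layout and its transpose, the (units, inputs) layout, give the same pre-activation values; only the order of the operands changes.

```python
import numpy as np

a = np.array([0.5, -1.0])                 # input with 2 features (made-up values)
W = np.array([[1.0, -3.0, 5.0],
              [2.0,  4.0, -6.0]])         # (inputs, units) convention: shape (2, 3)
W_alt = W.T                               # (units, inputs) convention: shape (3, 2)

z_cols = a @ W        # each unit's weights are a column of W
z_rows = W_alt @ a    # each unit's weights are a row of W_alt
print(np.allclose(z_cols, z_rows))        # True
```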

Now, given that the implementation of “dense()” uses a for-loop over the hidden layer units and computes a dot product of w and a for each unit, you don’t strictly need W to be a matrix. You could instead have three separate ‘w’ vectors.
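A loop-based dense() along these lines might look like the sketch below. It is not the lecture’s exact code: the sigmoid g, the use of np.dot, and the example values are assumptions. Each W[:, j] is simply one of those separate ‘w’ vectors, taken from a column of W.

```python
import numpy as np

def g(z):
    """Sigmoid activation (assumed for this sketch)."""
    return 1.0 / (1.0 + np.exp(-z))

def dense(a_in, W, b):
    """Loop over units; W has shape (n_inputs, n_units), b has shape (n_units,)."""
    units = W.shape[1]                 # number of columns = number of units
    a_out = np.zeros(units)
    for j in range(units):
        w = W[:, j]                    # 1-D weight vector for unit j (column j)
        z = np.dot(w, a_in) + b[j]     # dot product of w and a, plus the bias
        a_out[j] = g(z)
    return a_out

# Illustrative values: 2 inputs, 3 hidden units.
W = np.array([[1.0, -3.0, 5.0],
              [2.0,  4.0, -6.0]])
b = np.array([-1.0, 1.0, 2.0])
a_out = dense(np.array([1.0, 2.0]), W, b)
print(a_out.shape)   # (3,)
```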

However, this gets confusing, and it isn’t very efficient or easy to extend to networks of other sizes.
It’s a more general solution if you have one W matrix for each layer.

Now, given that W is a matrix, you can compute the product of W and a with a single matrix multiplication and avoid the inefficient for-loop entirely.
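Here is a minimal vectorized sketch, assuming a 1-D input a_in, W with shape (n_inputs, n_units), and a sigmoid activation. The names dense_vectorized and sequential, and the random layer sizes, are hypothetical. np.matmul replaces the per-unit loop, and keeping one W per layer lets the same code handle layers of any size; only a loop over layers remains.

```python
import numpy as np

def g(z):
    """Sigmoid activation (an assumption for this sketch)."""
    return 1.0 / (1.0 + np.exp(-z))

def dense_vectorized(a_in, W, b):
    """All units at once: a_in (n_inputs,), W (n_inputs, n_units), b (n_units,)."""
    z = np.matmul(a_in, W) + b   # one matrix product replaces the per-unit loop
    return g(z)

def sequential(x, layers):
    """Apply one (W, b) pair per layer; `layers` is a hypothetical list of tuples."""
    a = x
    for W, b in layers:
        a = dense_vectorized(a, W, b)
    return a

rng = np.random.default_rng(0)
sizes = [2, 3, 1]   # 2 inputs -> 3 hidden units -> 1 output (illustrative sizes)
layers = [(rng.normal(size=(n_in, n_out)), np.zeros(n_out))
          for n_in, n_out in zip(sizes[:-1], sizes[1:])]
print(sequential(np.array([1.0, 2.0]), layers).shape)   # (1,)
```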


Hi TMosh,

Thank you for the clarification. Appreciate the prompt feedback!

Hi Tom,

Further to your explanation above about time mark 5:40, could you please help me understand the following:

  1. Why is W stacked and read by column rather than by row, when the previous slide showed w1_1, w1_2 and w1_3 as row vectors? Can W be stacked and read by row instead (e.g., as shown in pink in my 2nd screenshot), so that the for-loop would use w = W[i, :]?

  2. I understand W is a 2-by-3 matrix. In W.shape[1], what does .shape[1] mean here, and why is W.shape[1] equal to 3?
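A quick NumPy check of both questions, using a hypothetical (2, 3) array: .shape is a (rows, columns) tuple, so .shape[1] is the number of columns, which here is one per unit. Storing the transpose instead lets you read each unit’s weights as a row with W[i, :].

```python
import numpy as np

W = np.arange(6.0).reshape(2, 3)   # hypothetical (2, 3) weight matrix
print(W.shape)      # (2, 3)
print(W.shape[0])   # 2 -> number of rows (inputs)
print(W.shape[1])   # 3 -> number of columns (units)

# Column-wise reading, as in the lecture: unit j's weights are W[:, j].
cols = [W[:, j] for j in range(W.shape[1])]

# Row-wise alternative: store the transpose, shape (3, 2); unit i's weights are W_rows[i, :].
W_rows = W.T
rows = [W_rows[i, :] for i in range(W_rows.shape[0])]

print(all(np.array_equal(c, r) for c, r in zip(cols, rows)))   # True
```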

Many thanks

That matrix is formatted so that the features are in the rows, and the examples are in the columns.

It’s backward from nearly every other assignment in the course.

Thank you for the quick reply as always.

Sorry, I don’t fully understand your response. Would you please address my questions one by one so that I can understand them better?

My question is related to the video “General implementation of forward propagation in NumPy” (I’m not referring to any assignment here).

Thank you

I think I did answer your question, because the assignment has the same inconsistency as the lecture.