How is W shape (3,1) and not (1,3)?

In the second lab of Week 1, the shape of W is given as (3,1). As far as I know, the shape of W depends on the number of parameters. In the second layer, the shape of a is (3,1), so W should have shape (1,3). What do you think?

Here are the sizes:

Why do you feel W2 should be (1,3)?

That is because TensorFlow stores a layer's weights with one row per input to the layer and one column per unit.
The weights for Layer2 are arranged as

$$W_2 = \begin{bmatrix} w_1^{[2]} \\ w_2^{[2]} \\ w_3^{[2]} \end{bmatrix}$$

So this results in W2.shape = (3,1).
For more info, check the "Coffee Roasting with numpy implementation" notebook.
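
To make the shapes concrete, here is a minimal NumPy sketch of a forward pass through both layers. It is not the notebook's code: the 5 training examples, the random weights, and the names `X`, `A1`, `A2` are assumptions for illustration. Only the layer sizes (2 features, 3 units, 1 unit) come from this thread.

```python
import numpy as np

# Assumed for illustration: 5 examples and random weights.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))    # 5 examples, 2 input features

W1 = rng.normal(size=(2, 3))   # layer 1: 2 inputs -> 3 units
b1 = np.zeros(3)
W2 = rng.normal(size=(3, 1))   # layer 2: 3 inputs (layer 1 outputs) -> 1 unit
b2 = np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

A1 = sigmoid(X @ W1 + b1)      # one column per layer 1 unit
A2 = sigmoid(A1 @ W2 + b2)     # one column per layer 2 unit

print(A1.shape, W2.shape, A2.shape)  # (5, 3) (3, 1) (5, 1)
```

Each example leaves layer 1 as 3 values, one per unit, so layer 2 sees 3 input features and needs 3 rows in W2. With W2 of shape (1,3), the product `A1 @ W2` would fail because the inner dimensions would not match.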

Hello TMosh,
Thanks for your answer. As you highlighted, the shape of W depends on the number of features. But the number of features coming from the layer 1 activation is 1 and the number of units is 1, so why isn't it (1,1)? I still don't get it. How come there are three features?

Hello Indra Neel,
Thank you for your answer. The way you put it, it is clearer why W has shape (3,1): I guess it is because the activation from layer 1 has three rows. But since W depends on the number of features, and the activation from layer 1 has only one feature, how come W has three? I hope my question is clear. Thanks in advance.

The shape of W must match the number of features in X.

I understand the shape of W in the first layer, which is (2,3), but I can't see why it is (3,1) for the second layer. Can you please help me?

The shape of W2 is based on the number of outputs and the number of hidden layer units.
This is because it is what connects the hidden layer to the output layer.
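
As a rough sketch of that rule, each weight matrix takes its row count from the layer feeding it and its column count from its own units. The helper name `dense_weight_shape` and the list `layer_sizes` are hypothetical, made up for this example. The sizes 2, 3, 1 are the ones discussed in this thread.

```python
def dense_weight_shape(n_inputs, n_units):
    """Weight shape for a dense layer: (inputs to the layer, units in the layer)."""
    return (n_inputs, n_units)

# Input features, hidden layer units, output units.
layer_sizes = [2, 3, 1]

# Pair each layer's size with the size of the layer that feeds it.
shapes = [
    dense_weight_shape(n_in, n_out)
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
]

print(shapes)  # [(2, 3), (3, 1)]
```

So the 3 in W2 is the number of hidden units feeding the output layer, not the number of features in the original dataset.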

Hello Mr TMosh,

I apologise for my late reply. Based on what is given in the course, the shape of W is based on the number of features (which in this case is 1, please correct me if I am wrong) and the number of units, which is 1, so we're good on that.

So with W being (3,1), I get why there is a 1 (one unit), but I don't get the 3 (are there really 3 features?).

The number of features depends on the dataset. Several datasets are used, and they have different sizes.