With multiple examples of inputs, is the below right?

W.shape == (number of nodes in the hidden layer, m, nx)

m - number of examples
nx - input size

In logistic regression, the shape of W was like (m, nx) if I’m right. Then, with multiple nodes in the hidden layer, the shape of W should be (number of nodes in the hidden layer, m, nx).

However, it is hard to understand when I look at the explanation slide.

The shape of W^{[l]} is (number of nodes in the l-th layer, number of nodes in the (l-1)-th layer). The shape of W does not involve m because it should be able to process any number of examples.

x^{(1)} is expressed as a column vector, and its shape is (number of features, 1), where the 1 means one example.

For the following matrix multiplication to work, the shape of W^{[1]} must be (something, number of features).

That “something” is the number of rows in W^{[1]}, and if you look at the following again:

I hope the lecture has mentioned that each row corresponds to one node in the 1-st layer. In other words, that “something” means “number of nodes in the 1-st layer”, so putting this back, the shape of W^{[1]} is (number of nodes in the 1-st layer, number of features in the input).

Generally, the shape of W^{[l]} is (number of nodes in the l-th layer, number of nodes in the (l-1)-th layer).
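As a quick sanity check, here is a minimal NumPy sketch (the layer sizes are made up for illustration) showing that each W^{[l]} has shape (nodes in layer l, nodes in layer l-1), and that the very same weight matrices work for any batch size m:

```python
import numpy as np

# Hypothetical sizes: 3 input features, 4 nodes in layer 1, 2 nodes in layer 2.
rng = np.random.default_rng(0)
W1 = rng.standard_normal((4, 3))  # (nodes in layer 1, number of input features)
W2 = rng.standard_normal((2, 4))  # (nodes in layer 2, nodes in layer 1)

# The shapes of W1 and W2 never mention m, so the same matrices
# process 1, 5, or 100 examples without any change:
for m in (1, 5, 100):
    X = rng.standard_normal((3, m))  # (number of features, m examples)
    A1 = np.tanh(W1 @ X)             # shape (4, m)
    A2 = np.tanh(W2 @ A1)            # shape (2, m)
    print(m, A1.shape, A2.shape)
```

Note how layer 1's output A1 (4 rows) becomes layer 2's input, which is exactly why W2 needs 4 columns.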

The reason why the number of features in the input is represented as the number of nodes in the (l-1)-th layer is that the l-th layer uses the output of the previous layer's nodes as its input.

Therefore, technically, we can say that the number of input features for the l-th layer is the number of nodes in the previous layer, which is the (l-1)-th layer.

Is it right?

By the way, your explanation gave me a clear understanding of the dimensions of W.

I deeply appreciate the kind and detailed explanation.

Then, can we say that the dimensions of W are not related to the number of examples, which is represented as ‘m’?

I remember that ‘m’ doesn’t matter to W because the shape of W was (nx, 1), or (1, nx) for W.T, in logistic regression.

From this, I suppose that W is not related to ‘m’, and the dimensions of W are (1, nx) in logistic regression and (number of nodes in the current layer, number of nodes in the previous layer) with multiple layers.

I said Yes to both sentences even though you were describing W as having different shapes. When the course first introduces logistic regression, we do use (nx, 1), but once we jump to the neural network setting, we switch to (1, nx). The difference is just a choice made in the lectures. Most of DLS is about neural networks, so (1, nx), or (#nodes in current layer, #nodes in previous layer), is the rule to remember.
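To make the m-independence concrete, here is a tiny logistic regression sketch using the (1, nx) convention (the sizes are hypothetical): the weight matrix keeps the same shape no matter how many examples are stacked into X.

```python
import numpy as np

rng = np.random.default_rng(1)
nx = 3                            # hypothetical number of input features
w = rng.standard_normal((1, nx))  # (1, nx) convention from the lectures
b = 0.0

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# w never changes shape as m varies; only X and the output do:
for m in (1, 10):
    X = rng.standard_normal((nx, m))  # (nx, m examples)
    A = sigmoid(w @ X + b)            # shape (1, m)
    print(w.shape, A.shape)
```

With the (nx, 1) convention from the earlier lectures, the same computation would be written as sigmoid(w.T @ X + b); the two conventions are just transposes of each other.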

Cheers,
Raymond

PS: Just a side story. There are always many “background processes” running in my brain. When I was in transit earlier, one of those processes was somehow triggered to wonder whether you had got my explanation, then my brain switched it to the foreground and I started to think about what my next step should be if you still had questions. You know, we can discuss a problem from many perspectives, and one of the next perspectives that came to mind at that time was exactly how you described it - a layer’s output is its next layer’s input. When I saw your replies, I was so glad that we had thought about it in the same way. Cheers!

Thank you so much for the kind and detailed answers to my several questions.

Now, my understanding of the dimensions of W has become clear after reading your explanation.

Also, I want to thank you for your reply. It not only gives me a clear understanding, but also encourages me to study with the joy of learning what I didn’t understand before.

I will keep studying these enjoyable topics and asking a lot of questions, if that’s okay.

It is certainly okay for you to ask questions. Just please be sure to ask them in new threads so that each thread has one focus; that will make it easier for future learners to follow any of them in case they find the questions interesting too.