Gradient Descent for Neural Networks - Shallow Neural Networks

In this lecture, I don't understand how the dimensions of W_1, b_1, W_2, and b_2 were computed. For example, the dimensions of W_1 and W_2 are (n_1, n_0) and (n_2, n_1) respectively. I would really appreciate it if someone could explain this to me.

I didn't watch the video, but let me share a few words with you. First, how you shape the parameters is entirely up to you; it is a convention. In the DLS courses, the shape of W is defined as (number of neurons in the current layer, number of neurons in the previous layer [or number of input features in the case of W1]), and all the later equations are made compatible with this convention. However, in MLS they take the opposite approach: the shape of W is (number of neurons in the previous layer [or input features in the case of W1], number of neurons in the current layer).
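
To make the two conventions concrete, here is a minimal numpy sketch; the layer sizes (n_0 = 3, n_1 = 4, n_2 = 1) and the tanh activation are just illustrative, not taken from the lecture:

```python
import numpy as np

n_0, n_1, n_2 = 3, 4, 1   # illustrative sizes: input features, hidden units, output units
m = 5                     # number of training examples

X = np.random.randn(n_0, m)       # DLS convention: examples stacked as columns

# DLS convention: W has shape (current layer, previous layer)
W1 = np.random.randn(n_1, n_0)
b1 = np.zeros((n_1, 1))
W2 = np.random.randn(n_2, n_1)
b2 = np.zeros((n_2, 1))

Z1 = W1 @ X + b1                  # (n_1, n_0) @ (n_0, m) -> (n_1, m)
Z2 = W2 @ np.tanh(Z1) + b2        # (n_2, n_1) @ (n_1, m) -> (n_2, m)

# MLS-style convention: W has shape (previous layer, current layer),
# so examples are usually stored as rows and the product is X @ W instead.
X_rows = X.T                      # (m, n_0)
W1_mls = W1.T                     # (n_0, n_1)
Z1_mls = X_rows @ W1_mls + b1.T   # (m, n_0) @ (n_0, n_1) -> (m, n_1)
```

Either way, the shapes are chosen purely so that the matrix products line up; the two conventions are simply transposes of each other.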

That's exactly what I thought might be the case, but for the current layer's shape we seem to be using the next layer, not the previous one. I don't remember the timestamp, but it's after the derivatives of the activation functions.

Kind Regards
Suhail Akhtar

Maybe I need to revise that, but I am sure we use the previous layer's neurons, not the next layer's. Could you please provide a link to read/watch, or a screenshot?

The shapes of the weights and biases were explained in the lectures before the one on Gradient Descent, where Prof Ng explains how forward propagation works. Here's a thread that talks about this point in more detail.
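
For what it's worth, the shape argument from forward propagation can be summarised in one line (my paraphrase, using the same n_0, n_1, m notation as the course):

```latex
Z^{[1]} = W^{[1]} X + b^{[1]}, \quad
X \in \mathbb{R}^{n_0 \times m}, \;
Z^{[1]} \in \mathbb{R}^{n_1 \times m}
\;\Longrightarrow\;
W^{[1]} \in \mathbb{R}^{n_1 \times n_0}, \;
b^{[1]} \in \mathbb{R}^{n_1 \times 1}
```

That is, the only shape of W^[1] that maps n_0 input features to n_1 hidden-unit pre-activations is (n_1, n_0), and b^[1] is (n_1, 1), broadcast across the m examples.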

I went through the thread you provided, but it wasn't very useful to me. The first thing I don't understand is the dot product of an (n_x x 1) vector and a (1 x n_x) vector. It remains the same even after changing the dimensions of W, like you said.

The w vector has dimension n_x x 1, where n_x is the number of input features (the number of elements in each input vector x). x is also a column vector of shape n_x x 1, so in order to get that dot product to work, we need to transpose the w vector: w^T x is (1 x n_x) times (n_x x 1), which yields a 1 x 1 scalar.
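
A minimal numpy check of that dimension argument (the values here are random placeholders; only the shapes matter):

```python
import numpy as np

n_x = 4                         # number of input features (illustrative)
w = np.random.randn(n_x, 1)     # weight column vector, shape (n_x, 1)
x = np.random.randn(n_x, 1)     # one input example, also shape (n_x, 1)

# w @ x is undefined: (n_x, 1) @ (n_x, 1) has mismatched inner dimensions.
# Transposing w first makes the inner dimensions match:
z = w.T @ x                     # (1, n_x) @ (n_x, 1) -> (1, 1), a scalar pre-activation
print(z.shape)                  # (1, 1)
```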

From what I've understood, there is no need for transposing.
And secondly, coming back to the original question, I am attaching the screenshot.


How did we calculate the shapes of W and b for the two layers?

I already answered this question in my first response. The thread Paul shared with you also answers this: