Gradient Descent for Neural Networks - Shallow Neural Networks

In this lecture, I don't understand how the dimensions of W_1, b_1, W_2, and b_2 were computed. For example, the dimensions of W_1 and W_2 are (n_1, n_0) and (n_2, n_1) respectively. I would really appreciate it if someone could explain this to me.

I didn't watch the video, but let me share a few words with you. First, how you shape the parameters is entirely up to you; it is a convention. In the DLS courses, the shape of W is defined as (number of neurons in the current layer, number of neurons in the previous layer [or number of input features in the case of W1]), and all the later equations are made compatible with this convention. However, in MLS they take the opposite approach: the shape of W is (number of neurons in the previous layer [or input features in the case of W1], number of neurons in the current layer).
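
To make the two conventions concrete, here is a minimal numpy sketch; the layer sizes (n_0 = 3, n_1 = 4, n_2 = 1) and the tanh activation are just illustrative, not taken from the lecture:

```python
import numpy as np

n_0, n_1, n_2 = 3, 4, 1   # illustrative sizes: input features, hidden units, output units
m = 5                     # number of training examples

X = np.random.randn(n_0, m)       # DLS convention: examples stacked as columns

# DLS convention: W has shape (current layer, previous layer)
W1 = np.random.randn(n_1, n_0)
b1 = np.zeros((n_1, 1))
W2 = np.random.randn(n_2, n_1)
b2 = np.zeros((n_2, 1))

Z1 = W1 @ X + b1                  # (n_1, n_0) @ (n_0, m) -> (n_1, m)
Z2 = W2 @ np.tanh(Z1) + b2        # (n_2, n_1) @ (n_1, m) -> (n_2, m)

# MLS-style convention: W has shape (previous layer, current layer),
# so examples are usually stored as rows and the product is X @ W instead.
X_rows = X.T                      # (m, n_0)
W1_mls = W1.T                     # (n_0, n_1)
Z1_mls = X_rows @ W1_mls + b1.T   # (m, n_0) @ (n_0, n_1) -> (m, n_1)
```

Either way, the shapes are chosen purely so that the matrix products line up; the two conventions are simply transposes of each other.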

That's exactly what I thought might be the case, but for the current layer's shape we seem to be using the next layer, not the previous one. I don't remember the timestamp, but it's after the derivatives of the activation functions.

Kind Regards
Suhail Akhtar

Maybe I need to revise that, but I am sure we use the previous layer's neurons, not the next layer's. Could you please provide a link to read/watch, or a screenshot?

The shapes of the weights and biases were explained in the lectures before the one on Gradient Descent, where Prof Ng explains how forward propagation works. Here's a thread that talks about this point in more detail.
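
For what it's worth, the shape argument from forward propagation can be summarised in one line (my paraphrase, using the same n_0, n_1, m notation as the course):

```latex
Z^{[1]} = W^{[1]} X + b^{[1]}, \quad
X \in \mathbb{R}^{n_0 \times m}, \;
Z^{[1]} \in \mathbb{R}^{n_1 \times m}
\;\Longrightarrow\;
W^{[1]} \in \mathbb{R}^{n_1 \times n_0}, \;
b^{[1]} \in \mathbb{R}^{n_1 \times 1}
```

That is, the only shape of W^[1] that maps n_0 input features to n_1 hidden-unit pre-activations is (n_1, n_0), and b^[1] is (n_1, 1), broadcast across the m examples.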

I went through the thread you provided, but it wasn't very useful to me. The first thing I don't understand is the dot product of an (n_x x 1) vector and a (1 x n_x) vector. It remains the same even after changing the dimensions of W, like you said.

The w vector has dimension n_x x 1, where n_x is the number of input features (the number of elements in each input vector x). x is also a column vector of shape n_x x 1, so in order to get that dot product to work, we need to transpose the w vector: w^T x is (1 x n_x) times (n_x x 1), which yields a 1 x 1 scalar.
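
A minimal numpy check of that dimension argument (the values here are random placeholders; only the shapes matter):

```python
import numpy as np

n_x = 4                         # number of input features (illustrative)
w = np.random.randn(n_x, 1)     # weight column vector, shape (n_x, 1)
x = np.random.randn(n_x, 1)     # one input example, also shape (n_x, 1)

# w @ x is undefined: (n_x, 1) @ (n_x, 1) has mismatched inner dimensions.
# Transposing w first makes the inner dimensions match:
z = w.T @ x                     # (1, n_x) @ (n_x, 1) -> (1, 1), a scalar pre-activation
print(z.shape)                  # (1, 1)
```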

From what I've understood, there is no need for transposing.
And secondly, coming back to the original question, I am attaching the screenshot.


How did we calculate the shapes of W and b for the two layers?

I already answered this question in my first response. The thread Paul shared with you also answers this: