Hi, I’m struggling to get the propagate function right.
To calculate dw, we need to subtract Y from A and multiply the result by X.
Y is a (1, 3) row vector, and A also comes out as a (1, 3) row vector. The shape of X is (2, 3).
But the assert function defines dw as (1, 2).
I’m confused.
a) How can I transpose between these dimensions?
b) Based on the lecture “Vectorizing Logistic Regression’s Gradient Output”, the length of dw should be m, that is, the number of columns in X, which is 3 in this case.
I am on the C1 W2 Assignment “Logistic_Regression_with_a_Neural_Network_mindset” exercise 5, and I find that the assert function requires dw to be (2, 1) instead.
The shape of dw should be the same as the shape of w, and the shape of w for a logistic regression problem should be (number of features in X, 1). Since the number of features is 2, the shapes of both dw and w should be (2, 1).
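If it helps, here is a minimal NumPy sketch of those shapes (toy values, not the assignment's actual code):

```python
import numpy as np

w = np.zeros((2, 1))          # one weight per feature -> (2, 1)
X = np.random.randn(2, 3)     # 2 features, 3 training examples
Y = np.array([[1, 0, 1]])     # one label per example -> (1, 3)

# dw must match w's shape so the update w = w - lr * dw works elementwise
print(w.shape, X.shape, Y.shape)   # (2, 1) (2, 3) (1, 3)
```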
If you still have questions about this, please share the timestamp for the source of that statement and other mentors or I can take a look at it, since I have to go very soon.
Thanks for the quick response.
I see my mistake now: it is (2, 1).
The screenshot from the lecture is from this video, starting at minute 4:15.
I still don’t understand how to get to (2, 1) if A and Y are (1, 3) and X is (2, 3).
X has two features and 3 training examples (if I understood correctly), and A and Y have weights and labels for 3 features. So how does dw end up with only 2 features?
Because each column of X has two elements. That is the way Prof Ng chooses to arrange the data: the columns of X are the individual sample vectors. The high level point is that this is a choice: later when we get to Course 2, there will be cases in which he makes the rows of X the individual sample vectors. But here X is 2 x 3, which means we have 2 input features and 3 input samples. That is why Y and A are 1 x 3, because they give one answer for each sample.
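Here's a toy example of that layout in NumPy (the numbers are made up, just to show the shapes):

```python
import numpy as np

# Each column of X is one training example; each row is one feature.
X = np.array([[1.0,  2.0, -1.0],   # feature 1 across the 3 examples
              [0.5, -0.3,  2.2]])  # feature 2 across the 3 examples
Y = np.array([[1, 0, 1]])          # one label per example

print(X.shape)  # (2, 3): 2 features, 3 samples
print(Y.shape)  # (1, 3): one answer per sample
```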
Hi @paulinpaloalto, I’m going to need a hint here. I have been trying to solve this for three hours and can’t seem to find a solution.
If I multiply anything by X, I still get a 2 x 3 matrix.
What operation can I do to make the result a 2 x 1 vector? Do I need to reshape?
The output is not 2 x 1, right? It’s 1 x 3. The point is that we’re doing a dot product style matrix multiply between w and X, right? The notational convention that Prof Ng uses is that when he means “elementwise” multiply, he will always use the explicit operator “*”. But when he means dot product style multiply, he just writes the operands adjacent to each other with no explicit operator.
He also uses the convention that any standalone vectors are formatted as column vectors. So the weight vector w has 2 elements in our case, because there are 2 features. Making it a column vector means that w has dimensions 2 x 1.
If w is 2 x 1 and X is 2 x 3, then we need a “transpose” operation in order for the dot product dimensions to work. So the math formula is:
Z = w^T \cdot X + b
So w^T has dimensions 1 x 2 dotted with 2 x 3 gives you a 1 x 3 output, right? Then you apply sigmoid to get A and the activation function always operates elementwise meaning that the dimensions are preserved.
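If it helps to see that in code, here is a minimal sketch (the sigmoid helper and variable names are just illustrative, not the assignment's exact code):

```python
import numpy as np

def sigmoid(z):
    # elementwise, so the output shape equals the input shape
    return 1 / (1 + np.exp(-z))

w = np.random.randn(2, 1)     # column vector: 2 features -> (2, 1)
b = 0.0
X = np.random.randn(2, 3)     # (2, 3): 2 features, 3 examples

Z = np.dot(w.T, X) + b        # (1, 2) dot (2, 3) -> (1, 3)
A = sigmoid(Z)                # still (1, 3)
print(Z.shape, A.shape)       # (1, 3) (1, 3)
```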
You’re right that the shape of dw is the same as the shape of w, so it needs to be 2 x 1 in this instance. So what is the formula for dw? It is
dw = \displaystyle \frac {1}{m} X dZ^T
as you showed above. But as I pointed out in my earlier reply, what Prof Ng means by that is this:
dw = \displaystyle \frac {1}{m} X \cdot dZ^T
So what is the “dimensional analysis” on that dot product? X is 2 x 3 and dZ is the same shape as Z, so it’s 1 x 3, right? So dZ^T will be 3 x 1. What happens if you dot 2 x 3 with 3 x 1? As if by magic, it turns out to be what you need, right?
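Here's that dimensional analysis as a runnable sketch (again with made-up values standing in for the real ones):

```python
import numpy as np

m = 3
X = np.random.randn(2, 3)          # (2, 3)
A = np.random.rand(1, 3)           # stand-in for sigmoid(Z), (1, 3)
Y = np.array([[1, 0, 1]])          # (1, 3)

dZ = A - Y                         # (1, 3), same shape as Z
dw = (1 / m) * np.dot(X, dZ.T)     # (2, 3) dot (3, 1) -> (2, 1)
print(dw.shape)                    # (2, 1): matches the shape of w
```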