Hi, I’m struggling to get the propagate function right.
To calculate dw, we need to subtract Y from A and multiply the result by X.
Y is a (1, 3) row vector, and A also comes out as a (1, 3) row vector. The shape of X is (2, 3).
But the assert function defines dw as (1, 2).
I’m confused.
a) How can I transpose between these dimensions?
b) Based on the lecture “Vectorizing Logistic Regression’s Gradient Output”, the length of dw should be m, that is, the number of columns in X, which is 3 in this case.
I am on the C1 W2 Assignment “Logistic_Regression_with_a_Neural_Network_mindset” exercise 5, and I find that the assert function requires dw to be (2, 1) instead.
The shape of dw should be the same as the shape of w, and the shape of w for a logistic regression problem should be (number of features in X, 1). Since the number of features is 2, the shapes of both dw and w should be (2, 1).
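If it helps, here is a minimal NumPy sketch of those shapes (toy values, not the assignment's actual code):

```python
import numpy as np

w = np.zeros((2, 1))          # one weight per feature -> (2, 1)
X = np.random.randn(2, 3)     # 2 features, 3 training examples
Y = np.array([[1, 0, 1]])     # one label per example -> (1, 3)

# dw must match w's shape so the update w = w - lr * dw works elementwise
print(w.shape, X.shape, Y.shape)   # (2, 1) (2, 3) (1, 3)
```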
If you still have questions about this, please share the timestamp for the source of that statement and other mentors or I can take a look at it, since I have to go very soon.
Thanks for the quick response.
I see my mistake now: it is (2, 1).
The screenshot from the lecture is from this video, starting at minute 4:15.
I still don’t understand how to get to (2, 1) if A and Y are (1, 3) and X is (2, 3).
X has two features and 3 training examples (if I understood correctly), and A and Y have weights and labels for 3 features. So how does dw end up with only 2 features?
Because each column of X has two elements. That is the way Prof Ng chooses to arrange the data: the columns of X are the individual sample vectors. The high level point is that this is a choice: later when we get to Course 2, there will be cases in which he makes the rows of X the individual sample vectors. But here X is 2 x 3, which means we have 2 input features and 3 input samples. That is why Y and A are 1 x 3, because they give one answer for each sample.
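Here's a toy example of that layout in NumPy (the numbers are made up, just to show the shapes):

```python
import numpy as np

# Each column of X is one training example; each row is one feature.
X = np.array([[1.0,  2.0, -1.0],   # feature 1 across the 3 examples
              [0.5, -0.3,  2.2]])  # feature 2 across the 3 examples
Y = np.array([[1, 0, 1]])          # one label per example

print(X.shape)  # (2, 3): 2 features, 3 samples
print(Y.shape)  # (1, 3): one answer per sample
```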
Hi @paulinpaloalto, I’m going to need a hint here. I have been trying to solve this for three hours and can’t seem to find a solution.
If I multiply anything by X, I still get a 2 x 3 matrix.
What operation can I do to make the result a 2 x 1 vector? Do I need to reshape?
The output is not 2 x 1, right? It’s 1 x 3. The point is that we’re doing a dot product style matrix multiply between w and X, right? The notational convention that Prof Ng uses is that when he means “elementwise” multiply, he will always use the explicit operator “*”. But when he means dot product style multiply, he just writes the operands adjacent to each other with no explicit operator.
He also uses the convention that any standalone vectors are formatted as column vectors. So the weight vector w has 2 elements in our case, because there are 2 features. Making it a column vector means that w has dimensions 2 x 1.
If w is 2 x 1 and X is 2 x 3, then we need a “transpose” operation in order for the dot product dimensions to work. So the math formula is:
Z = w^T \cdot X + b
So w^T has dimensions 1 x 2 dotted with 2 x 3 gives you a 1 x 3 output, right? Then you apply sigmoid to get A and the activation function always operates elementwise meaning that the dimensions are preserved.
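If it helps to see that in code, here is a minimal sketch (the sigmoid helper and variable names are just illustrative, not the assignment's exact code):

```python
import numpy as np

def sigmoid(z):
    # elementwise, so the output shape equals the input shape
    return 1 / (1 + np.exp(-z))

w = np.random.randn(2, 1)     # column vector: 2 features -> (2, 1)
b = 0.0
X = np.random.randn(2, 3)     # (2, 3): 2 features, 3 examples

Z = np.dot(w.T, X) + b        # (1, 2) dot (2, 3) -> (1, 3)
A = sigmoid(Z)                # still (1, 3)
print(Z.shape, A.shape)       # (1, 3) (1, 3)
```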
You’re right that the shape of dw is the same as the shape of w, so it needs to be 2 x 1 in this instance. So what is the formula for dw? It is
dw = \displaystyle \frac {1}{m} X dZ^T
as you showed above. But as I pointed out in my earlier reply, what Prof Ng means by that is this:
dw = \displaystyle \frac {1}{m} X \cdot dZ^T
So what is the “dimensional analysis” on that dot product? X is 2 x 3 and dZ is the same shape as Z, so it’s 1 x 3, right? So dZ^T will be 3 x 1. What happens if you dot 2 x 3 with 3 x 1? As if by magic, it turns out to be what you need, right?
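Here's that dimensional analysis as a runnable sketch (again with made-up values standing in for the real ones):

```python
import numpy as np

m = 3
X = np.random.randn(2, 3)          # (2, 3)
A = np.random.rand(1, 3)           # stand-in for sigmoid(Z), (1, 3)
Y = np.array([[1, 0, 1]])          # (1, 3)

dZ = A - Y                         # (1, 3), same shape as Z
dw = (1 / m) * np.dot(X, dZ.T)     # (2, 3) dot (3, 1) -> (2, 1)
print(dw.shape)                    # (2, 1): matches the shape of w
```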