Difference in the calculation of dw between Week 2 and Week 3

nalisaoucha · October 9, 2024, 7:56pm

Hi everyone, I have a question about the formula for dw. In Week 2, in the video “02_04_Vectorizing Logistic Regression’s Gradient Output,” we saw that dw = x * dz (fig. 1). However, in Week 3, in the video “01_10_Backpropagation Intuition (Optional),” it is stated that dw = dz * x (figs. 2 and 3).
Are both formulas correct? If not, is one of them a mistake? If they are both correct, could you explain the reasoning behind the change?


paulinpaloalto · October 9, 2024, 10:34pm

For the logistic regression case we have:

dw = \displaystyle \frac {1}{m} X \cdot dZ^T

and

dZ = A - Y

so that is equivalent to:

dw = \displaystyle \frac {1}{m} X \cdot (A - Y)^T
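A minimal NumPy sketch of the logistic regression formula above, assuming the course convention that X has shape (n_x, m) with one example per column and that A and Y have shape (1, m). The helper names `lr_cost` and `lr_dw` and the random data are hypothetical, for illustration only. As a sanity check, it compares the analytic dw against a central finite-difference estimate of the cross-entropy cost.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lr_cost(w, b, X, Y):
    """Average cross-entropy cost J(w, b) over the m examples."""
    m = X.shape[1]
    A = sigmoid(w.T @ X + b)              # predictions, shape (1, m)
    return -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m

def lr_dw(w, b, X, Y):
    """Analytic gradient dw = (1/m) X (A - Y)^T, shape (n_x, 1)."""
    m = X.shape[1]
    A = sigmoid(w.T @ X + b)              # forward pass, shape (1, m)
    dZ = A - Y                            # shape (1, m)
    return (X @ dZ.T) / m                 # (n_x, m) @ (m, 1) -> (n_x, 1)

rng = np.random.default_rng(0)
n_x, m = 3, 5
X = rng.standard_normal((n_x, m))         # one example per column
Y = (rng.random((1, m)) > 0.5).astype(float)
w = 0.1 * rng.standard_normal((n_x, 1))
b = 0.0

dw = lr_dw(w, b, X, Y)
print(dw.shape)                           # (3, 1)

# Central finite differences: perturb one component of w at a time
eps = 1e-6
num_dw = np.zeros_like(w)
for i in range(n_x):
    w_plus, w_minus = w.copy(), w.copy()
    w_plus[i, 0] += eps
    w_minus[i, 0] -= eps
    num_dw[i, 0] = (lr_cost(w_plus, b, X, Y) - lr_cost(w_minus, b, X, Y)) / (2 * eps)

print(np.allclose(dw, num_dw, atol=1e-7))  # True
```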

For the neural network case with 2 layers, for the output layer we have:

dW^{[2]} = \displaystyle \frac {1}{m} dZ^{[2]} \cdot A^{[1]T}

But we also have:

dZ^{[2]} = A^{[2]} - Y

So that is equivalent to:

dW^{[2]} = \displaystyle \frac {1}{m} (A^{[2]} - Y) \cdot A^{[1]T}
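The same formula for the output layer of a two-layer network, as a shape-only sketch. It assumes A^{[1]} has shape (n^{[1]}, m) and that A^{[2]} and Y have shape (n^{[2]}, m); the layer sizes are made up, and the random arrays are hypothetical stand-ins for real forward-pass activations, so only the shapes are meaningful here. (The shortcut dZ^{[2]} = A^{[2]} - Y holds for a sigmoid output with cross-entropy loss.)

```python
import numpy as np

rng = np.random.default_rng(1)
n1, n2, m = 4, 1, 5                 # hypothetical layer sizes and batch size

A1 = rng.random((n1, m))            # stand-in for hidden-layer activations A^[1]
A2 = rng.random((n2, m))            # stand-in for output activations A^[2]
Y = (rng.random((n2, m)) > 0.5).astype(float)

dZ2 = A2 - Y                        # shape (n2, m)
dW2 = (dZ2 @ A1.T) / m              # (n2, m) @ (m, n1) -> (n2, n1)

print(dW2.shape)                    # (1, 4)
```

Note that dZ2 comes first here, whereas X came first in the logistic regression version; that ordering is exactly what produces a row-oriented W^{[2]} instead of a column vector.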

Ok, now remember that in the LR case, where we only have one layer, the equivalent of A^{[1]} is A^{[0]} = X. So it boils down to one difference: the NN formula is the transpose of the LR formula. Remember that we have this mathematical relationship in general:

(A \cdot B)^T = B^T \cdot A^T
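To see this transpose relationship concretely: if you write the LR gradient in the NN form, substituting A^{[0]} = X for A^{[1]}, you get a 1 × n_x row, and applying (A · B)^T = B^T · A^T turns it back into the n_x × 1 column dw. A quick NumPy check with arbitrary random data (the arrays and sizes are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)
n_x, m = 3, 6
X = rng.standard_normal((n_x, m))   # A^[0] = X, shape (n_x, m)
dZ = rng.standard_normal((1, m))    # dZ for the single output unit, shape (1, m)

dw_lr = (X @ dZ.T) / m              # Week 2 form: column vector, shape (n_x, 1)
dW_nn = (dZ @ X.T) / m              # Week 3 form with A^[0] = X: row vector, shape (1, n_x)

print(dw_lr.shape, dW_nn.shape)     # (3, 1) (1, 3)
print(np.allclose(dw_lr, dW_nn.T))  # True, since (dZ X^T)^T = X dZ^T
```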

Here are the dimensions of w and W^{[2]}:

w is n_x × 1
W^{[2]} is n^{[2]} × n^{[1]}

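If you work out the dimensions of both formulas, you’ll see that everything is consistent. In the LR case, X · dZ^T is (n_x × m)(m × 1) = n_x × 1, which matches w. In the NN case, dZ^{[2]} · A^{[1]T} is (n^{[2]} × m)(m × n^{[1]}) = n^{[2]} × n^{[1]}, which matches W^{[2]}. So the formulas are equivalent; they only look different because the two weight objects are oriented differently. Each row of W^{[2]} is the equivalent of w^T, the transpose of the column vector w, as shown in this thread. This is an arbitrary choice Prof Ng has made in how he defines the weights.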
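To illustrate the point that each row of W is a transposed w, here is a small NumPy sketch of a hypothetical layer with n_out units. It builds W by stacking the transposes of per-unit column vectors w_j (the Week 2 orientation), then checks that both the forward pass and the gradient agree row by row. All sizes and random arrays are made up, and dZ is a random stand-in rather than a real backprop result.

```python
import numpy as np

rng = np.random.default_rng(3)
n_in, n_out, m = 3, 2, 4                      # hypothetical sizes
X = rng.standard_normal((n_in, m))

# Week 2 convention: each unit j has its own column vector w_j, shape (n_in, 1)
ws = [rng.standard_normal((n_in, 1)) for _ in range(n_out)]

# Week 3 convention: stack the transposes w_j^T as the rows of W, shape (n_out, n_in)
W = np.vstack([w_j.T for w_j in ws])

Z_layer = W @ X                                  # shape (n_out, m)
Z_units = np.vstack([w_j.T @ X for w_j in ws])   # one LR-style z row per unit

print(W.shape)                                   # (2, 3)
print(np.allclose(Z_layer, Z_units))             # True

# Gradients: row j of the Week 3 dW equals the transpose of unit j's Week 2 dw
dZ = rng.standard_normal((n_out, m))             # stand-in for this layer's dZ
dW = (dZ @ X.T) / m                              # shape (n_out, n_in)
dws = [(X @ dZ[j:j + 1].T) / m for j in range(n_out)]   # each shape (n_in, 1)
print(all(np.allclose(dW[j], dws[j].ravel()) for j in range(n_out)))  # True
```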
