Difference in the calculation of dw between Week 2 and Week 3

For the logistic regression case we have:

dw = \displaystyle \frac {1}{m} X \cdot dZ^T

and

dZ = A - Y

so that is equivalent to:

dw = \displaystyle \frac {1}{m} X \cdot (A - Y)^T
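
Here is a quick NumPy sketch of that Week 2 formula, using random values and made-up sizes purely to check the shapes (none of this comes from the course notebooks):

```python
import numpy as np

# Made-up sizes just for a shape check: n_x features, m training examples
n_x, m = 4, 10
rng = np.random.default_rng(0)

X = rng.standard_normal((n_x, m))                    # inputs, shape (n_x, m)
Y = rng.integers(0, 2, size=(1, m)).astype(float)    # labels, shape (1, m)
A = 1 / (1 + np.exp(-rng.standard_normal((1, m))))   # sigmoid activations, shape (1, m)

dZ = A - Y                   # shape (1, m)
dw = (1 / m) * X @ dZ.T      # shape (n_x, 1), same orientation as w
print(dw.shape)              # prints (4, 1)
```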

For the 2-layer neural network case, the output layer gives us:

dW^{[2]} = \displaystyle \frac {1}{m} dZ^{[2]} \cdot A^{[1]T}

But we also have:

dZ^{[2]} = A^{[2]} - Y

So that is equivalent to:

dW^{[2]} = \displaystyle \frac {1}{m} (A^{[2]} - Y) \cdot A^{[1]T}
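
And the same kind of sketch for the Week 3 output-layer formula, again with arbitrary layer sizes and random values just to confirm the shapes:

```python
import numpy as np

# Made-up layer sizes: n^[1] hidden units, n^[2] output units, m examples
n_1, n_2, m = 3, 1, 10
rng = np.random.default_rng(0)

A1 = rng.standard_normal((n_1, m))                     # hidden activations, shape (n^[1], m)
A2 = 1 / (1 + np.exp(-rng.standard_normal((n_2, m))))  # output activations, shape (n^[2], m)
Y  = rng.integers(0, 2, size=(n_2, m)).astype(float)   # labels, shape (n^[2], m)

dZ2 = A2 - Y                  # shape (n^[2], m)
dW2 = (1 / m) * dZ2 @ A1.T    # shape (n^[2], n^[1]), same orientation as W^[2]
print(dW2.shape)              # prints (1, 3)
```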

OK, now remember that in the LR case, where there is only one layer, the equivalent of A^{[1]} is A^{[0]} = X. Applying the Week 3 pattern to that single layer would give \frac {1}{m} dZ \cdot X^T, which is exactly the transpose of the Week 2 formula above. So it boils down to this: the only difference is that the NN formula is the transpose of the LR formula. Remember that in general we have this mathematical relationship:

(A \cdot B)^T = B^T \cdot A^T
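
To see that identity at work on the two gradient patterns, here is a small numerical check with random placeholder data (sizes are arbitrary):

```python
import numpy as np

# Random placeholders with the Week 2 shapes: X is (n_x, m), dZ is (1, m)
rng = np.random.default_rng(0)
n_x, m = 4, 10
X  = rng.standard_normal((n_x, m))
dZ = rng.standard_normal((1, m))

week2_style = (1 / m) * X @ dZ.T    # column vector, shape (n_x, 1)
week3_style = (1 / m) * dZ @ X.T    # row vector, shape (1, n_x)

# (A . B)^T = B^T . A^T, so the two results are transposes of each other
print(np.allclose(week2_style.T, week3_style))   # prints True
```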

Here are the dimensions of w and W^{[2]}:

w is n_x \times 1
W^{[2]} is n^{[2]} \times n^{[1]}

If you work out the dimensions of the two formulas above, you'll see that each gradient comes out with the same shape as the weight object it updates. So the formulas are equivalent; the only difference is the orientation of the two objects. The rows of W^{[2]} are the equivalent of the transpose of the column vector w, as shown in this thread. This is an arbitrary choice Prof Ng has made in how he defines the weights.
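
Working through the shapes explicitly:

X \cdot (A - Y)^T is (n_x \times m) \cdot (m \times 1) = n_x \times 1, which matches w

dZ^{[2]} \cdot A^{[1]T} is (n^{[2]} \times m) \cdot (m \times n^{[1]}) = n^{[2]} \times n^{[1]}, which matches W^{[2]}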
