Difference in the calculation of dw between Week 2 and Week 3

For the logistic regression case we have:

dw = \displaystyle \frac {1}{m} X \cdot dZ^T

and

dZ = A - Y

so that is equivalent to:

dw = \displaystyle \frac {1}{m} X \cdot (A - Y)^T
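
Here is a quick NumPy sketch of that Week 2 formula, using random values and made-up sizes purely to check the shapes (none of this comes from the course notebooks):

```python
import numpy as np

# Made-up sizes just for a shape check: n_x features, m training examples
n_x, m = 4, 10
rng = np.random.default_rng(0)

X = rng.standard_normal((n_x, m))                    # inputs, shape (n_x, m)
Y = rng.integers(0, 2, size=(1, m)).astype(float)    # labels, shape (1, m)
A = 1 / (1 + np.exp(-rng.standard_normal((1, m))))   # sigmoid activations, shape (1, m)

dZ = A - Y                   # shape (1, m)
dw = (1 / m) * X @ dZ.T      # shape (n_x, 1), same orientation as w
print(dw.shape)              # prints (4, 1)
```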

For the 2-layer neural network case, the output layer gives us:

dW^{[2]} = \displaystyle \frac {1}{m} dZ^{[2]} \cdot A^{[1]T}

But we also have:

dZ^{[2]} = A^{[2]} - Y

So that is equivalent to:

dW^{[2]} = \displaystyle \frac {1}{m} (A^{[2]} - Y) \cdot A^{[1]T}
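
And the same kind of sketch for the Week 3 output-layer formula, again with arbitrary layer sizes and random values just to confirm the shapes:

```python
import numpy as np

# Made-up layer sizes: n^[1] hidden units, n^[2] output units, m examples
n_1, n_2, m = 3, 1, 10
rng = np.random.default_rng(0)

A1 = rng.standard_normal((n_1, m))                     # hidden activations, shape (n^[1], m)
A2 = 1 / (1 + np.exp(-rng.standard_normal((n_2, m))))  # output activations, shape (n^[2], m)
Y  = rng.integers(0, 2, size=(n_2, m)).astype(float)   # labels, shape (n^[2], m)

dZ2 = A2 - Y                  # shape (n^[2], m)
dW2 = (1 / m) * dZ2 @ A1.T    # shape (n^[2], n^[1]), same orientation as W^[2]
print(dW2.shape)              # prints (1, 3)
```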

OK, now remember that in the LR case, where there is only one layer, the equivalent of A^{[1]} is A^{[0]} = X. Applying the Week 3 pattern to that single layer would give \frac {1}{m} dZ \cdot X^T, which is exactly the transpose of the Week 2 formula above. So it boils down to this: the only difference is that the NN formula is the transpose of the LR formula. Remember that in general we have this mathematical relationship:

(A \cdot B)^T = B^T \cdot A^T
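
To see that identity at work on the two gradient patterns, here is a small numerical check with random placeholder data (sizes are arbitrary):

```python
import numpy as np

# Random placeholders with the Week 2 shapes: X is (n_x, m), dZ is (1, m)
rng = np.random.default_rng(0)
n_x, m = 4, 10
X  = rng.standard_normal((n_x, m))
dZ = rng.standard_normal((1, m))

week2_style = (1 / m) * X @ dZ.T    # column vector, shape (n_x, 1)
week3_style = (1 / m) * dZ @ X.T    # row vector, shape (1, n_x)

# (A . B)^T = B^T . A^T, so the two results are transposes of each other
print(np.allclose(week2_style.T, week3_style))   # prints True
```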

Here are the dimensions of w and W^{[2]}:

w is n_x \times 1
W^{[2]} is n^{[2]} \times n^{[1]}

If you work out the dimensions of the two formulas above, you'll see that each gradient comes out with the same shape as the weight object it updates. So the formulas are equivalent; the only difference is the orientation of the two objects. The rows of W^{[2]} are the equivalent of the transpose of the column vector w, as shown in this thread. This is an arbitrary choice Prof Ng has made in how he defines the weights.
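
Working through the shapes explicitly:

X \cdot (A - Y)^T is (n_x \times m) \cdot (m \times 1) = n_x \times 1, which matches w

dZ^{[2]} \cdot A^{[1]T} is (n^{[2]} \times m) \cdot (m \times n^{[1]}) = n^{[2]} \times n^{[1]}, which matches W^{[2]}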
