Hi everyone, I have a question about the formula for dw. In Week 2, in the video “02_04_Vectorizing Logistic Regression’s Gradient Output,” we saw that dw = x * dz (fig. 1). However, in Week 3, in the video “01_10_Backpropagation Intuition (Optional),” it is stated that dw = dz * x (figs. 2 and 3).

Are both formulas correct? If not, is it a mistake? If they are correct, could you explain why and what the reasoning is behind the change?


For the logistic regression case we have:

dw = \displaystyle \frac {1}{m} X \cdot dZ^T

and

dZ = A - Y

so that is equivalent to:

dw = \displaystyle \frac {1}{m} X \cdot (A - Y)^T

For the neural network case with 2 layers, for the output layer we have:

dW^{[2]} = \displaystyle \frac {1}{m} dZ^{[2]} \cdot A^{[1]T}

But we also have:

dZ^{[2]} = A^{[2]} - Y

So that is equivalent to:

dW^{[2]} = \displaystyle \frac {1}{m} (A^{[2]} - Y) \cdot A^{[1]T}

Ok, now remember that in the LR case, where we only have one layer, the equivalent of A^{[1]} is A^{[0]} = X. So the only difference is that the NN formula is the transpose of the LR formula. Remember that we have this mathematical relationship in general:

(A \cdot B)^T = B^T \cdot A^T
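To make that step explicit, here is the identity applied to the LR formula. This is just a worked sketch of the algebra, not something taken verbatim from the lectures; it assumes dZ is 1 x m and X is n_x x m, as in the course notation:

```latex
dw^T = \left( \frac{1}{m} X \cdot dZ^T \right)^T
     = \frac{1}{m} \left( dZ^T \right)^T \cdot X^T
     = \frac{1}{m} \, dZ \cdot X^T
```

The right-hand side has the same form as the NN formula, with A^{[0]} = X playing the role of A^{[1]}.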

Here are the dimensions of w and W^{[2]}:

w is n_x \times 1

W^{[2]} is n^{[2]} \times n^{[1]}

If you compute the dimensions on the two formulas above, you’ll see that it all works out. So the formulas are equivalent given that the orientation of the two objects is different. The rows of W^{[2]} are the equivalent of the transpose of the column vector w as shown in this thread. This is an arbitrary choice Prof Ng has made in how he defines the weights.
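If you want to check this numerically, here is a minimal NumPy sketch. The sizes (n_x = 4, m = 5) and the random data are made up for illustration, and a single output unit is assumed so that the "NN-style" weight matrix is 1 x n_x:

```python
import numpy as np

rng = np.random.default_rng(0)
n_x, m = 4, 5                        # hypothetical sizes: 4 features, 5 examples

X = rng.standard_normal((n_x, m))    # inputs, one column per example
A = rng.uniform(size=(1, m))         # stand-in for the sigmoid outputs
Y = rng.integers(0, 2, size=(1, m))  # stand-in for the 0/1 labels

dZ = A - Y                           # shape (1, m)

# Logistic regression orientation: w is a column vector (n_x, 1)
dw_lr = (1 / m) * X @ dZ.T

# Neural network orientation: each row of W holds one unit's weights, (1, n_x)
dW_nn = (1 / m) * dZ @ X.T

print(dw_lr.shape, dW_nn.shape)

# The two gradients hold the same numbers, just transposed
same = np.allclose(dw_lr.T, dW_nn)
print(same)
```

The two results contain the same values; only the orientation differs, which matches the way w and W^{[2]} are defined.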
