In Course 1, Weeks 1 & 2, Programming Assignment 1 of the Deep Learning Specialization, dw = 1/m * (X · dZ.T). However, Programming Assignment 2 in Week 3 defines dW = 1/m * (dZ · X.T). Questions: 1. In the lecture videos, dw = 1/m * x * dz has no transpose, but in the assignments we see a transpose on either the derivative (dZ) or the training examples (X). Why is that? 2. Why does the transpose apply to the training examples (X) in Week 3, rather than to the derivative (dZ) as in Week 2?
I’ll move this thread out of the “AI Discussions” forum and place it in the forum area for that course (tagging it for the “Coursera” platform, but the answer applies equally on the DLAI platform).
Note that there are no programming assignments in Week 1; there are two assignments in Week 2 and only one in Week 3. Assignment 1 in Week 2 has nothing about Logistic Regression, right? That’s just the Intro to NumPy assignment, which includes sigmoid, but no gradients and no mention of Logistic Regression. And there is no Assignment 2 in Week 3, only the Planar Data assignment.
Ok, I think I’ve found the places you are asking about. Here’s a slide from the Week 2 lecture “Vectorizing Logistic Regression’s Gradient Output”:
And here’s the corresponding slide from the text of the Planar Data assignment in Week 3:
Let’s look at the dimensions in both cases. In the Logistic Regression case, the weights are a column vector w with dimensions n_x x 1, where n_x is the number of features in each input vector. dZ is a 1 x m row vector, where m is the number of samples.
In both cases X is the input sample matrix with size n_x x m.
In the NN case (Week 3), the weights are a matrix W^{[1]} with shape n^{[1]} x n_x, where n^{[1]} is the number of output neurons in layer 1. dZ^{[1]} is a matrix of shape n^{[1]} x m.
So now let’s do the dimensional analysis on the formulas. In the LR case, we have:
dw = \displaystyle \frac {1}{m} X \cdot dZ^T
So we are dotting an n_x x m matrix with an m x 1 vector, which gives a result that is n_x x 1, and that agrees with the dimension of dw (the same as the dimension of w).
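As a quick sanity check, here is a minimal NumPy sketch with made-up sizes (not code from the assignment) that confirms the shapes:

```python
import numpy as np

n_x, m = 4, 10               # made-up sizes: 4 features, 10 samples
X = np.random.randn(n_x, m)  # input samples, shape (n_x, m)
dZ = np.random.randn(1, m)   # row vector of dZ values, shape (1, m)

dw = (1 / m) * np.dot(X, dZ.T)
print(dw.shape)              # (4, 1) -- same shape as w
```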
In the NN case, we have:
dW^{[1]} = \displaystyle \frac {1}{m} dZ^{[1]} \cdot X^T
So we are dotting n^{[1]} x m with m x n_x, which gives a result that is n^{[1]} x n_x, which agrees with the dimensions of dW^{[1]} (the same as the dimensions of W^{[1]}), right?
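And the same kind of shape check for the Week 3 formula (again just an illustrative sketch with invented sizes):

```python
import numpy as np

n_x, n_1, m = 4, 3, 10            # made-up sizes: 4 features, 3 hidden units, 10 samples
X = np.random.randn(n_x, m)       # inputs, shape (n_x, m)
dZ1 = np.random.randn(n_1, m)     # dZ^[1], shape (n^[1], m)

dW1 = (1 / m) * np.dot(dZ1, X.T)
print(dW1.shape)                  # (3, 4) -- same shape as W^[1]
```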
So why are the two cases different? Because the weights for a neural network have been arranged such that the coefficients for each output neuron are a row of the matrix, instead of a column as in the case of the w weight vector for LR.
Here’s a thread which explains why Professor Ng chose that orientation for the weight matrices. It was covered in detail in the lectures, although perhaps he didn’t actually point out the motivation. Basically the NN weight matrices are oriented as the transpose of the weight vector in LR.
It’s worth pointing out that there is this mathematical identity for transposes and matrix multiplication:
(A \cdot B)^T = B^T \cdot A^T
Applying that in our case gives:
(X \cdot dZ^T)^T = dZ \cdot X^T
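If you want to see that identity numerically, here is a small illustrative snippet (not assignment code), which shows that the two results are just transposes of each other:

```python
import numpy as np

n_x, m = 4, 10
X = np.random.randn(n_x, m)
dZ = np.random.randn(1, m)

lhs = np.dot(X, dZ.T).T        # (X . dZ^T)^T
rhs = np.dot(dZ, X.T)          # dZ . X^T
print(np.allclose(lhs, rhs))   # True
```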
Are you taking the DLS course on the Coursera platform or the DeepLearning.AI learning platform?
Hi, I am taking the DLS on the DeepLearning.AI platform.
Thank you. This explains it.
I have updated your topic categories to deeplearning.ai

