In Course 1, Weeks 1 & 2, Programming Assignment 1 of the Deep Learning Specialization, dw = 1/m * (X · dZ.T). However, Programming Assignment 2 in Week 3 defines dW = 1/m * (dZ · X.T). Questions: 1. In the lecture videos, dw = 1/m * x * dz has no transpose, but in the assignments we see a transpose on either the derivative (dZ) or the training examples (X). Why is that? 2. Why does the transpose apply to the training examples (X) in Week 3, rather than to the derivative (dZ) as in Week 2?
I’ll move this thread out of the “AI Discussions” forum and place it in the forum area for that course (tagging it for the “Coursera” platform, but the answer applies equally on the DLAI platform).
Note that there are no programming assignments in Week 1; there are two assignments in Week 2 and only one in Week 3. Assignment 1 in Week 2 has nothing about Logistic Regression, right? That’s just the Intro to NumPy assignment, which includes sigmoid, but no gradients and no mention of Logistic Regression. And there is no Assignment 2 in Week 3, only the Planar Data assignment.
Ok, I think I’ve found the places you are asking about. Here’s a slide from the Week 2 lecture “Vectorizing Logistic Regression’s Gradient Output”:
And here’s the corresponding slide from the text of the Planar Data assignment in Week 3:
Let’s look at the dimensions in both cases. In the Logistic Regression case, the weights are a column vector w with dimensions n_x x 1, where n_x is the number of features in each input vector. dZ is a 1 x m row vector, where m is the number of samples.
In both cases X is the input sample matrix with size n_x x m.
In the NN case (Week 3), the weights are a matrix W^{[1]} with shape n^{[1]} x n_x, where n^{[1]} is the number of output neurons in layer 1. dZ^{[1]} is a matrix of shape n^{[1]} x m.
So now let’s do the dimensional analysis on the formulas. In the LR case, we have:
dw = \displaystyle \frac {1}{m} X \cdot dZ^T
So we are dotting an n_x x m matrix with an m x 1 vector, which gives a result that is n_x x 1, and that agrees with the dimension of dw (the same as the dimension of w).
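As a quick sanity check, here is a minimal NumPy sketch with made-up sizes (not code from the assignment) that confirms the shapes:

```python
import numpy as np

n_x, m = 4, 10               # made-up sizes: 4 features, 10 samples
X = np.random.randn(n_x, m)  # input samples, shape (n_x, m)
dZ = np.random.randn(1, m)   # row vector of dZ values, shape (1, m)

dw = (1 / m) * np.dot(X, dZ.T)
print(dw.shape)              # (4, 1) -- same shape as w
```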
In the NN case, we have:
dW^{[1]} = \displaystyle \frac {1}{m} dZ^{[1]} \cdot X^T
So we are dotting n^{[1]} x m with m x n_x, which gives a result that is n^{[1]} x n_x, which agrees with the dimensions of dW^{[1]} (the same as the dimensions of W^{[1]}), right?
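And the same kind of shape check for the Week 3 formula (again just an illustrative sketch with invented sizes):

```python
import numpy as np

n_x, n_1, m = 4, 3, 10            # made-up sizes: 4 features, 3 hidden units, 10 samples
X = np.random.randn(n_x, m)       # inputs, shape (n_x, m)
dZ1 = np.random.randn(n_1, m)     # dZ^[1], shape (n^[1], m)

dW1 = (1 / m) * np.dot(dZ1, X.T)
print(dW1.shape)                  # (3, 4) -- same shape as W^[1]
```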
So why are the two cases different? Because the weights for a neural network have been arranged such that the coefficients for each output neuron are a row of the matrix, instead of a column as in the case of the w weight vector for LR.
Here’s a thread which explains why Professor Ng chose that orientation for the weight matrices. It was covered in detail in the lectures, although perhaps he didn’t actually point out the motivation. Basically the NN weight matrices are oriented as the transpose of the weight vector in LR.
It’s worth pointing out that there is this mathematical identity for transposes and matrix multiplication:
(A \cdot B)^T = B^T \cdot A^T
Applying that in our case gives:
(X \cdot dZ^T)^T = dZ \cdot X^T
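If you want to see that identity numerically, here is a small illustrative snippet (not assignment code), which shows that the two results are just transposes of each other:

```python
import numpy as np

n_x, m = 4, 10
X = np.random.randn(n_x, m)
dZ = np.random.randn(1, m)

lhs = np.dot(X, dZ.T).T        # (X . dZ^T)^T
rhs = np.dot(dZ, X.T)          # dZ . X^T
print(np.allclose(lhs, rhs))   # True
```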
Are you taking the DLS course on the Coursera platform or the DeepLearning.AI learning platform?
Hi, I am taking the DLS on the DeepLearning.AI platform.
Thank you. This explains it.
I have updated your topic categories to deeplearning.ai

