Week 2 Logistic Regression Vectorization

Hello everyone, I wanted to ask about something in the calculation of the dw matrix.


In the highlighted section, the X matrix consists of columns of examples and rows of features, and dz^T is also a column vector of dz values, so how do we still get a matrix of size n x 1? The drawing shows it as a row vector of size 1 x n. Can someone please clarify the operation for calculating dw? Thanks in advance.

The formula is:

dw = \frac{1}{m} X \cdot dZ^T

Note that I made one small "interpolation" there: Professor Ng uses the notational convention that a dot product style matrix multiplication is shown simply as the two operands adjacent, with no explicit operator. I've added the \cdot operator to indicate the np.dot operation.

So now look at the dimensions of the objects:

X is n_x x m, where n_x is the number of features in each input vector and m is the total number of samples.

dZ is 1 x m, since the output for each sample is a scalar value.

So that means dZ^T has dimension m x 1.

The dot product of an n_x x m matrix with an m x 1 vector yields an output of shape n_x x 1, which is exactly the shape of w, as required.
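The dimension analysis above can be checked directly in NumPy. This is a minimal sketch with assumed sizes (n_x = 3 features, m = 5 samples) and random placeholder data, just to confirm the shapes:

```python
import numpy as np

n_x, m = 3, 5                   # assumed sizes for illustration
X = np.random.randn(n_x, m)     # n_x x m: one column per sample
dZ = np.random.randn(1, m)      # 1 x m: one scalar output per sample

# dw = (1/m) * X . dZ^T  ->  (n_x, m) dot (m, 1) gives (n_x, 1)
dw = (1 / m) * np.dot(X, dZ.T)

print(dw.shape)  # (3, 1)
```

So dw comes out as a column vector matching the shape of w, not the 1 x n row vector in the drawing.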

There are two different matrix multiplication operators: dot product style and “elementwise” multiply (also called the “Hadamard product”). When Professor Ng writes elementwise multiplication, it is always indicated by the * operator. As mentioned above, if there is no explicit operator, then his notation means it is a “real” dot product style multiplication.
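Here is a small example contrasting the two operations on 2 x 2 matrices, so you can see they give different results:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[5.0, 6.0],
              [7.0, 8.0]])

hadamard = A * B        # elementwise (Hadamard) product
matmul = np.dot(A, B)   # "real" dot product style matrix multiply

print(hadamard)  # [[ 5. 12.], [21. 32.]]
print(matmul)    # [[19. 22.], [43. 50.]]
```

Note that elementwise multiply requires the operands to have the same shape (or be broadcastable), while np.dot requires the inner dimensions to agree.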

Thank you so much! I got it now.
