In logistic regression, the professor writes dw = (1/m) X dZ.T. In week 3, he writes dW[2] = (1/m) dZ[2] A[1].T. I'm confused: I thought we had to keep the same operand order, since we are dealing with matrices.

This comes down to how we derive the gradient of W[2].

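Notice that in logistic regression, dw = (1/m) X dZ.T. The transpose on dZ makes the result match the shape of w: w is a column vector of shape (n_x, 1), and X is (n_x, m), so we need (n_x, m) times (m, 1).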
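
A minimal NumPy sketch of the shape check for logistic regression, assuming the course's convention that examples are stacked as columns (X of shape (n_x, m)) and that the forward pass is Z = w.T X + b. The sizes and random data are made up purely for illustration.

```python
import numpy as np

# Illustrative sizes (assumed): n_x features, m examples stacked as columns.
n_x, m = 3, 5
rng = np.random.default_rng(0)
X = rng.standard_normal((n_x, m))
Y = rng.integers(0, 2, size=(1, m))
w = np.zeros((n_x, 1))   # column vector, as in the lectures
b = 0.0

# Forward pass: w is used transposed, so Z = w.T @ X has shape (1, m).
Z = w.T @ X + b
A = 1 / (1 + np.exp(-Z))

dZ = A - Y                    # (1, m)
dw = (1 / m) * X @ dZ.T       # (n_x, m) @ (m, 1) -> (n_x, 1), same shape as w
db = (1 / m) * np.sum(dZ)

# Putting dZ first gives the same numbers laid out as a row: (1, n_x).
dw_row = (1 / m) * dZ @ X.T
```

The two orderings hold the same values; only the layout differs, and X @ dZ.T is chosen because it matches the shape of w.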

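Why does A[1] carry a transpose? Because in the video w is a **column vector**, but W[2] is a **row vector** (shape (1, n[1]) with a single output unit), so we transpose A[1] instead: dZ[2] is (1, m) and A[1].T is (m, n[1]), which gives (1, n[1]).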
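
A minimal NumPy sketch of the same check for the two-layer network, assuming a tanh hidden layer, a single sigmoid output unit, and the forward pass Z[2] = W[2] A[1] + b[2]. All sizes and data are hypothetical and chosen only to make the shapes visible.

```python
import numpy as np

# Hypothetical sizes: n_x inputs, n_1 hidden units, one output unit, m examples.
n_x, n_1, m = 4, 3, 5
rng = np.random.default_rng(1)
X = rng.standard_normal((n_x, m))
Y = rng.integers(0, 2, size=(1, m))

W1 = rng.standard_normal((n_1, n_x)) * 0.01
b1 = np.zeros((n_1, 1))
W2 = rng.standard_normal((1, n_1)) * 0.01   # row vector: (n_2, n_1) with n_2 = 1
b2 = np.zeros((1, 1))

# Forward pass: W2 is used without a transpose.
Z1 = W1 @ X + b1          # (n_1, m)
A1 = np.tanh(Z1)
Z2 = W2 @ A1 + b2         # (1, m)
A2 = 1 / (1 + np.exp(-Z2))

dZ2 = A2 - Y                                        # (1, m)
dW2 = (1 / m) * dZ2 @ A1.T                          # (1, m) @ (m, n_1) -> (1, n_1), same shape as W2
db2 = (1 / m) * np.sum(dZ2, axis=1, keepdims=True)  # (1, 1), same shape as b2
```

Writing A1 @ dZ2.T instead would produce an (n_1, 1) column, the shape of W2.T rather than W2.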

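In both cases the transpose and the operand order are chosen so that the gradient has the same shape as the parameter it updates. The two formulas are actually the same rule: transposing the logistic-regression one gives dw.T = (1/m) dZ X.T, which has exactly the form of dW[2] = (1/m) dZ[2] A[1].T, with X playing the role of A[1].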

