That is beyond the scope of this course. Here’s a thread with some relevant links.

What you are probably asking about is this: because he doesn’t want to cover the derivation and all the matrix calculus behind it, Prof Ng takes a convenient shortcut that simplifies translating Gradient Descent into code. He uses the convention that the gradient of a vector or matrix has the same shape and orientation as the base object. That makes the “update parameters” logic simple, because each parameter and its gradient can be combined elementwise. But (as I think you are pointing out) that’s not really how things work if you do the full mathematical version. In the “pure math” formulation (the numerator-layout convention), the gradient ends up transposed relative to the base object: for example, the derivative of a scalar cost with respect to an n × 1 column vector is a 1 × n row vector.
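To make the shape difference concrete, here is a minimal sketch in plain Python (no NumPy, so the shapes are tracked by hand). It assumes a hypothetical cost J(w) = sum of w_i² for a 3 × 1 column vector w, and the helper names (`jacobian_row`, `grad_same_shape`, `update`) are made up for illustration. It shows that the strict derivative comes out as a 1 × 3 row, that transposing it gives the same-shape gradient used in the course, and that the elementwise update only works with the same-shape version.

```python
# Minimal sketch contrasting two shape conventions for the gradient of a
# scalar cost J with respect to a column vector w.
# Assumption: J(w) = sum of w_i^2 is a hypothetical cost chosen for simplicity.

def shape(m):
    """Return (rows, cols) of a matrix stored as a list of row lists."""
    return (len(m), len(m[0]))

def transpose(m):
    """Swap rows and columns."""
    return [list(col) for col in zip(*m)]

def jacobian_row(w):
    """'Pure math' (numerator-layout) derivative of J = sum(w_i^2):
    for an n x 1 column w, dJ/dw is a 1 x n row vector."""
    return [[2 * row[0] for row in w]]

def grad_same_shape(w):
    """Course convention: dw has the same n x 1 shape as w,
    i.e. the transpose of the row-vector derivative."""
    return transpose(jacobian_row(w))

def update(w, dw, alpha):
    """Elementwise gradient descent step; only valid when shapes match."""
    if shape(w) != shape(dw):
        raise ValueError(f"shape mismatch: {shape(w)} vs {shape(dw)}")
    return [[wi - alpha * gi for wi, gi in zip(wr, gr)]
            for wr, gr in zip(w, dw)]

w = [[1.0], [-2.0], [3.0]]          # 3 x 1 column vector
print(shape(jacobian_row(w)))       # (1, 3): transposed relative to w
print(shape(grad_same_shape(w)))    # (3, 1): same shape as w
print(update(w, grad_same_shape(w), alpha=0.1))
```

Passing `jacobian_row(w)` straight to `update` raises the `ValueError`, which is the bookkeeping the same-shape convention lets you skip. Note that NumPy would not raise here: subtracting a 1 × 3 array from a 3 × 1 array silently broadcasts to a 3 × 3 result, which is one more reason the course keeps gradients in the same shape as the parameters.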