Formal explanation of change of order in chain rule

There are several layers to the answer here. The top layer is that Prof Ng has specifically designed this course not to require knowledge even of univariate calculus, let alone multivariate or matrix calculus. The good news is that you don’t need to know calculus. The bad news is that you have to take his word for the formulas.

The next layer down: if you have the math background, you’ll notice that Prof Ng has slightly simplified things here. In the fully mathematical treatment, the “gradients” would be the transpose of what Prof Ng shows. With his simplification, we get the rule that the shape of a gradient is the same as the shape of the underlying object; in the mathematical convention it is the transpose of that shape. Of course, we also have this mathematical identity:

$$(A \cdot B)^T = B^T \cdot A^T$$
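If you want to convince yourself of the identity numerically, here is a minimal NumPy sketch (the matrix sizes and random values are arbitrary choices for illustration). Note that the order of the factors has to reverse: `A.T @ B.T` would not even have compatible shapes here.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))  # arbitrary 3x4 matrix
B = rng.standard_normal((4, 2))  # arbitrary 4x2 matrix

lhs = (A @ B).T   # (3x2) transposed -> shape (2, 3)
rhs = B.T @ A.T   # (2x4) @ (4x3)   -> shape (2, 3)

print(lhs.shape, np.allclose(lhs, rhs))  # (2, 3) True
```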

Given that he’s not showing the full derivations anyway, the simplification just makes applying the gradients smoother: because each gradient has the same shape as the parameter it belongs to, the update step needs no extra transposes. Here’s a thread with links to the derivations and general info about matrix calculus.
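To make the two conventions concrete, here is a small sketch for a linear step Z = W · A. It assumes the averaged form dW = (1/m) dZ · A^T used in the course, a randomly generated upstream gradient `dZ`, and a hypothetical toy cost whose gradient with respect to W is exactly that expression, so a finite-difference check can confirm it. The transposed (“fully mathematical”) layout then follows from the identity above.

```python
import numpy as np

# Hypothetical sizes for illustration: 4 inputs, 3 units, 5 examples.
n_in, n_out, m = 4, 3, 5
rng = np.random.default_rng(1)
W = rng.standard_normal((n_out, n_in))  # weights, shape (3, 4)
A = rng.standard_normal((n_in, m))      # previous-layer activations, (4, 5)
dZ = rng.standard_normal((n_out, m))    # assumed upstream gradient dJ/dZ, (3, 5)

# Course-style convention: gradient has the same shape as W.
dW = (1 / m) * dZ @ A.T                 # (3,5) @ (5,4) -> (3, 4)

# Transposed layout, via (X Y)^T = Y^T X^T: (dZ A^T)^T = A dZ^T.
dW_math = (1 / m) * A @ dZ.T            # (4,5) @ (5,3) -> (4, 3)

# Hypothetical toy cost J(W) = (1/m) * sum(dZ * (W @ A)); its gradient
# with respect to W is exactly (1/m) * dZ @ A.T.
def cost(W):
    return np.sum(dZ * (W @ A)) / m

# Central finite differences, one entry of W at a time.
eps = 1e-6
num = np.zeros_like(W)
for i in range(n_out):
    for j in range(n_in):
        W_plus = W.copy()
        W_plus[i, j] += eps
        W_minus = W.copy()
        W_minus[i, j] -= eps
        num[i, j] = (cost(W_plus) - cost(W_minus)) / (2 * eps)

print(dW.shape, dW_math.shape)          # (3, 4) (4, 3)
print(np.allclose(dW, num, atol=1e-6), np.allclose(dW_math, dW.T))

# Because dW matches W's shape, the update needs no transpose.
alpha = 0.01
W_new = W - alpha * dW
```

The design point is the last line: with the same-shape convention, `W - alpha * dW` works directly, whereas the transposed layout would need `W - alpha * dW_math.T`.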
