Formal explanation of change of order in chain rule

paulinpaloalto · April 23, 2023, 11:22pm

There are several layers to the answer here. The top-layer answer is that Prof Ng has specifically designed this course not to require knowledge even of univariate calculus, let alone multivariate or matrix calculus. So the good news is you don’t need to know calculus. The bad news is that means you have to just take his word for the formulas.

The next layer down is that if you have the math background, Prof Ng has slightly simplified things here. In the fully mathematical treatment of all this, the “gradients” would be the transpose of what Prof Ng shows. With Prof Ng’s simplification, we have the rule that the shape of the gradient is the same as the shape of the underlying object. In math, it would be the transpose of that. Of course we also have this mathematical identity, which is where the change of order comes from when you transpose a product:

(A \cdot B)^T = B^T \cdot A^T
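
If it helps to see this with actual numbers, here is a minimal NumPy sketch. The layer sizes, the random values, and the variable names (`W`, `A_prev`, `dZ`) are my own assumptions for illustration, and `dZ` is just a random stand-in for the gradient of the cost with respect to Z, not something computed from a real network. It checks the identity above, and then shows that the course-style formula for `dW` and the version with the factors in the opposite order hold the same numbers, just transposed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer sizes, chosen only for illustration.
n_prev, n_curr, m = 3, 2, 5  # inputs to the layer, units in the layer, examples

W = rng.standard_normal((n_curr, n_prev))    # weights, shape (2, 3)
A_prev = rng.standard_normal((n_prev, m))    # previous activations, shape (3, 5)
dZ = rng.standard_normal((n_curr, m))        # stand-in gradient w.r.t. Z, shape (2, 5)

# The identity from the post: (A . B)^T == B^T . A^T
identity_holds = np.allclose((W @ A_prev).T, A_prev.T @ W.T)

# Course-style convention: the gradient has the same shape as W.
dW = (dZ @ A_prev.T) / m                     # shape (2, 3)

# Transposed convention: same numbers, factors in the opposite order.
dW_transposed = (A_prev @ dZ.T) / m          # shape (3, 2)

# Transposing one form gives exactly the other.
same_values = np.allclose(dW.T, dW_transposed)

print(identity_holds, dW.shape == W.shape, same_values)
```

So the two conventions carry the same information. Keeping the gradient in the same shape as `W` just means an update like `W - learning_rate * dW` works without an extra transpose.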

Given that he’s not really showing the full derivations, the simplification just makes things work more smoothly in terms of how we apply the gradients. Here’s a thread with links to the derivations and general info about matrix calculus.
