In course 1 week 3 Backpropagation Intuition, Prof. Ng wrote the derivation for da^[1] and dz^[1]. I tried to derive it myself but my result is off by a transpose. Can anyone please help, thank you so much.

Notice that the dimensional analysis on your formula does not work:

dz^{[2]} is n^{[2]} x m, where m is the number of samples.

W^{[2]} is n^{[2]} x n^{[1]}

The result needs to be n^{[1]} x m, right? Whereas if you work out the dimensional analysis on Prof Ng's version as you show it, it gives the expected result. A couple of other things worth noting:
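To make that dimensional analysis concrete, here is a quick numpy shape check with made-up layer sizes (n^{[1]} = 4, n^{[2]} = 3, m = 5) and tanh as the hidden-layer activation; this is just a sketch of the shape argument, not code from the course:

```python
import numpy as np

# Made-up sizes: n1 hidden units, n2 output units, m samples
n1, n2, m = 4, 3, 5

dz2 = np.random.randn(n2, m)   # dz^[2] is n^[2] x m
W2 = np.random.randn(n2, n1)   # W^[2] is n^[2] x n^[1]
z1 = np.random.randn(n1, m)    # z^[1] is n^[1] x m

# Prof Ng's formula: dz^[1] = W^[2].T @ dz^[2] * g'(z^[1]),
# here with g = tanh, so g'(z) = 1 - tanh(z)**2
dz1 = W2.T @ dz2 * (1 - np.tanh(z1) ** 2)
print(dz1.shape)  # (4, 5), i.e. n^[1] x m as required

# Without the transpose the shapes are incompatible:
# W2 @ dz2 would be (3, 4) @ (3, 5) and raises a ValueError
```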

The following is a mathematical identity:

(A \cdot B)^T = B^T \cdot A^T
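You can sanity-check that identity numerically on arbitrary conformable shapes:

```python
import numpy as np

# (A @ B).T equals B.T @ A.T for any conformable matrices
A = np.random.randn(3, 4)
B = np.random.randn(4, 5)
assert np.allclose((A @ B).T, B.T @ A.T)
```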

Prof Ng uses the convention that the gradient of an object is the same shape as the underlying object. If you really do the full pure math version of this, that is not quite the way it works out.

I have the same issue with this matter and the reply provided doesn't seem to answer why the dimensions don't work out, although we are using the chain rule correctly.

Could you please explain?

Applying the Chain Rule may be straightforward, but things are a bit different when you are in multiple dimensions and working with real matrix multiplies. If you think about how a dot product works, it's a bit more complicated than just an "elementwise" operation, right?
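One way to see where the transpose comes from: for a linear step y = W x, the chain rule through the matrix multiply gives dL/dx = W^T (dL/dy), not an elementwise product. A small sketch with a made-up scalar loss L = sum(y^2), verified against finite differences:

```python
import numpy as np

# For a linear layer y = W @ x, the chain rule through the matrix
# multiply is dL/dx = W.T @ dL/dy. Check it on a made-up loss.
rng = np.random.default_rng(0)
W = rng.standard_normal((3, 4))
x = rng.standard_normal((4, 1))

def loss(x):
    y = W @ x
    return float(np.sum(y ** 2))

# Analytic gradient via the chain rule: dL/dy = 2y, dL/dx = W.T @ dL/dy
y = W @ x
grad_analytic = W.T @ (2 * y)

# Numerical gradient by central differences
eps = 1e-6
grad_numeric = np.zeros_like(x)
for i in range(x.size):
    e = np.zeros_like(x)
    e[i] = eps
    grad_numeric[i] = (loss(x + e) - loss(x - e)) / (2 * eps)

assert np.allclose(grad_analytic, grad_numeric, atol=1e-5)
```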

If you want to understand more about all this, it goes back to the math, which is beyond the scope of this course. Prof Ng has designed this course so that you don't even need to know univariate calculus in order to use the material. But that means you just have to take his word for it when he gives you a formula.

If you have the math background and want to dig deeper, here is a thread that links to information on the web that covers all this. That thread is linked from the FAQ Thread which may also be worth a look just on general principles.

Hello Paulin,

I believe the issue stems from the assumption that z and dz (dL/dz) have the same dimensions, whereas the derivative of a scalar with respect to a column vector is a row vector.

If we take that to be true, then the dimensions match, as shown in the picture below:

Of course "dz" and "da" no longer have the same dimensions as z and a if we think of it this way, but at least it works out.

Yes, you have pointed out an inconsistency in the way Prof Ng has chosen to formulate things. It is actually the case that the gradient of an object is *not* the same shape as the underlying object: it is the transpose of it. But since Prof Ng is not showing the derivations, he can keep things simpler by "papering over" that bit of the "pure math". It doesn't really matter if you're just taking the formulas as he writes them and turning them into numpy code.
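If it helps, here is a tiny sketch of the two conventions side by side, using a made-up scalar loss L = w^T z, where the gradient is easy to write down:

```python
import numpy as np

# Made-up example: column vector z, scalar loss L = w.T @ z
n = 4
w = np.random.randn(n, 1)
z = np.random.randn(n, 1)

# "Pure math" convention: the derivative of a scalar with respect
# to a column vector is a row vector, here dL/dz = w.T, shape (1, n)
grad_row = w.T
print(grad_row.shape)  # (1, 4)

# Prof Ng's convention: dz has the same shape as z itself
dz = w
print(dz.shape)  # (4, 1)

# The two conventions differ only by a transpose
assert np.allclose(dz, grad_row.T)
```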