In course 1 week 3 Backpropagation Intuition, Prof. Ng wrote the derivation for da^[1] and dz^[1]. I tried to derive it myself but my result is off by a transpose. Can anyone please help, thank you so much.

Notice that the dimensional analysis on your formula does not work:

dz^{[2]} is n^{[2]} x m, where m is the number of samples.

W^{[2]} is n^{[2]} x n^{[1]}

The result needs to be n^{[1]} x m, right? Whereas if you work out the dimensional analysis on Prof Ng's version as you show it, it gives the expected result. A couple of other things worth noting:
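To make that dimensional analysis concrete, here is a quick numpy shape check with made-up layer sizes (n^{[1]} = 4, n^{[2]} = 3, m = 5) and tanh as the hidden-layer activation; this is just a sketch of the shape argument, not code from the course:

```python
import numpy as np

# Made-up sizes: n1 hidden units, n2 output units, m samples
n1, n2, m = 4, 3, 5

dz2 = np.random.randn(n2, m)   # dz^[2] is n^[2] x m
W2 = np.random.randn(n2, n1)   # W^[2] is n^[2] x n^[1]
z1 = np.random.randn(n1, m)    # z^[1] is n^[1] x m

# Prof Ng's formula: dz^[1] = W^[2].T @ dz^[2] * g'(z^[1]),
# here with g = tanh, so g'(z) = 1 - tanh(z)**2
dz1 = W2.T @ dz2 * (1 - np.tanh(z1) ** 2)
print(dz1.shape)  # (4, 5), i.e. n^[1] x m as required

# Without the transpose the shapes are incompatible:
# W2 @ dz2 would be (3, 4) @ (3, 5) and raises a ValueError
```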

The following is a mathematical identity:

(A \cdot B)^T = B^T \cdot A^T
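You can sanity-check that identity numerically on arbitrary conformable shapes:

```python
import numpy as np

# (A @ B).T equals B.T @ A.T for any conformable matrices
A = np.random.randn(3, 4)
B = np.random.randn(4, 5)
assert np.allclose((A @ B).T, B.T @ A.T)
```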

Prof Ng uses the convention that the gradient of an object is the same shape as the underlying object. If you really do the full pure math version of this, that is not quite the way it works out.

I have the same issue with this matter and the reply provided doesn't seem to answer why the dimensions don't work out, although we are using the chain rule correctly.

Could you please explain?

Applying the Chain Rule may be straightforward, but things are a bit different when you are in multiple dimensions and working with real matrix multiplies. If you think about how a dot product works, it's a bit more complicated than just an "elementwise" operation, right?
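One way to see where the transpose comes from: for a linear step y = W x, the chain rule through the matrix multiply gives dL/dx = W^T (dL/dy), not an elementwise product. A small sketch with a made-up scalar loss L = sum(y^2), verified against finite differences:

```python
import numpy as np

# For a linear layer y = W @ x, the chain rule through the matrix
# multiply is dL/dx = W.T @ dL/dy. Check it on a made-up loss.
rng = np.random.default_rng(0)
W = rng.standard_normal((3, 4))
x = rng.standard_normal((4, 1))

def loss(x):
    y = W @ x
    return float(np.sum(y ** 2))

# Analytic gradient via the chain rule: dL/dy = 2y, dL/dx = W.T @ dL/dy
y = W @ x
grad_analytic = W.T @ (2 * y)

# Numerical gradient by central differences
eps = 1e-6
grad_numeric = np.zeros_like(x)
for i in range(x.size):
    e = np.zeros_like(x)
    e[i] = eps
    grad_numeric[i] = (loss(x + e) - loss(x - e)) / (2 * eps)

assert np.allclose(grad_analytic, grad_numeric, atol=1e-5)
```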

If you want to understand more about all this, it goes back to the math, which is beyond the scope of this course. Prof Ng has designed this course so that you don't even need to know univariate calculus in order to use the material. But that means you just have to take his word for it when he gives you a formula.

If you have the math background and want to dig deeper, here is a thread that links to information on the web that covers all this. That thread is linked from the FAQ Thread which may also be worth a look just on general principles.

Hello Paulin,

I believe the issue stems from the assumption that z and dz (dL/dz) have the same dimensions, whereas the derivative of a scalar with respect to a column vector is a row vector.

If we take that to be true, then the dimensions match, as shown in the picture below:

Of course "dz" and "da" no longer have the same dimensions as z and a if we think of it this way, but at least it works out.

Yes, you have pointed out an inconsistency in the way Prof Ng has chosen to formulate things. It is actually the case that the gradient of an object is *not* the same shape as the underlying object: it is the transpose of it. But since Prof Ng is not showing the derivations, he can keep things simpler by "papering over" that bit of the "pure math". It doesn't really matter if you're just taking the formulas as he writes them and turning them into numpy code.
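If it helps, here is a tiny sketch of the two conventions side by side, using a made-up scalar loss L = w^T z, where the gradient is easy to write down:

```python
import numpy as np

# Made-up example: column vector z, scalar loss L = w.T @ z
n = 4
w = np.random.randn(n, 1)
z = np.random.randn(n, 1)

# "Pure math" convention: the derivative of a scalar with respect
# to a column vector is a row vector, here dL/dz = w.T, shape (1, n)
grad_row = w.T
print(grad_row.shape)  # (1, 4)

# Prof Ng's convention: dz has the same shape as z itself
dz = w
print(dz.shape)  # (4, 1)

# The two conventions differ only by a transpose
assert np.allclose(dz, grad_row.T)
```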