I can't work out how dz[1] is obtained in this image/video snapshot

I also tried to derive dz[1] myself, but I can only get half of it. Can someone please help with this equation?

Hello @anatom

Which half of the part can you get?

Note that the chain rule does not carry over directly to the matrix equations here, so if that is the part giving you trouble, I am afraid the easiest way is to remember the equation; otherwise you have to derive it the hard way…

Here is my answer to another question on the same equation. That learner’s focus was on the ordering of the variables on the right-hand side. Is that also your focus?


Ah ha, yes, I was thinking of using the same chain rule. I will try to derive it the other way.
Thank you & clear skies!

But what’s wrong with using the chain rule here?

Can you write the equation from which I can start deriving dz[1]?

Check this.

Haha, it just doesn’t work there! It will not give you the correct answer. I believe the post that I shared earlier has a counter-example showing that it doesn’t work.

I have a feeling that my answer is not going to satisfy you… :laughing: However, is it good to take something that is not even supposed to work and ask why it doesn’t work? :smirk::smirk::smirk::smirk:

I suggest you start with a neural network of 2 layers, where each layer has 2 neurons. Write down all the equations NOT in matrix form but in scalar form. Then derive the gradient formula for each scalar weight (not matrix weight). Finally, put all the scalar weight gradients back into a matrix in the right places, and you will see how you should order W^{[2]} and dz^{[2]} so that the matrix equation is consistent with your scalar results.

It is a good thing to work with scalar equations and scalar weights because you can finally use the chain rule.
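To make the scalar exercise concrete, here is a minimal sketch in plain Python for the dz[1] part only. It is an illustration under assumptions that are not in the lecture snapshot: tanh hidden units, sigmoid outputs with a cross-entropy loss summed over the two outputs, and made-up numbers for every weight and input. The name `loss_from_z1` is hypothetical.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def loss_from_z1(z1, W2, b2, y):
    """Everything downstream of z1, written neuron by neuron (no matrices)."""
    a1 = [math.tanh(z) for z in z1]
    z2 = [W2[i][0] * a1[0] + W2[i][1] * a1[1] + b2[i] for i in range(2)]
    a2 = [sigmoid(z) for z in z2]
    loss = -sum(
        y[i] * math.log(a2[i]) + (1 - y[i]) * math.log(1 - a2[i])
        for i in range(2)
    )
    return a1, a2, loss


# Made-up numbers, for illustration only.
x = [0.5, -1.2]
y = [1.0, 0.0]
W1 = [[0.1, -0.3], [0.4, 0.2]]
b1 = [0.05, -0.1]
W2 = [[0.7, -0.5], [0.3, 0.8]]
b2 = [0.0, 0.1]

# Layer 1 pre-activations, one scalar equation per neuron.
z1 = [W1[j][0] * x[0] + W1[j][1] * x[1] + b1[j] for j in range(2)]
a1, a2, _ = loss_from_z1(z1, W2, b2, y)

# Assumption: sigmoid + cross-entropy, for which dL/dz2_i = a2_i - y_i.
dz2 = [a2[i] - y[i] for i in range(2)]

# Scalar chain rule: z1_j reaches the loss through BOTH z2_0 and z2_1,
# so the two paths are summed before multiplying by tanh'(z1_j) = 1 - a1_j^2.
dz1_scalar = [
    (dz2[0] * W2[0][j] + dz2[1] * W2[1][j]) * (1 - a1[j] ** 2)
    for j in range(2)
]

# The same sums collected into matrix form: W2 transposed times dz2,
# then an element-wise product with tanh'(z1).
W2_T = [[W2[i][j] for i in range(2)] for j in range(2)]
dz1_matrix = [
    sum(W2_T[j][i] * dz2[i] for i in range(2)) * (1 - a1[j] ** 2)
    for j in range(2)
]

# Independent check: central finite differences of the loss w.r.t. each z1_j.
eps = 1e-6
dz1_numeric = []
for j in range(2):
    z_plus = list(z1)
    z_minus = list(z1)
    z_plus[j] += eps
    z_minus[j] -= eps
    loss_plus = loss_from_z1(z_plus, W2, b2, y)[2]
    loss_minus = loss_from_z1(z_minus, W2, b2, y)[2]
    dz1_numeric.append((loss_plus - loss_minus) / (2 * eps))

print("scalar :", dz1_scalar)
print("matrix :", dz1_matrix)
print("numeric:", dz1_numeric)
```

The three printed vectors should agree up to finite-difference error. The point to notice is that the scalar sum runs over the row index of W2, which is why a transpose appears once the scalars are put back into a matrix.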


Note that I have quoted in my previous linked post the following:

If you follow my suggestion, then when you put all the scalar weights back into a matrix, you will essentially be “collecting the derivative of each component of the dependent variable with respect to each component of the independent variable”, just as the Wikipedia page describes.
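As a sketch of what that collecting step looks like for dz^{[1]}, assume the usual course notation z^{[2]} = W^{[2]} a^{[1]} + b^{[2]} and a^{[1]} = g^{[1]}(z^{[1]}), with L the loss and dz denoting the derivative of L with respect to z. The indices i and j are introduced here only for illustration. For a single hidden neuron j, the scalar chain rule sums over every output neuron i that z^{[1]}_j feeds into:

$$
dz^{[1]}_j
= \sum_i \frac{\partial L}{\partial z^{[2]}_i}\,
         \frac{\partial z^{[2]}_i}{\partial a^{[1]}_j}\,
         \frac{\partial a^{[1]}_j}{\partial z^{[1]}_j}
= \Big(\sum_i W^{[2]}_{ij}\, dz^{[2]}_i\Big)\, g^{[1]\prime}\big(z^{[1]}_j\big)
$$

The sum runs over the first index of W^{[2]}, so stacking the components j into a vector gives a transpose, followed by an element-wise product (written ∗):

$$
dz^{[1]} = W^{[2]T}\, dz^{[2]} \ast g^{[1]\prime}\big(z^{[1]}\big)
$$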