I'm trying to understand the chain rule as used in the lecture video C2 Week 3, Classification with a Neural Network - Minimizing log-loss.

At timestamp 2:28, the instructor is finding the partial derivative dL/dw11.
I understand everything up to the partial derivative dy/dz (the purple section), since y depends on z.

I don't understand why dz/da1 is singled out when finding the dependencies of z (purple). Doesn't z depend on both a1w1 and a2w2?

Hello @klai001,

You can think of the chain rule as “how to connect the path from L to w_{11}”, and here it is:

So one step is to go from z to a_1, therefore, \frac{\partial{z}}{\partial{a_1}} is needed to complete the chain rule.
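
If it helps to see the path numerically, here is a minimal sketch. It assumes a 2-2-1 network with sigmoid activations and log-loss, where z = w_1 a_1 + w_2 a_2 + b. The parameter names (`w11`, `w21`, `w12`, `w22`, `b1`, `b2`, `w1`, `w2`, `b`) and all the numbers are my own illustrative choices, so the lecture's notation may differ. It multiplies the factors along the path L → \hat{y} → z → a_1 → z_1 → w_{11} and compares the product with a finite-difference estimate.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def forward(p, x1, x2):
    """Forward pass of an assumed 2-2-1 network; returns the intermediates."""
    z1 = p["w11"] * x1 + p["w21"] * x2 + p["b1"]
    a1 = sigmoid(z1)
    z2 = p["w12"] * x1 + p["w22"] * x2 + p["b2"]
    a2 = sigmoid(z2)
    z = p["w1"] * a1 + p["w2"] * a2 + p["b"]
    y_hat = sigmoid(z)
    return a1, a2, z, y_hat


def log_loss(y, y_hat):
    return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))


# Arbitrary illustrative values (assumptions, not taken from the lecture).
params = {"w11": 0.3, "w21": -0.4, "b1": 0.1,
          "w12": 0.7, "w22": 0.2, "b2": -0.3,
          "w1": 0.5, "w2": -0.6, "b": 0.05}
x1, x2, y = 0.5, -1.2, 1.0

a1, a2, z, y_hat = forward(params, x1, x2)

# One factor per step along the path L -> y_hat -> z -> a1 -> z1 -> w11.
dL_dyhat = -(y / y_hat) + (1 - y) / (1 - y_hat)
dyhat_dz = y_hat * (1 - y_hat)   # derivative of the sigmoid
dz_da1 = params["w1"]            # z = w1*a1 + w2*a2 + b
da1_dz1 = a1 * (1 - a1)          # derivative of the sigmoid
dz1_dw11 = x1                    # z1 = w11*x1 + w21*x2 + b1
analytic = dL_dyhat * dyhat_dz * dz_da1 * da1_dz1 * dz1_dw11

# Central finite difference: nudge only w11 and watch the loss.
eps = 1e-6
plus = dict(params, w11=params["w11"] + eps)
minus = dict(params, w11=params["w11"] - eps)
a1_p, a2_p, _, y_hat_p = forward(plus, x1, x2)
_, _, _, y_hat_m = forward(minus, x1, x2)
numeric = (log_loss(y, y_hat_p) - log_loss(y, y_hat_m)) / (2 * eps)

print("chain rule:       ", analytic)
print("finite difference:", numeric)
print("a1 changed:", a1_p != a1, "| a2 changed:", a2_p != a2)
```

Under these assumptions, nudging w_{11} moves a_1 but leaves a_2 untouched, which is why only the a_1 branch of z shows up in the chain for w_{11}.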

I don’t actually mentor this course, so I don’t know how the lecture teaches it, but see whether this way of thinking is compatible with how you currently understand \frac{\partial{\hat{y}}}{\partial{z}}.

After that, see if you can answer your own question:

Try to write your answer and we can take a look together :wink: