I’m wondering whether the results shown at 6:22 of the video “classification with neural network: minimizing log loss” are accurate:
Based on the neural network drawing, w_{21} is the weight applied to x_2 in perceptron 1 (in red). So the chain rule for \frac{\partial L}{\partial w_{21}} should be:
\frac{\partial L}{\partial \hat y} \cdot \frac{\partial \hat y}{\partial z} \cdot\frac{\partial z}{\partial a_1} \cdot \frac{\partial a_1}{\partial z_1} \cdot\frac{\partial z_1}{\partial w_{21}}
The last term should be equal to x_2, not x_1, contrary to what’s written on the slide, since w_{21} is the weight that multiplies x_2 inside z_1.
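To double-check this, here is a rough numerical sketch. It rests on assumptions of mine, not on anything confirmed by the video: a network with 2 inputs, 2 hidden perceptrons and 1 output, sigmoid activations everywhere, and the convention that w_{ij} connects x_i to perceptron j. The bias names, the output weights `w1`/`w2`, and all the numbers are made up for illustration. It compares a finite-difference estimate of \frac{\partial L}{\partial w_{21}} with the chain-rule product, once using x_2 as the last term and once using x_1.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def forward(w, x):
    """Forward pass of the assumed 2-2-1 network; returns (a1, y_hat)."""
    # Assumed convention: w["w21"] is the weight on x_2 going into perceptron 1.
    z1 = w["w11"] * x[0] + w["w21"] * x[1] + w["b1"]
    z2 = w["w12"] * x[0] + w["w22"] * x[1] + w["b2"]
    a1, a2 = sigmoid(z1), sigmoid(z2)
    z = w["w1"] * a1 + w["w2"] * a2 + w["b"]
    return a1, sigmoid(z)

def log_loss(w, x, y):
    _, y_hat = forward(w, x)
    return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))

def chain_rule_grad(w, x, y, last_term):
    """Chain-rule product for dL/dw21 with a chosen value for dz1/dw21."""
    a1, y_hat = forward(w, x)
    # For sigmoid + log loss, dL/dy_hat * dy_hat/dz simplifies to (y_hat - y).
    # dz/da1 = w1, and da1/dz1 = a1 * (1 - a1).
    return (y_hat - y) * w["w1"] * a1 * (1 - a1) * last_term

def numeric_grad_w21(w, x, y, eps=1e-6):
    """Central finite difference of the loss with respect to w21."""
    plus = dict(w, w21=w["w21"] + eps)
    minus = dict(w, w21=w["w21"] - eps)
    return (log_loss(plus, x, y) - log_loss(minus, x, y)) / (2 * eps)

# Made-up values, chosen so that x_1 != x_2.
weights = {"w11": 0.1, "w21": -0.3, "w12": 0.4, "w22": 0.2,
           "b1": 0.05, "b2": -0.1, "w1": 0.7, "w2": -0.5, "b": 0.2}
x, y = (0.5, -1.5), 1

numeric = numeric_grad_w21(weights, x, y)
grad_with_x2 = chain_rule_grad(weights, x, y, last_term=x[1])
grad_with_x1 = chain_rule_grad(weights, x, y, last_term=x[0])

print("finite difference:", numeric)
print("chain rule with x_2:", grad_with_x2)
print("chain rule with x_1:", grad_with_x1)
```

Under these assumptions, the finite-difference value matches the chain-rule product only when the last term is x_2. If the slide uses the opposite index convention (w_{ij} connecting x_j to perceptron i), then x_1 would be correct, so the check depends on how the drawing labels the weights.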
Am I missing something?