Backpropagation

Consider a neural network with 2 or more layers. After we update the weights in layer 1, the input to layer 2 (a(1)) has changed, so ∂z/∂w is no longer correct: z has changed to z* and z* ≠ aw + b. It seems we can only take the partial derivative of the loss with respect to w for the first layer, because when we take the partial derivative of the loss with respect to the weights of subsequent layers we must hold all other variables constant, and an update to an earlier layer (or layers) prevents this.

Use the chain rule.

I fully understand the mechanics of the chain rule. I don’t think you follow the argument in my post.

You are correct.

Hello @Y_L1,

In principle, we can calculate the partial derivative with respect to each and every weight before any of them is updated, so the problem you describe does not arise.

In practice, as the name “back propagation” suggests, we compute the derivatives from the last layer back to the first, and we update the weights from the last layer backwards. Using your example, that means updating layer 2 before layer 1. Moreover, the calculation of every derivative is based on results cached during the forward pass, which means we only ever use “z” and never any “z*”. The key is that, during the forward pass, we need to cache all the results required to calculate the derivatives in the backward pass, and Andrew has shown that in the lectures.
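Here is a minimal NumPy sketch of that idea (not the course's exact code; the layer sizes, activations, and learning rate are toy choices of my own). All gradients are computed from the cached forward-pass values, and the weights are only updated afterwards, so no derivative ever sees a “z*”:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 4 examples with 3 features, scalar regression target
X = rng.normal(size=(3, 4))          # shape (n_features, m)
Y = rng.normal(size=(1, 4))          # shape (1, m)

# Parameters of a hypothetical 2-layer network
W1, b1 = rng.normal(size=(5, 3)), np.zeros((5, 1))
W2, b2 = rng.normal(size=(1, 5)), np.zeros((1, 1))

# ---- Forward pass: cache z1, a1, a2 for the backward pass ----
z1 = W1 @ X + b1
a1 = np.maximum(0, z1)               # ReLU
z2 = W2 @ a1 + b2
a2 = z2                              # linear output
loss = np.mean((a2 - Y) ** 2)

# ---- Backward pass: every derivative uses the cached values ----
m = X.shape[1]
dz2 = 2 * (a2 - Y) / m               # dL/dz2
dW2 = dz2 @ a1.T                     # uses cached a1, not an updated one
db2 = dz2.sum(axis=1, keepdims=True)
da1 = W2.T @ dz2                     # uses the old W2 (chain rule)
dz1 = da1 * (z1 > 0)                 # uses cached z1
dW1 = dz1 @ X.T
db1 = dz1.sum(axis=1, keepdims=True)

# ---- Only now are any weights updated ----
lr = 0.01
W2 -= lr * dW2; b2 -= lr * db2
W1 -= lr * dW1; b1 -= lr * db1
```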

Cheers,
Raymond
