# Course 1: Week 3 (backpropagation intuition)

Not sure if you guys have figured out how dz[1] is calculated but here is the calculation which might help someone who comes here.

So the goal is to find the derivative of the loss with respect to z1, which is dL/dz1, and this can be written as `dL/da2` * `da2/dz2` * `dz2/da1` * `da1/dz1` using the chain rule.

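Remember that the term `dL/da2` * `da2/dz2` is the derivative of the loss with respect to z2, that is `dL/dz2` = `a2-y` (this holds for a sigmoid output unit with the cross-entropy loss used in the course). You can refer to this wonderful post to see how this is derived if you are not sure.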
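If you want to convince yourself that `dL/dz2` = `a2-y` without going through the algebra, here is a small numerical check. It is only a sketch for a single scalar example, assuming a sigmoid output and the cross-entropy loss; the values of `z2` and `y` and the helper name `loss_from_z2` are made up for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss_from_z2(z2, y):
    # Cross-entropy loss for one example, written as a function of z2.
    a2 = sigmoid(z2)
    return -(y * math.log(a2) + (1 - y) * math.log(1 - a2))

# Arbitrary illustrative values (assumptions, not from the course).
z2, y = 0.3, 1.0
eps = 1e-6

# Central finite difference approximation of dL/dz2.
numeric = (loss_from_z2(z2 + eps, y) - loss_from_z2(z2 - eps, y)) / (2 * eps)

# Closed form from the post: a2 - y.
analytic = sigmoid(z2) - y

print(numeric, analytic)
```

The two printed numbers should agree to several decimal places.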

Now our equation is `(a2-y)` * `dz2/da1` * `da1/dz1`

`dz2/da1` = `d/da1 (w2a1+b2)` because `z2` is computed as `w2a1+b2`
derivative of `w2a1+b2` with respect to `a1` is `w2`

`da1/dz1` = `d/dz1 sigmoid(z1)`
derivative of `sigmoid(z1)` is `sigmoid(z1) * (1-sigmoid(z1))`

Finally everything put together,

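`dL/da2` * `da2/dz2` * `dz2/da1` * `da1/dz1` becomes `(a2-y)` * `w2` * `sigmoid(z1) * (1-sigmoid(z1))`. Prof. Andrew writes this as `w2` * `dz2` * g prime (z1): the `a2-y` term is the derivative of the loss with respect to z2, so it is named `dz2`, and the final term `sigmoid(z1) * (1-sigmoid(z1))` is denoted as g prime (z1). In the vectorized version from the lecture, `w2` appears transposed and the last product is element-wise.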
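To check the whole chain end to end, here is a rough sketch in plain Python with one hidden unit and one output unit, so everything is a scalar. It assumes sigmoid at both layers and the cross-entropy loss; the numbers for `z1`, `w2`, `b2` and `y` and the helper name `forward` are made up for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(z1, w2, b2, y):
    # Hidden activation, output pre-activation, output activation, loss.
    a1 = sigmoid(z1)
    z2 = w2 * a1 + b2
    a2 = sigmoid(z2)
    loss = -(y * math.log(a2) + (1 - y) * math.log(1 - a2))
    return a1, a2, loss

# Arbitrary illustrative values (assumptions, not from the course).
z1, w2, b2, y = 0.5, -1.2, 0.1, 1.0

a1, a2, _ = forward(z1, w2, b2, y)

# Chain rule result from the post.
dz2 = a2 - y                       # dL/dz2
dz1 = dz2 * w2 * a1 * (1 - a1)     # dL/dz1 = dz2 * w2 * g'(z1)

# Central finite difference approximation of dL/dz1 for comparison.
eps = 1e-6
numeric = (forward(z1 + eps, w2, b2, y)[2] - forward(z1 - eps, w2, b2, y)[2]) / (2 * eps)

print(dz1, numeric)
```

If the derivation is right, the two printed values match closely. With real layers you would use matrices and a transpose on `w2`, but the scalar case is enough to see each factor of the chain rule.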

Hope this helps as I couldn’t use math notation but just plain text.

P.S: Please note that `da1/dz1` can change depending on the activation function used. Here I have assumed the activation function at the hidden layer is sigmoid, but in one of the assignments tanh is used, so that portion of `dz1` changes.
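For the tanh case, the derivative of `tanh(z1)` is `1 - tanh(z1)^2`, so only the last factor is swapped. A minimal sketch of the same scalar check, assuming a tanh hidden unit, a sigmoid output and the cross-entropy loss (all values and the helper name `forward_tanh` are made up for illustration):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward_tanh(z1, w2, b2, y):
    # Same tiny network as before, but the hidden activation is tanh.
    a1 = math.tanh(z1)
    z2 = w2 * a1 + b2
    a2 = sigmoid(z2)
    loss = -(y * math.log(a2) + (1 - y) * math.log(1 - a2))
    return a1, a2, loss

# Arbitrary illustrative values (assumptions, not from the course).
z1, w2, b2, y = 0.5, -1.2, 0.1, 0.0

a1, a2, _ = forward_tanh(z1, w2, b2, y)

# Only g'(z1) changes: 1 - a1^2 instead of a1 * (1 - a1).
dz1_tanh = (a2 - y) * w2 * (1 - a1 ** 2)

# Central finite difference approximation of dL/dz1 for comparison.
eps = 1e-6
numeric_tanh = (forward_tanh(z1 + eps, w2, b2, y)[2]
                - forward_tanh(z1 - eps, w2, b2, y)[2]) / (2 * eps)

print(dz1_tanh, numeric_tanh)
```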
