Why do we use backward propagation to compute derivatives?
DL Course 1, Week 2
I think it’s the other way around: we compute the derivatives of the cost w.r.t. the parameters, because we need those gradients in order to implement backward propagation. The gradients are partial derivatives. Backward propagation is how the learning takes place, which makes all this possible.
And if I may add to @paulinpaloalto’s explanation: at the heart of things, backward propagation is essentially the Chain Rule of Calculus applied to compute derivatives. The term “backward propagation” obfuscates that basic point. It stands in opposition to “forward propagation”, which computes the intermediate values needed to evaluate the chain rule derivatives, if you will.
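To make the chain rule point concrete, here is a minimal sketch for the Week 2 logistic regression case with a single made-up training example (all numbers are illustrative, not from the course): the forward pass caches the intermediate values, and the backward pass multiplies local derivatives right to left. A finite-difference check confirms the chain rule result.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Forward pass: compute and cache the intermediate values
x, y = 1.5, 1.0          # one training example (made-up numbers)
w, b = 0.8, -0.2         # current parameters (made-up numbers)
z = w * x + b            # linear step
a = sigmoid(z)           # activation (the prediction)
L = -(y * math.log(a) + (1 - y) * math.log(1 - a))  # logistic loss

# Backward pass: apply the chain rule, right to left
dL_da = -(y / a) + (1 - y) / (1 - a)
da_dz = a * (1 - a)
dL_dz = dL_da * da_dz    # dL/dz = dL/da * da/dz  (simplifies to a - y)
dL_dw = dL_dz * x        # dL/dw = dL/dz * dz/dw
dL_db = dL_dz * 1.0      # dL/db = dL/dz * dz/db

# Numerical check of dL/dw with a small finite difference
eps = 1e-6
L_plus = -(y * math.log(sigmoid((w + eps) * x + b))
           + (1 - y) * math.log(1 - sigmoid((w + eps) * x + b)))
print(abs((L_plus - L) / eps - dL_dw) < 1e-4)  # prints True
```

Note that the backward pass reuses `a` from the forward pass: that is exactly why forward propagation has to run first.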
Thank you very much!
So the difference between forward and backward propagation is the direction we move through the network (left to right or right to left)? And when do we choose forward or backward propagation?
We don’t “choose forward or backward”, right? The two directions have fundamentally different purposes and we need both. Forward propagation is the operation of the model: it takes inputs and then uses its parameters to make a prediction based on that input (e.g. “there is a cat in this picture” or not). There are no derivatives involved in forward propagation: it is just applying functions in layers one after the other. Then the purpose of backward propagation is to train the model to give a better prediction. It does that by using the derivatives of the cost to “point” the parameter values in the direction of a better (lower cost) solution.
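The two roles can be sketched in a few lines of Python. This is a toy logistic regression example with made-up data (the function names `forward` and `backward_step` are my own, not from the course): `forward` only makes a prediction, with no derivatives involved, while `backward_step` uses the gradients of the cost to nudge the parameters toward a better solution.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(w, b, x):
    """Forward propagation: input -> prediction. No derivatives here."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def backward_step(w, b, x, y, lr=0.1):
    """Backward propagation plus one gradient descent update."""
    a = forward(w, b, x)       # backward needs the forward values
    dz = a - y                 # dL/dz for the logistic loss
    w_new = [wi - lr * dz * xi for wi, xi in zip(w, x)]  # dL/dwi = dz * xi
    b_new = b - lr * dz        # dL/db = dz
    return w_new, b_new

# Toy training loop (made-up data): the label is y = 1
w, b = [0.0, 0.0], 0.0
x, y = [1.0, 2.0], 1.0
before = forward(w, b, x)      # prediction before training
for _ in range(100):
    w, b = backward_step(w, b, x, y)
after = forward(w, b, x)       # prediction after training
print(before, "->", after)     # the prediction moves toward 1.0
```

Both directions run on every training iteration: forward to get the prediction and cost, backward to improve the parameters. At inference time, once training is done, only the forward pass is used.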