Hi everyone,
In a simple linear regression model, for example, we can write down the equation of the derivative of the cost function with respect to its 2 parameters (w, b) once, and then each time we need to calculate it, it is an O(1) calculation.
However, in a neural network we need backprop to calculate the derivative in O(N + P). Because of the various types of activation functions, the equations are not always polynomial (ReLU, for example), so it is infeasible to precalculate the derivative equation.
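To make this concrete, here is a minimal sketch of what I mean, assuming a mean-squared-error cost J = (1/2m) * sum((w*x + b - y)^2). The toy data and the function names are made up for illustration. Once the data averages are precomputed, the derivative at any (w, b) takes a fixed number of operations:

```python
import numpy as np

# Hypothetical toy data, made up for illustration only.
rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = 3.0 * x + 1.0 + 0.1 * rng.normal(size=1000)

# Data-dependent averages, computed once in O(m).
mean_x, mean_y = x.mean(), y.mean()
mean_xx, mean_xy = (x * x).mean(), (x * y).mean()

def grad_closed_form(w, b):
    """Gradient of J = (1/2m) * sum((w*x + b - y)^2), O(1) per call."""
    dw = w * mean_xx + b * mean_x - mean_xy
    db = w * mean_x + b - mean_y
    return dw, db

def grad_direct(w, b):
    """Same gradient computed from the raw data, O(m) per call."""
    err = w * x + b - y
    return (err * x).mean(), err.mean()

# Both versions should agree up to floating-point error.
print(grad_closed_form(0.5, -0.2))
print(grad_direct(0.5, -0.2))
```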
Do I understand correctly?
Thank you.
Hi @francesco4203,
I agree with you that saving time is the reason behind BackProp. As for whether it is infeasible to find the derivative equation: we have the chain rule and we have the gradient formulae for every layer, so what if we substitute those formulae into the chain rule? The final equation may look complicated, but we can do that, can’t we?
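As a rough sketch of that substitution, assume a made-up network with a single ReLU hidden unit, y_hat = w2 * relu(w1*x + b1) + b2, and a squared-error loss L = 0.5 * (y_hat - y)^2. The network, the loss, and all function names below are assumptions chosen only for illustration:

```python
def relu(z):
    return max(z, 0.0)

def relu_grad(z):
    # Derivative of ReLU, taking the value 0 at z == 0 by convention.
    return 1.0 if z > 0 else 0.0

def grad_w1_substituted(w1, b1, w2, b2, x, y):
    """dL/dw1 as one expression, with every layer formula substituted in."""
    z = w1 * x + b1
    return (w2 * relu(z) + b2 - y) * w2 * relu_grad(z) * x

def grad_w1_backprop(w1, b1, w2, b2, x, y):
    """dL/dw1 computed step by step with the chain rule."""
    # Forward pass: keep the intermediate values.
    z = w1 * x + b1
    a = relu(z)
    y_hat = w2 * a + b2
    # Backward pass: propagate the derivative layer by layer.
    d_y_hat = y_hat - y
    d_a = d_y_hat * w2
    d_z = d_a * relu_grad(z)
    return d_z * x

params = dict(w1=0.7, b1=0.1, w2=-1.3, b2=0.4)
print(grad_w1_substituted(x=2.0, y=1.0, **params))
print(grad_w1_backprop(x=2.0, y=1.0, **params))
```

Both functions return the same number for the same inputs; the substituted version is the same arithmetic written as one expression.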
Raymond
Thank you for your explanation.
But I still have a question: if that is the case, why do we not just find the final derivative equation once and save the cost of computing backprop over and over again?
Do we use BackProp to find the value of the derivative at each point, or do we just use it to find the derivative equation (which would mean we run BackProp only once)?
Thank you.
You said O(N+P). Why can backprop save time? (a reference here)