Hello, why does backpropagation have so many different algorithms? I've seen the old course (the YouTube course), and it backpropagates a delta. Another book I read calculates the derivatives, but in a different way. Some books work on one sample at a time, while others, like this course, use all the data at once. So why the difference? What is the difference between all of these algorithms?
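On the one-sample-versus-all-data point: here is a minimal sketch, assuming a one-feature linear model with the squared error cost J = (1/(2m)) * sum((w*x + b - y)^2). The full-batch gradient is just the average of the per-sample gradients, so both styles compute the same derivatives and differ only in how often the weights are updated. All function names here are hypothetical, not taken from either course.

```python
def per_sample_grad(w, b, x, y):
    """Gradient of 0.5 * (w*x + b - y)**2 for a single example."""
    err = w * x + b - y
    return err * x, err


def batch_grad(w, b, xs, ys):
    """Gradient of J = (1/(2m)) * sum((w*x + b - y)**2): the mean of the per-sample gradients."""
    m = len(xs)
    gw = gb = 0.0
    for x, y in zip(xs, ys):
        dw, db = per_sample_grad(w, b, x, y)
        gw += dw
        gb += db
    return gw / m, gb / m


def cost(w, b, xs, ys):
    """Squared error cost averaged over all m examples."""
    m = len(xs)
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


def train_batch(w, b, xs, ys, lr, epochs):
    """Full-batch gradient descent: one update per pass over all the data."""
    for _ in range(epochs):
        gw, gb = batch_grad(w, b, xs, ys)
        w, b = w - lr * gw, b - lr * gb
    return w, b


def train_per_sample(w, b, xs, ys, lr, epochs):
    """Per-sample (stochastic) gradient descent: one update after every example."""
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            dw, db = per_sample_grad(w, b, x, y)
            w, b = w - lr * dw, b - lr * db
    return w, b


if __name__ == "__main__":
    xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
    print(batch_grad(0.0, 0.0, xs, ys))
    print(train_batch(0.0, 0.0, xs, ys, lr=0.1, epochs=1000))
    print(train_per_sample(0.0, 0.0, xs, ys, lr=0.05, epochs=1000))
```

On this toy data (y = 2x) both loops approach w = 2, b = 0; the per-sample version simply takes three smaller steps per pass instead of one averaged step.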

Most derivations of backpropagation that you see online are for an NN with a linear output, because it uses the linear regression (squared error) cost function, and those derivatives are very simple calculus.
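A minimal sketch of why the linear-output case is simple, assuming a linear output a = z and the squared error cost J = (1/(2m)) * sum((z_i - y_i)^2): the derivative with respect to each output is just (z_i - y_i)/m, with no activation derivative involved. The helper names are hypothetical; the finite-difference function is only there to confirm the formula.

```python
def squared_error_cost(z, y):
    """J = (1/(2m)) * sum((z_i - y_i)^2) for a linear output a = z."""
    m = len(z)
    return sum((zi - yi) ** 2 for zi, yi in zip(z, y)) / (2 * m)


def squared_error_grad(z, y):
    """Analytic dJ/dz_i = (z_i - y_i) / m."""
    m = len(z)
    return [(zi - yi) / m for zi, yi in zip(z, y)]


def numeric_grad(cost, z, y, eps=1e-6):
    """Central-difference estimate of dJ/dz_i, used only to check the analytic formula."""
    grads = []
    for i in range(len(z)):
        zp = list(z)
        zp[i] += eps
        zm = list(z)
        zm[i] -= eps
        grads.append((cost(zp, y) - cost(zm, y)) / (2 * eps))
    return grads


if __name__ == "__main__":
    z, y = [0.5, -1.0, 2.0], [1.0, 0.0, 2.5]
    print(squared_error_grad(z, y))
    print(numeric_grad(squared_error_cost, z, y))
```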

If you have multiple logical outputs (such as for identifying handwritten digits), then the output includes sigmoid() or softmax(), and the calculus for that cost function is quite different from the calculus for a linear output. The equations for the partial derivatives (the gradients) are also quite different.
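A minimal sketch for sigmoid outputs (softmax is not covered here), assuming the logistic (cross-entropy) cost J = -(1/m) * sum(y*log(a) + (1 - y)*log(1 - a)) with a = sigmoid(z). Its gradient with respect to z works out to (a - y)/m, while squared error on the same sigmoid output picks up an extra factor a*(1 - a). The function names are hypothetical; the finite-difference check just confirms each formula.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def cross_entropy_cost(z, y):
    """Logistic (cross-entropy) cost for sigmoid outputs a = sigmoid(z), averaged over m examples."""
    m = len(z)
    total = 0.0
    for zi, yi in zip(z, y):
        a = sigmoid(zi)
        total -= yi * math.log(a) + (1 - yi) * math.log(1 - a)
    return total / m


def cross_entropy_grad(z, y):
    """dJ/dz_i = (a_i - y_i) / m: the sigmoid derivative a * (1 - a) cancels in the chain rule."""
    m = len(z)
    return [(sigmoid(zi) - yi) / m for zi, yi in zip(z, y)]


def sigmoid_squared_error_cost(z, y):
    """Squared error J = (1/(2m)) * sum((a_i - y_i)^2) on the same sigmoid outputs, for comparison."""
    m = len(z)
    return sum((sigmoid(zi) - yi) ** 2 for zi, yi in zip(z, y)) / (2 * m)


def sigmoid_squared_error_grad(z, y):
    """dJ/dz_i = (a_i - y_i) * a_i * (1 - a_i) / m: here the sigmoid derivative does not cancel."""
    m = len(z)
    grads = []
    for zi, yi in zip(z, y):
        a = sigmoid(zi)
        grads.append((a - yi) * a * (1 - a) / m)
    return grads


def numeric_grad(cost, z, y, eps=1e-6):
    """Central-difference estimate of dJ/dz_i, used only to check the analytic formulas."""
    grads = []
    for i in range(len(z)):
        zp = list(z)
        zp[i] += eps
        zm = list(z)
        zm[i] -= eps
        grads.append((cost(zp, y) - cost(zm, y)) / (2 * eps))
    return grads
```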

Not for linear outputs. I've seen the old course, and its backpropagation algorithm is totally different in all cases: it calculates a delta at the last layer and backpropagates it using some derivatives.

Derivatives are always required, because the gradients are the partial derivatives of the cost function. It's no different in this course.
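To illustrate that propagating a delta and calculating derivatives are the same chain rule, here is a minimal sketch for one example through a tiny two-layer network, assuming sigmoid hidden units, a single sigmoid output, and the logistic (cross-entropy) cost. All names and example values are hypothetical. The delta at each layer is dJ/dz: at the output it is a - y, and each hidden delta is the output delta weighted by W2 and multiplied by sigmoid'(z) = a*(1 - a). Each weight gradient is then the delta times that layer's input, and a finite-difference check confirms the result.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(W1, b1, W2, b2, x):
    """Sigmoid hidden layer followed by a single sigmoid output unit."""
    z1 = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W1, b1)]
    a1 = [sigmoid(z) for z in z1]
    z2 = sum(w * a for w, a in zip(W2, a1)) + b2
    return a1, sigmoid(z2)


def cost(W1, b1, W2, b2, x, y):
    """Cross-entropy cost for a single example."""
    _, a2 = forward(W1, b1, W2, b2, x)
    return -(y * math.log(a2) + (1 - y) * math.log(1 - a2))


def backprop(W1, b1, W2, b2, x, y):
    """Compute gradients by propagating delta = dJ/dz backwards, layer by layer."""
    a1, a2 = forward(W1, b1, W2, b2, x)
    delta2 = a2 - y  # output layer: dJ/dz2 for sigmoid + cross-entropy
    # hidden layer: dJ/dz1_j = W2_j * delta2 * a1_j * (1 - a1_j)
    delta1 = [W2[j] * delta2 * a1[j] * (1 - a1[j]) for j in range(len(a1))]
    dW2 = [delta2 * a for a in a1]
    db2 = delta2
    dW1 = [[d * xi for xi in x] for d in delta1]
    db1 = list(delta1)
    return dW1, db1, dW2, db2


def numeric_dW1(W1, b1, W2, b2, x, y, eps=1e-6):
    """Central-difference estimate of dJ/dW1, used only to check backprop()."""
    grads = []
    for i in range(len(W1)):
        row = []
        for j in range(len(W1[i])):
            Wp = [r[:] for r in W1]
            Wp[i][j] += eps
            Wm = [r[:] for r in W1]
            Wm[i][j] -= eps
            row.append((cost(Wp, b1, W2, b2, x, y) - cost(Wm, b1, W2, b2, x, y)) / (2 * eps))
        grads.append(row)
    return grads


if __name__ == "__main__":
    W1, b1 = [[0.1, -0.2], [0.4, 0.3]], [0.0, 0.1]
    W2, b2 = [0.5, -0.6], 0.05
    x, y = [1.0, 2.0], 1.0
    print(backprop(W1, b1, W2, b2, x, y)[0])
    print(numeric_dW1(W1, b1, W2, b2, x, y))
```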

Can you give a specific example where the methods are different?

This course:

Old course:

That is the “squared error” cost function.
The original “Machine Learning” course did not use that cost function for neural networks.
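For comparison, here are the two cost functions side by side, assuming K sigmoid output units a_k = σ(z_k), m training examples, and regularization omitted. The second is the logistic (cross-entropy) cost, which as far as I recall is the form the original course used for neural networks. The last two lines show how the output-layer delta differs between them.

```latex
\begin{aligned}
J_{\text{sq}} &= \frac{1}{2m}\sum_{i=1}^{m}\sum_{k=1}^{K}\bigl(a_k^{(i)} - y_k^{(i)}\bigr)^2 \\
J_{\text{log}} &= -\frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}\Bigl[y_k^{(i)}\log a_k^{(i)} + \bigl(1 - y_k^{(i)}\bigr)\log\bigl(1 - a_k^{(i)}\bigr)\Bigr] \\
\frac{\partial J_{\text{sq}}}{\partial z_k^{(i)}} &= \frac{1}{m}\bigl(a_k^{(i)} - y_k^{(i)}\bigr)\,a_k^{(i)}\bigl(1 - a_k^{(i)}\bigr) \\
\frac{\partial J_{\text{log}}}{\partial z_k^{(i)}} &= \frac{1}{m}\bigl(a_k^{(i)} - y_k^{(i)}\bigr)
\end{aligned}
```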
