How do you derive the update rule w^{(i+1)} := w^{(i)} - \alpha * dw^{(i)}?

How do we get from J(w^{(i)}) / J'(w^{(i)}) to (a^{(i)} - y) * x^{(i)}?

I’m having trouble applying Newton’s method.

We don’t use Newton’s method to find model weights in deep learning. Are you asking a Machine Learning Specialization question?

No, it’s for the Deep Learning course.

How is the weight update function derived if not using Newton’s method?

got it, thank you

Yes, the update formula is just based on the meaning of the gradient. We calculate the gradient, which is the multidimensional derivative of the cost surface, and it points in the direction of the fastest *increase* in the cost. Of course, what we want to do is *decrease* the cost, so if we move in the opposite direction of the gradient we get the fastest decrease of the cost (that’s why we multiply the gradient by -1). But because the gradient is just tangent to the surface, we don’t want to “jump” too far in that direction, because it’s literally pointing off the surface. So we use the learning rate \alpha to modulate how far we move in that direction.
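To make that concrete, here is a minimal sketch of the recipe for a single logistic unit, where the gradient works out to dw = (a - y) * x averaged over the examples. The toy data, learning rate, and iteration count are made-up assumptions for illustration, not values from the course.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical toy data: 4 examples, 2 features.
X = np.array([[0.5, 1.5], [1.0, 1.0], [1.5, 0.5], [3.0, 0.5]])
y = np.array([0.0, 0.0, 1.0, 1.0])

w = np.zeros(2)
b = 0.0
alpha = 0.1  # learning rate: how far we step along the (negated) gradient

for _ in range(1000):
    a = sigmoid(X @ w + b)          # activations a^{(i)}
    dw = X.T @ (a - y) / len(y)     # gradient of cross-entropy cost w.r.t. w
    db = np.mean(a - y)             # gradient w.r.t. b
    w -= alpha * dw                 # step opposite the gradient (the -1)
    b -= alpha * db
```

Each pass computes the gradient at the current weights and takes a small step against it; repeating this is exactly the "recipe until convergence" described above.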

We repeat the above recipe until we get convergence near a point of minimal cost. Of course that will likely be a local minimum of the surface, but that’s a more subtle issue to be discussed later.

Thank you, paulinpauloalto, for the detailed explanation.