How do you derive the update function for w(i+1) := w(i) - α*dw(i)?
How do we get from J(w_i)/J'(w_i) to (a_i - y)*x_i?
I'm having trouble using Newton's method.
We don’t use Newton’s method to find model weights in deep learning. Are you asking a Machine Learning Specialization question?
No, this is for the Deep Learning course.
Please move your topic to the correct subcategory.
Here’s the community FAQ to get started.
How is the weight update function derived if not using Newton’s method?
Deep learning uses gradient descent for weight updates.
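For what it’s worth, the (a_i - y)*x_i term in the original question is just the gradient of the cost, not a Newton step. Here is a sketch of that derivation for a single training example, assuming the usual logistic regression setup with a sigmoid activation a = σ(w·x + b) and the cross-entropy loss (my assumption about the setup being asked about):

```latex
% Sketch: gradient of the cross-entropy loss for one example,
% assuming a = \sigma(z) with z = w^T x + b.
\begin{aligned}
L(a, y) &= -\big(y \log a + (1 - y)\log(1 - a)\big) \\
\frac{\partial L}{\partial a} &= -\frac{y}{a} + \frac{1 - y}{1 - a} \\
\frac{\partial a}{\partial z} &= a(1 - a) \qquad \text{(derivative of the sigmoid)} \\
\frac{\partial L}{\partial z} &= \frac{\partial L}{\partial a}\,\frac{\partial a}{\partial z} = a - y \\
\frac{\partial L}{\partial w} &= \frac{\partial L}{\partial z}\,\frac{\partial z}{\partial w} = (a - y)\,x
\end{aligned}
```

That last quantity is the dw that gets plugged into w := w - α*dw.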
got it, thank you
Yes, the update formula is just based on the meaning of the gradient. We calculate the gradient, which is the multidimensional derivative of the surface that points in the direction of the fastest increase in the cost. Of course what we want to do is decrease the cost, so if we move in the opposite direction of the gradient it gives the fastest decrease of the cost (that’s why we multiply the gradient by -1). But because the gradient is just tangent to the surface, we don’t want to “jump” too far in that direction because it’s literally pointing off the surface. So we use the learning rate α to modulate how far we move in that direction.
We repeat the above recipe until we get convergence near a point of minimal cost. Of course that will likely be a local minimum of the surface, but that’s a more subtle issue to be discussed later.
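To make that recipe concrete, here is a minimal Python sketch of the loop for single-feature logistic regression. The function names, the tolerance, and the toy data are my own choices for illustration, not anything taken from the course:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent(x, y, alpha=0.1, tol=1e-6, max_iters=10000):
    """Minimal gradient descent for logistic regression with one weight and a bias.

    x, y are 1-D arrays of m training examples; alpha is the learning rate.
    """
    w, b = 0.0, 0.0
    for _ in range(max_iters):
        a = sigmoid(w * x + b)        # forward pass: predicted activations
        dw = np.mean((a - y) * x)     # gradient of the cost w.r.t. w
        db = np.mean(a - y)           # gradient of the cost w.r.t. b
        w -= alpha * dw               # step opposite the gradient direction
        b -= alpha * db
        if abs(dw) < tol and abs(db) < tol:   # crude convergence check
            break
    return w, b

# Example usage with toy data
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.0, 0.0, 1.0, 1.0])
w, b = gradient_descent(x, y)
```

The key line is `w -= alpha * dw`: move a small step in the direction opposite the gradient, then recompute the gradient at the new point and repeat.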
Thank you, paulinpauloalto, for the detailed explanation.