Is there any mathematical proof for this?


Hi everyone,
I am a beginner in this field.
I am learning about the learning rate, and the professor says that we don't need to decrease the learning rate, because as w approaches the minimum, the derivative of J(w) also decreases.
I can understand this visually, but is there any mathematical proof that w will always converge to the minimum with this algorithm?
Thanks for reading my question!

I suspect there is a proof. I’m not a mathematician though.

But by intuition, if the cost function is convex, then as you approach the minimum, the magnitude of the gradient approaches zero. So reducing the learning rate is not necessary, because the change in the weight values (which is based on the gradient) also approaches zero.

Actually it’s a limit situation, because the progress toward the minimum on each iteration also gets smaller. Eventually you reach the point where the solution is ‘close enough’ that no additional improvement is needed.
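
To make that concrete, here is a tiny sketch (just an illustration, not from the course) of gradient descent on the toy cost J(w) = w^2 with a constant learning rate. The derivative is dJ/dw = 2w, so the update alpha * dJ/dw shrinks on its own as w approaches the minimum at w = 0:

```python
# Gradient descent on the toy cost J(w) = w^2 with a FIXED learning rate.
# The derivative dJ/dw = 2w shrinks as w approaches the minimum at w = 0,
# so the step alpha * dJ/dw shrinks automatically without decaying alpha.

def grad_J(w):
    return 2 * w              # derivative of J(w) = w^2


w = 10.0                      # arbitrary starting point
alpha = 0.1                   # constant learning rate, never decayed

for step in range(10):
    update = alpha * grad_J(w)
    w -= update
    print(f"step {step}: w = {w:.5f}, |update| = {abs(update):.5f}")
```

Running it, the printed |update| shrinks every iteration even though alpha stays fixed, which is exactly the behaviour described above.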


Thanks for your reply

Hi @duylm,

Here is a simple convergence proof when:
- The function J(w) is convex and differentiable,
- Its gradient \nabla J(w) is Lipschitz continuous with constant L > 0 (i.e., ||\nabla J(w) - \nabla J(w')||_2 \le L ||w - w'||_2 for any w, w').
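
With the extra assumption of a fixed learning rate \alpha \le 1/L (so the update is w_{k+1} = w_k - \alpha \nabla J(w_k), with w_k the iterate after k steps and w^* a minimizer of J), a standard sketch of the argument goes like this.

Lipschitz continuity of the gradient gives the descent lemma

J(w_{k+1}) \le J(w_k) - \frac{\alpha}{2} ||\nabla J(w_k)||_2^2,

so the cost decreases on every step even though \alpha is constant. Combining this with convexity and summing over the first k iterations (the sum telescopes) gives

J(w_k) - J(w^*) \le \frac{||w_0 - w^*||_2^2}{2 \alpha k},

and the right-hand side goes to 0 as k \to \infty. So with a fixed, small enough learning rate, gradient descent converges to the minimum value of J without any decay schedule.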


Thank you very much, this is exactly the answer I needed
