Optimising the learning rate alpha

Has anyone ever considered making the learning rate alpha adaptive to optimise convergence of the cost function to a global minimum?

It's occurred to me that the initial values of the parameter vector w could be chosen by first finding the w that makes the cost function a maximum, and then setting alpha to a high value like 0.8. As the cost function converges quickly towards the global minimum with a large alpha, compute the derivative vector and reduce alpha as the absolute value of the derivative gets smaller. This way, the cost function should reach its global minimum as quickly as possible without overshooting the minimum or taking too many iterations.
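A rough sketch of that idea in NumPy might look like the following. The quadratic cost, the synthetic data, the 0.8 starting rate, and the "scale alpha by the gradient norm" rule are all just illustrative assumptions, not a recommended scheme:

```python
import numpy as np

# Toy linear-regression data (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=100)

def cost(w):
    # Mean squared error cost J(w)
    return 0.5 * np.mean((X @ w - y) ** 2)

def grad(w):
    # Gradient of the MSE cost with respect to w
    return X.T @ (X @ w - y) / len(y)

w = np.zeros(3)
alpha_max = 0.8                     # large initial learning rate, as in the post
g0_norm = np.linalg.norm(grad(w))   # gradient magnitude at the starting point

for step in range(200):
    g = grad(w)
    # Shrink alpha in proportion to how small the gradient has become
    alpha = alpha_max * np.linalg.norm(g) / g0_norm
    w -= alpha * g

print("learned w:", w, "cost:", cost(w))
```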


Most APIs like scikit-learn that implement linear/logistic regression allow you to set an initial learning rate and to tweak the learning rate ‘schedule’ for gradient descent. Your intuition is correct for convex loss functions with a distinct optimal solution: initially the weights are far from the optimal weights, so the weight updates may take us closer to the optimal weights if the learning rate is set to a higher value (as long as you don’t overshoot the optimum after the first weight update). However, the intuition is not valid if the loss (vs. the weights) behaves differently. There are other techniques, such as momentum, that perform better on not-so-well-behaved loss functions.
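For example, scikit-learn's SGDRegressor exposes both the initial learning rate (eta0) and the schedule (learning_rate); the specific values and the synthetic dataset below are just placeholders:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import SGDRegressor

# Small synthetic dataset, purely for illustration
X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)

# 'invscaling' decays the step size as eta = eta0 / t**power_t;
# 'adaptive' instead keeps eta0 until the training loss stops improving.
model = SGDRegressor(
    learning_rate="invscaling",  # learning-rate schedule
    eta0=0.1,                    # initial learning rate
    power_t=0.25,                # decay exponent used by 'invscaling'
    max_iter=1000,
)
model.fit(X, y)
print("R^2 on training data:", model.score(X, y))
```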


What is “scikit-learn”?

Scikit-learn is a Python package widely used to build classical machine learning models. Its API makes it easy to train models on a wide range of datasets, as in the quick sketch below.
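A minimal example, training a simple classifier on one of the toy datasets that ship with the package (the classifier and dataset choices are just for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Load a small built-in dataset and split it for training/evaluation
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit a logistic-regression classifier and check its accuracy
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```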

Yes. One method is called the “Adam optimizer”.
https://pytorch.org/docs/stable/generated/torch.optim.Adam.html
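A minimal usage sketch with PyTorch; the toy data, one-layer model, and learning rate are placeholder assumptions:

```python
import torch
import torch.nn as nn

# Toy regression data and a single linear layer, purely for illustration
X = torch.randn(100, 3)
y = X @ torch.tensor([2.0, -1.0, 0.5]) + 0.1 * torch.randn(100)

model = nn.Linear(3, 1)
loss_fn = nn.MSELoss()
# Adam adapts a per-parameter step size from running estimates of the
# gradient's first and second moments
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

for epoch in range(200):
    optimizer.zero_grad()                          # clear accumulated gradients
    loss = loss_fn(model(X).squeeze(-1), y)        # forward pass and loss
    loss.backward()                                # backpropagate
    optimizer.step()                               # Adam update

print("final loss:", loss.item())
```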


Thank you. That sounds very interesting.