Do larger (but not too large) learning rates always converge faster?

Hello everyone!
I was playing around with the optional lab "Feature scaling and learning rate", and I found something interesting.
There are two learning rates in the lab that work for the given examples, and both of them converge. The interesting part was that, for a small number of iterations, the cost function for the smaller learning rate actually decreased faster. So I increased the number of iterations by 100 times, and then the cost for the larger learning rate became smaller than the cost for the smaller one.
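To show what I mean without the lab's data, here is a minimal toy sketch of the comparison (the curvatures and the learning rates 0.05 and 0.19 are made-up numbers I picked just to reproduce the behaviour, not the lab's values):

```python
import numpy as np

curv = np.array([10.0, 0.1])           # two very different curvatures, like unscaled features

def cost(w):
    # toy quadratic cost J(w) = 0.5 * sum(curv * w^2)
    return 0.5 * np.sum(curv * w**2)

def run_gd(alpha, iters):
    # plain batch gradient descent from a fixed starting point; returns the final cost
    w = np.array([1.0, 1.0])
    for _ in range(iters):
        w = w - alpha * curv * w       # gradient of J is curv * w
    return cost(w)

for iters in (10, 1000):               # a few iterations vs 100x more
    print(f"iters={iters:4d}  alpha=0.05 -> J={run_gd(0.05, iters):.2e}"
          f"   alpha=0.19 -> J={run_gd(0.19, iters):.2e}")
```

With 10 iterations the smaller learning rate gives the lower cost, but with 1000 iterations the larger one ends up lower.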
So I was wondering: if we have two learning rates that both converge, is there an example where the smaller one converges faster?

Hello @Nima1313!

Welcome to our community!! It seems you were having fun, and from your description, you had actually discovered one such example, hadn't you? It is completely possible that, given the same number of iterations, a smaller learning rate can perform better than a larger learning rate even though they both converge.

You have cleverly excluded the “too large” case, but consider, as a thought experiment, a “pretty large” learning rate that is only marginally converging (meaning that if it were a little bit larger, it would diverge). It will still have a pretty hard time reaching an optimum, because every update overshoots the minimum and only slowly closes in on it. Well, I am too lazy to actually carry out the real experiment, but again, from your description, you have already found such a case.
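Here is a tiny sketch of that thought experiment on a made-up 1-D quadratic cost J(w) = 0.5 * lam * w^2, where one gradient descent update multiplies w by (1 - alpha * lam), so it diverges once alpha exceeds 2 / lam (lam and the alphas below are just illustrative numbers):

```python
# Per-iteration shrink factor of |w| for the toy cost J(w) = 0.5 * lam * w**2:
# the update w <- w - alpha * lam * w multiplies w by (1 - alpha * lam).
lam = 1.0
for alpha in (0.5, 1.0, 1.5, 1.9, 1.99):
    factor = abs(1 - alpha * lam)      # < 1 converges, = 1 stalls, > 1 diverges
    print(f"alpha={alpha:4.2f}  shrink factor per iteration = {factor:.2f}")
```

The factor drops to 0 at alpha = 1 / lam and creeps back toward 1 as alpha approaches the divergence threshold 2 / lam, so the marginally converging case is still very slow (and it flips the sign of w on every step).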

Keep trying, and cheers,
Raymond


Hello @Nima1313

Since you are on an adventure, there is 1 more thing you could look at.

As per your experiment, the smaller learning rate case converged faster than the larger learning rate case - I assume this is what you saw.

If so, you should check whether the case with the higher learning rate is bouncing around the minimum - this can be verified by checking whether the derivative changes sign between consecutive updates/iterations.
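For instance, here is a rough sketch of that check on a made-up 1-D quadratic cost (lam, w0 and the learning rates below are just illustrative; in the lab you would record the sign of whatever gradient the lab's code computes at each iteration):

```python
import numpy as np

lam = 1.0                              # curvature of the toy cost J(w) = 0.5 * lam * w**2

def grad(w):
    # derivative of the toy cost; swap in the lab's gradient for the real check
    return lam * w

def count_sign_flips(alpha, iters=50, w0=5.0):
    # run gradient descent and count how often the derivative flips sign
    # between consecutive iterations - a symptom of bouncing across the minimum
    w, prev_g, flips = w0, None, 0
    for _ in range(iters):
        g = grad(w)
        if prev_g is not None and np.sign(g) != np.sign(prev_g):
            flips += 1
        w -= alpha * g
        prev_g = g
    return flips

for alpha in (0.5, 1.9):               # both converge for this toy cost
    print(f"alpha={alpha}: derivative changed sign {count_sign_flips(alpha)} times in 50 iterations")
```

A converging run with a large learning rate will show the sign flipping on almost every iteration, while with a small learning rate the updates approach the minimum from one side.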