Learning rate alpha (Gradient descent)

In which cases do we use a fixed learning rate (as a parameter), and in which cases do we use a range of learning rates (as a hyperparameter)?

Thanks!

Hi there!

Which course/week are you referring to? I can add this question to the correct category and our wonderful Mentors can help you with any doubts you might have.

Hi there!

I am done with the ML course; it's a general question that popped up. But this topic comes up in Course 1, Week 2 and Course 3, Week 2.

Thanks!

In this course, we generally use a fixed learning rate.

You could use a range of rates if you wanted to find the best one.
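For example, a simple sweep might look like this (a minimal NumPy sketch; the toy data, the candidate rates, and the iteration count are all made up for illustration, not course code):

```python
import numpy as np

# Toy one-feature dataset (illustrative values only)
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.1, 5.9, 8.2])

def gradient_descent(alpha, iters=500):
    """Plain fixed-rate gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    for _ in range(iters):
        err = w * x + b - y
        w -= alpha * np.mean(err * x)  # dJ/dw
        b -= alpha * np.mean(err)      # dJ/db
    return np.mean((w * x + b - y) ** 2) / 2  # final cost J(w, b)

# Sweep a range of candidate rates and compare final costs;
# the largest rate here is deliberately too big and will diverge.
for alpha in [0.001, 0.01, 0.1, 0.3]:
    print(f"alpha={alpha:<6} final cost = {gradient_descent(alpha):.3e}")
```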

Note that the fixed-rate method is mathematically very inefficient, so in practice we prefer better optimizers like those found in TensorFlow or scikit-learn.
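For instance (a minimal sketch; the tiny model and the parameter values are illustrative assumptions, not course code):

```python
import tensorflow as tf
from sklearn.linear_model import SGDRegressor

# TensorFlow: pass the starting learning rate to a built-in optimizer.
# Adam then adapts the effective per-parameter step size during training.
model = tf.keras.Sequential([tf.keras.Input(shape=(1,)),
                             tf.keras.layers.Dense(1)])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss="mse")

# scikit-learn: SGDRegressor can shrink the rate automatically when the
# loss stops improving; eta0 is the starting learning rate.
reg = SGDRegressor(learning_rate="adaptive", eta0=0.01)
```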

Hey @Jagadish_Blr.

We can choose whether or not to tune a hyperparameter, but the learning rate is always a hyperparameter.

If you are experienced, you probably already know a good learning rate to start with, and in that case you don't need to tune it. With a good learning rate, you should get an improving learning curve. Otherwise, you probably need to tune it to find a good learning rate that gives you an improving learning curve.
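To make that concrete, here is one quick way to check the learning curve (a minimal NumPy sketch; the toy data and the rate are made up for illustration):

```python
import numpy as np

# Toy one-feature dataset (illustrative values only)
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.1, 5.9, 8.2])

def cost_history(alpha, iters=200):
    """Run gradient descent and record the cost at every iteration."""
    w, b, history = 0.0, 0.0, []
    for _ in range(iters):
        err = w * x + b - y
        history.append(np.mean(err ** 2) / 2)  # record J before each update
        w -= alpha * np.mean(err * x)
        b -= alpha * np.mean(err)
    return history

history = cost_history(alpha=0.1)
if all(a >= b for a, b in zip(history, history[1:])):
    print("Cost decreases every iteration: the learning curve is improving.")
else:
    print("Cost went up at some point: try a smaller learning rate.")
```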

As Tom pointed out, there are various optimizers that adjust the effective (de facto) learning rate over the course of training. These optimizers, however, still require a starting learning rate, and that is the hyperparameter you can choose to tune or not. Depending on the optimizer, there will also be additional hyperparameters that affect how the effective learning rate changes during training.

All of these hyperparameters come with default values when you call them from TensorFlow, but it is up to you to fine-tune them. As you gain experience in a certain problem domain and in handling data of a certain size, you will find less need to tune some hyperparameters, including the starting learning rate. By "less need", I mean, for example, tuning the learning rate over a smaller range, or giving its tuning a lower priority.
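As a concrete example, TensorFlow's Adam optimizer exposes the starting rate plus the extra knobs that govern how the effective rate evolves, each with a default you can override (the values shown below are TensorFlow's documented defaults):

```python
import tensorflow as tf

# All of these are hyperparameters; the values are TensorFlow's defaults.
optimizer = tf.keras.optimizers.Adam(
    learning_rate=0.001,  # starting learning rate: the one most often tuned
    beta_1=0.9,           # decay rate for the first-moment (gradient) average
    beta_2=0.999,         # decay rate for the second-moment (squared-gradient) average
    epsilon=1e-07,        # small constant for numerical stability
)
```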

Raymond

Thanks, Tom and Raymond.