Learning rate alpha (Gradient descent)

In which cases do we use a fixed learning rate ( as a parameter ) and in which cases do we use a range for learning rate ( as a hyper parameter) ?

Thanks !

Hi there!

Which course/week are you refering to? I can ad this question in the correct category and our wonderful Mentors can help you with any doubts you might have.

Hi there!

I am done with the ML course. Its a general question that popped up.
But this topic comes up in course 1, week 2 & course 3, week 2.

Thanks !

In this course, we generally use a fixed learning rate.

You could use a range of rates if you wanted to find the best one.

Note that the fixed-rate method is mathematically very inefficient, so we prefer to use better optimizers like those found in TF or scikit learn.

Hey @Jagadish_Blr.

We can choose to tune or not to tune a hyper-parameter, but Learning Rate is always a hyper-parameter.

If you are experienced, you probably would have known a good learning rate to start with, and in that case you don’t need to tune it. With a good learning rate, you should get an improving learning curve. Otherwise, you probably need to tune it to find a good learning rate that will give you an improving learning curve.

As Tom pointed out, there are various optimizers which will adjust the de facto learning rate over the course of the training process. These optimizers, however, require a starting learning rate which is the hyper-parameter you can choose to tune or not. Of course, depending on the optimizer itself, there will be additional hyper-parameters that affect how the de facto learning rate is changing over the training process. All of these hyperparameters often come with default values when you call them from Tensorflow, but they are up to you to fine tune. When you have more experience in a certain problem domain and in handling a certain size of data, you will find yourself less needed to tune some hyper-parameters including the starting learning rate. By “less needed”, it can be, for example, “tuning learning rate in a smaller range” or “tuning of learning rate is lower prioritized”.


Thanks Tom and Raymond.