Gradient descent learning rate

I have a doubt about gradient descent in Week 2. How do we choose the learning rate? In the video "Optimization using Gradient Descent in one variable - Part 2", the professor assumes a learning rate of 0.005. In practice, when solving real problems, how do we decide what the learning rate should be? I'm kind of stuck here.


The learning rate determines the size of the steps taken during optimization, and it plays a significant role in the convergence and performance of the model. It’s essential to note that the optimal learning rate can depend on the specific problem, architecture, and dataset. Therefore, it’s often a good idea to experiment with different approaches and choose the one that works best for your particular scenario.

Some common strategies for selecting a learning rate, which you can read more about online, are:

Adaptive Learning Rate Methods:

  • Adaptive learning rate methods, such as Adam, RMSprop, and Adagrad, dynamically adjust the learning rate during training based on past gradients. These methods can adapt to the geometry of the loss landscape.
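As a rough sketch of the adaptive idea, here is a minimal Adam-style update in plain Python (the quadratic test function and the starting point are just illustrative choices; the hyperparameter defaults follow common convention):

```python
import math

def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: the effective per-step size adapts to the
    running history of past gradients (first and second moments)."""
    m = beta1 * m + (1 - beta1) * grad       # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad**2    # running mean of squared gradients
    m_hat = m / (1 - beta1**t)               # bias correction for early steps
    v_hat = v / (1 - beta2**t)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v

# Minimize f(w) = w^2 (gradient 2w), starting from w = 5.0
w, m, v = 5.0, 0.0, 0.0
for t in range(1, 2001):
    w, m, v = adam_step(w, 2 * w, m, v, t, lr=0.05)
```

Note how the raw gradient magnitude is largely normalized away by `v_hat`, so the base learning rate mostly controls the step size directly.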

Learning Rate Schedules:

  • Instead of using a fixed learning rate throughout training, you can use learning rate schedules. This involves starting with a higher learning rate and gradually reducing it as training progresses.
  • Common learning rate schedules include step decay, exponential decay, and 1/t decay, where t is the iteration or epoch number.
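The schedules mentioned above can each be written as a one-line function of the epoch or iteration number (the decay constants below are placeholder values you would tune):

```python
import math

def step_decay(lr0, epoch, drop=0.5, epochs_per_drop=10):
    # halve the learning rate every `epochs_per_drop` epochs
    return lr0 * drop ** (epoch // epochs_per_drop)

def exponential_decay(lr0, epoch, k=0.05):
    # smooth exponential shrinkage of the learning rate
    return lr0 * math.exp(-k * epoch)

def inverse_time_decay(lr0, t, k=0.01):
    # the "1/t decay" above: lr shrinks like 1/(1 + k*t)
    return lr0 / (1 + k * t)
```

All three start at `lr0` and shrink over time; they differ only in how quickly the early, larger steps give way to small fine-tuning steps.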

Grid Search:

  • You can perform a grid search over a range of learning rates. Train your model with different learning rates and evaluate their performance. Choose the learning rate that gives the best results.
  • Typically, you might try values like 0.1, 0.01, 0.001, etc., and observe how the model performs.
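A minimal version of that grid search, using the same one-variable setup as the lectures (minimizing f(x) = x², with a made-up starting point), might look like this:

```python
def final_cost(lr, steps=100, x0=5.0):
    # run gradient descent on f(x) = x**2 (gradient 2x) and
    # report the cost reached after `steps` iterations
    x = x0
    for _ in range(steps):
        x = x - lr * 2 * x
    return x**2

candidates = [0.1, 0.01, 0.001]
results = {lr: final_cost(lr) for lr in candidates}
best_lr = min(results, key=results.get)   # learning rate with lowest final cost
```

In a real model you would compare validation loss rather than training cost, but the selection logic is the same.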

Happy learning,



The learning rate is determined by experiment.

  • If it’s too large, the cost will diverge (grow without bound instead of decreasing).
  • If it’s too small, convergence will take too long.
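You can see both failure modes on f(x) = x² (a toy example, not from the course): with gradient 2x, each step multiplies x by (1 − 2·lr), so too large a rate makes the iterates blow up while too small a rate barely moves them.

```python
def run_gd(lr, steps=50, x0=1.0):
    # gradient descent on f(x) = x**2; gradient is 2x
    x = x0
    for _ in range(steps):
        x = x - lr * 2 * x
    return x

big   = run_gd(lr=1.1)    # |1 - 2*lr| = 1.2 > 1: diverges, oscillating with growing magnitude
small = run_gd(lr=0.001)  # factor 0.998 per step: after 50 steps, barely moved from 1.0
good  = run_gd(lr=0.1)    # factor 0.8 per step: converges quickly toward 0
```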

Thank you, guys! 🙂