The learning rate (alpha) controls the size of the step gradient descent takes in the direction of the cost function’s derivative when searching for the minimum cost. My understanding is that when alpha is too large, the cost function will diverge; when it’s too small, many more steps than necessary are needed to reach convergence. In practice, what values of alpha do you start with, and how do you vary that value to find an ‘optimum’ alpha?
P.S. I read the posts on hyperparameter tuning for lambda (the regularization parameter), but I couldn’t find a post for the alpha (learning rate) parameter.
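For reference, the update I’m describing is the standard gradient descent step, where alpha scales the derivative:

$$ w := w - \alpha \frac{\partial J}{\partial w} $$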
Hey @alice.m
It’s one of those parameters we need to play around with to figure out the sweet spot. This is where outputting the cost every so often is helpful: it lets you gauge whether you’re converging fast enough.
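A minimal sketch of that kind of monitoring (here compute_cost and compute_gradient are placeholders for your own model’s functions):

```python
def gradient_descent(X, y, w, b, alpha, num_iters, compute_cost, compute_gradient):
    """Fixed-rate gradient descent that prints the cost every so often."""
    for i in range(num_iters):
        dj_dw, dj_db = compute_gradient(X, y, w, b)
        w = w - alpha * dj_dw          # step size scaled by alpha
        b = b - alpha * dj_db
        if i % max(1, num_iters // 10) == 0:   # ~10 progress printouts
            print(f"iter {i:5d}: cost = {compute_cost(X, y, w, b):.6f}")
    return w, b
```

If the printed cost is climbing instead of falling, alpha is too large.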
There isn’t a rule that dictates what the initial value of alpha should be, but personally I’ve found that in many, many projects alpha starts at 0.1 or 0.01. Rarely it’s as high as 1.2, and in those cases the weights (W and b) and the training-example values were very large.

If you want to start with a large alpha and have it decay at a specific rate after some iterations, read about learning rate decay (also called learning rate scheduling); it’s one of several methods that change the value of alpha as training progresses. My advice is to try different values of alpha starting from 0.1: if you find that your model is slow or only converges after many iterations, increase alpha by a small amount until the model runs quickly and still converges, and vice versa.
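As a sketch of how a schedule plugs into training (grad_fn and schedule are placeholders; any function of the iteration count can serve as the schedule):

```python
def train_with_schedule(w, grad_fn, schedule, alpha0, num_iters):
    """Gradient descent where alpha is recomputed from a schedule each step."""
    for i in range(num_iters):
        alpha = schedule(alpha0, i)    # decayed alpha for this iteration
        w = w - alpha * grad_fn(w)     # same update rule, shrinking step size
    return w
```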
I haven’t hit the lessons on Adaptive Learning Rates yet, but your feedback makes total sense. After reading through a few Quora/Reddit posts, I see that most of them start the learning rate at 0.1 or 0.01 and then apply different types of adaptive learning rate schedules (step, time-based, exponential decay). I’ll need to read more about this, but I really appreciate your feedback in guiding me to learn more about how this parameter is tuned. It sounds like the number of layers and neurons in the neural model, the size of the data values, etc. all influence the initial value of alpha as well as the type of adaptive schedule you select to tune it.
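For my own notes, the common textbook forms of those three schedules look roughly like this (the constants are placeholders, not recommendations):

```python
import math

def step_decay(alpha0, epoch, drop=0.5, epochs_per_drop=10):
    """Step decay: cut alpha by a fixed factor every few epochs."""
    return alpha0 * (drop ** (epoch // epochs_per_drop))

def time_based_decay(alpha0, epoch, decay_rate=0.01):
    """Time-based decay: alpha shrinks in proportion to elapsed epochs."""
    return alpha0 / (1.0 + decay_rate * epoch)

def exponential_decay(alpha0, epoch, k=0.1):
    """Exponential decay: alpha falls off as exp(-k * epoch)."""
    return alpha0 * math.exp(-k * epoch)
```

Any of these could be passed as the schedule function in the sketch above.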
If you do not have normalized features, then there’s no telling what an optimum learning rate might be.
If the features are normalized, then it’s a good bet that the best learning rate will be < 1.0.
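For anyone landing here, a quick z-score normalization sketch (assuming X is a NumPy array with one row per training example):

```python
import numpy as np

def zscore_normalize(X):
    """Rescale each feature (column) to zero mean and unit variance."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    return (X - mu) / sigma, mu, sigma
```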
For a broad evaluation, you can use a ratio of 1:3:10 for the rate increments - that is an easy approximation of a log progression.
So you might use a sequence of [0.01, 0.03, 0.1, 0.3, 1.0] and see how it works. For a simple linear or logistic regression, the solution isn’t very sensitive to optimizing the learning rate. Just find one that doesn’t cause divergence, then keep increasing the number of iterations as necessary.
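A sketch of that sweep, assuming a hypothetical run_gradient_descent(alpha) that returns the final cost (a non-finite cost signals divergence):

```python
import numpy as np

def sweep_alphas(run_gradient_descent, alphas=(0.01, 0.03, 0.1, 0.3, 1.0)):
    """Try each candidate rate and report which ones converge."""
    for alpha in alphas:
        final_cost = run_gradient_descent(alpha)
        status = "ok" if np.isfinite(final_cost) else "diverged"
        print(f"alpha = {alpha:4.2f} -> final cost = {final_cost:.6f} ({status})")
```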
Note that if you have an NN, you’re probably not going to be using fixed-rate gradient descent to find the solution. There are much more computationally efficient tools available; you’ll learn about those during the course.
Hi @alice.m I just want to add that there are techniques that help you find the best hyperparameters for your dataset and model. So it would be worth exploring those tools once you have a good understanding of how to do it manually.