Parameters Diverging When the Learning Rate Is Too Large

Hey, everyone! I am trying to better understand Andrew’s diagrams from “Choosing the learning rate” at around 1:36. They seem to indicate that, if the learning rate is too large, w1 might diverge from the value that minimizes the cost function, but I don’t understand how this is possible. I understand that a learning rate that is too large could overshoot, but I would still expect w1 to get closer to the optimal value on every iteration. For example, if the optimal value of w1 is 0 and w1 starts at 5, it might bounce around 0 like 5 → -4 → 4 → -3 → … Am I missing something here? The diagrams seem to indicate it could instead go something like 5 → -7 → 12 → -20 → … Thanks in advance, everyone!

If the learning rate is too large, each update overshoots the minimum by more than the distance you started from, so w1 lands farther away than where it began. At that new point the gradient is even larger, so the next overshoot is larger still, and the oscillations grow without bound: something like 5 → -7 → 10 → -14 → …, which is exactly the second pattern you wrote. It never settles near the minimum; it diverges toward infinity.
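Here is a minimal sketch (my own, not from the course materials) that you can run to see both behaviors. It assumes the simple one-parameter cost J(w) = w², whose minimum is at w = 0 and whose gradient is 2w, and the helper name `gradient_descent_path` is just something I made up for illustration.

```python
def gradient_descent_path(w0, alpha, steps=5):
    """Return the sequence of w values produced by the update w <- w - alpha * 2*w
    for the cost J(w) = w**2 (gradient dJ/dw = 2*w)."""
    path = [w0]
    w = w0
    for _ in range(steps):
        w = w - alpha * (2 * w)      # each step scales w by the factor (1 - 2*alpha)
        path.append(round(w, 2))     # rounded snapshot; w itself is not rounded
    return path

# Learning rate large enough to overshoot, but small enough that every bounce
# lands closer to the minimum (your 5 -> -4 -> 4 -> -3 intuition):
print(gradient_descent_path(w0=5.0, alpha=0.8))
# [5.0, -3.0, 1.8, -1.08, 0.65, -0.39]

# Learning rate too large: every overshoot lands FARTHER away than it started,
# so the oscillations grow without bound (the diagram's 5 -> -7 -> 12 -> -20 case):
print(gradient_descent_path(w0=5.0, alpha=1.2))
# [5.0, -7.0, 9.8, -13.72, 19.21, -26.89]
```

For this particular cost, each step multiplies w by (1 - 2*alpha), so the sequence diverges whenever |1 - 2*alpha| > 1, i.e. alpha > 1. The exact threshold depends on the curvature of the cost function, but the qualitative picture is the same one Andrew draws in the video.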