Interpretation of this learning rate finder plot

I have the following plot of learning rate finder results (following the principles of Smith (2015)), produced according to this code example, and I found the section where the loss drops to be very narrow. What does that mean? The optimal learning rate found is 1.5655376e-05, and the batch size used in the code that was executed is 512.

[Plot: learning rate finder results — loss vs. learning rate]

Now I have made the same plot using Plotly to see it better, zoomed in on the section of interest, and I verified that the optimal learning rate is not even in the section with the abrupt drop; it is located much earlier. I am trying to understand why the chosen optimal learning rate is so far from the abrupt drop.
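For context, the general idea of the range test the code follows is roughly this (a simplified sketch with a hypothetical `train_step(lr)` callback, not my actual code):

```python
# Simplified sketch of an LR range test: the learning rate is increased
# exponentially each mini-batch while the loss is recorded, and a
# "suggested" LR is then read off the curve (here: steepest descent).
import numpy as np

def lr_range_test(train_step, start_lr=1e-7, end_lr=1.0, num_steps=100):
    """train_step(lr) runs one mini-batch update and returns the loss (hypothetical)."""
    lrs = np.geomspace(start_lr, end_lr, num_steps)   # exponential ramp of LRs
    losses = np.array([train_step(lr) for lr in lrs])
    slopes = np.gradient(losses, np.log10(lrs))       # slope of loss vs. log10(LR)
    suggested = lrs[int(np.argmin(slopes))]           # where the loss falls fastest
    return lrs, losses, suggested
```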

I can’t really see your images clearly, and you’ve uploaded them to a third-party site that my browser is blocking.

Please just paste your images into the thread directly.

Also, please update your plot format to include markers for each data point.

@TMosh Hi, ok, I made the adjustments you requested by updating my post. Thanks!

Thanks.

First I doubt that testing the learning rate with such fine increments is really necessary. It’s not that critical to have an exact optimum value. Just getting close is good enough.

Typically you’d use an approximate log range of values using ratios of 1:3:10.
So maybe use 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1.0. If the data set is normalized, that should cover all the likely values.
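Conceptually, something like this (a rough sketch, assuming a hypothetical `train_and_evaluate(lr)` helper that trains briefly and returns a validation loss):

```python
# Rough sketch: try a coarse 1:3:10 grid of learning rates and keep
# whichever gives the lowest validation loss after a short training run.
candidate_lrs = [0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1.0]

results = {lr: train_and_evaluate(lr) for lr in candidate_lrs}  # hypothetical helper
best_lr = min(results, key=results.get)
print(f"best coarse learning rate: {best_lr}")
```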

Regarding your plots and why the best learning rate doesn’t line up with the step function in the cost plot, I suspect there’s an error in your code.

Note that normalizing the dataset is a critical first step. Doing this will make it a lot easier to find the best learning rate.
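For example, standardizing the features with training-set statistics (a minimal sketch; `X_train` and `X_test` stand in for your arrays):

```python
# Minimal sketch: standardize features using statistics computed on the
# training set only, then apply the same transform to the test set.
mu = X_train.mean(axis=0)
sigma = X_train.std(axis=0) + 1e-8   # avoid division by zero
X_train_norm = (X_train - mu) / sigma
X_test_norm = (X_test - mu) / sigma
```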

And remember, you’re looking for “good enough”. There should be a big range of learning rate values that give just about identical performance.

What do you mean exactly by testing the LR with such fine increments? Could you point out what you are referring to in the code I provided?

Do you mean I should adjust these log-scale values just for viewing them on the plot? And yes, I am normalizing my data set.

Are you referring to a possible problem in the code only in relation to the red line that denotes the optimal learning rate?

Ok, I’m normalizing the data set. And yes, of course, “good enough” would be enough.

Thanks!

Your plot shows that you’re testing with learning rates that are spaced at approximately 0.00000003 units.

Interval: 15.61e-6 - 15.58e-6 = 3.0e-8

I think that is not necessary or useful.

Yes. Start by only using these learning rates:
0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1.0

Once you identify the interval where the minimum lies, you might then test again using a few points within that interval.
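For instance, if 0.01 and 0.03 bracket the best coarse result, you could do something like this (same hypothetical `train_and_evaluate(lr)` helper as above):

```python
# Sketch: refine with a few log-spaced points inside the best coarse
# interval (here assumed to be 0.01 .. 0.03).
import numpy as np

fine_lrs = np.geomspace(0.01, 0.03, num=5)
fine_results = {lr: train_and_evaluate(lr) for lr in fine_lrs}  # hypothetical helper
best_lr = min(fine_results, key=fine_results.get)
```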

Could be either.

Ok, I understand now.

Right, and how do I adjust this in the code? Could you help me, please?

Ok. Could you please help me check this problem in the code?

Thanks!

Updating your code is best done by yourself. I’m not familiar with your project code, and as a volunteer, I’m not inclined to do it for you.

Well, I thought this was a help forum, and I didn’t ask you to update my code, just to help me. I didn’t understand the comment that updating the code is best done by myself. If I don’t understand something, I should seek help, right? How am I going to solve something if I don’t know what to do?

It’s ok if you’re not familiar with the code and not willing to help, thanks!

This is a forum for questions. I’m not available to work on your code.

Perhaps someone else from the community will be able to help.

We don’t know that code either: it’s something new you just presented to us. And I guess what you are saying is that you don’t understand that code and you’d like us to do the work to understand it and then explain it to you or perhaps even debug it for you. You may get lucky and there is someone listening here who is sufficiently curious about the topic that they are willing to do that work for you. Please realize that the mentors are just volunteers here. We don’t get paid to answer questions. So when you ask a question about something that is beyond the scope of any of the courses here, we have a choice of whether we want to spend our time in that way.

You gave a link to the paper on which this work is based. Have you read the paper?

Yes, I know it’s for questions, which makes it a help forum. And again, I have no problem with you not being available to help me; I just found it strange the way you said that updating the code is better done by myself, especially because if I knew how to update the code I wouldn’t be asking for help. Thanks.

I took a quick look at the paper. I didn’t read every line, but my take is that they are not saying there is a single optimal learning rate; rather, they got better training results in fewer iterations by varying the LR within a range in a triangular pattern, meaning they raise it and lower it alternately. So it’s unclear to me how that code implements something from the paper to find an optimal learning rate.

But this was with less than 10 minutes of effort, so I’m very likely missing the point …
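Just to illustrate what I mean by the triangular pattern, here’s my own quick sketch of such a schedule (not the paper’s reference code; the parameter values are arbitrary):

```python
# Quick sketch of a triangular cyclical learning rate as I read the paper:
# the LR ramps linearly from base_lr up to max_lr and back down over each
# cycle, rather than being held fixed at one "optimal" value.
def triangular_lr(step, base_lr=1e-4, max_lr=1e-2, step_size=2000):
    """step_size = number of iterations in half a cycle (arbitrary example values)."""
    cycle = step // (2 * step_size)
    x = abs(step / step_size - 2 * cycle - 1)   # goes 1 -> 0 -> 1 within each cycle
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x)
```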

“To do the work”? Look, I’m trying to get help; I’m not saying you have to help me. So does this mean that, in your opinion, a person who wants help wants someone to do the work for them? I know that no one is paid here and no one is obligated to help. I’m not complaining that they’re not helping me; I’m just commenting on the way I’m being answered.

Of course, at some point did I say that someone is obligated to help me?

Ok, thanks. Check this link, for example, for more info about the learning rate finder: Learning Rate Finder — PyTorch Lightning 1.4.9 documentation.
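That page shows usage roughly along these lines (paraphrased from memory; the exact attribute names depend on the model and the Lightning version):

```python
# Rough usage sketch based on the Lightning docs linked above.
import pytorch_lightning as pl

trainer = pl.Trainer()
lr_finder = trainer.tuner.lr_find(model)   # model: your LightningModule
fig = lr_finder.plot(suggest=True)         # loss vs. LR curve with the suggestion marked
new_lr = lr_finder.suggestion()            # suggested learning rate
model.learning_rate = new_lr               # or model.hparams.lr, depending on the model
```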


Yes, the problem is not understanding the paper, but the code that is based on the paper.