Interpretation of this learning rate finder plot

I have the following plot of learning rate finder results (following the principles of Smith (2015)), produced according to this code example, and I found the section where the loss drops to be very narrow. What does that mean? The optimal learning rate found is 1.5655376e-05, and the batch size used in the code that was executed is 512.

[Plot: learning rate finder results — loss vs. learning rate]

Now I have made the same plot using Plotly to see it better, zoomed in on the section of interest, and I verified that the optimal learning rate is not even in the section with the abrupt drop; it is located much earlier. I am trying to understand why the chosen optimal learning rate is so far from the abrupt drop.
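For context, the general idea of the range test the code follows is roughly this (a simplified sketch with a hypothetical `train_step(lr)` callback, not my actual code):

```python
# Simplified sketch of an LR range test: the learning rate is increased
# exponentially each mini-batch while the loss is recorded, and a
# "suggested" LR is then read off the curve (here: steepest descent).
import numpy as np

def lr_range_test(train_step, start_lr=1e-7, end_lr=1.0, num_steps=100):
    """train_step(lr) runs one mini-batch update and returns the loss (hypothetical)."""
    lrs = np.geomspace(start_lr, end_lr, num_steps)   # exponential ramp of LRs
    losses = np.array([train_step(lr) for lr in lrs])
    slopes = np.gradient(losses, np.log10(lrs))       # slope of loss vs. log10(LR)
    suggested = lrs[int(np.argmin(slopes))]           # where the loss falls fastest
    return lrs, losses, suggested
```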

I can’t really see your images clearly, and you’ve uploaded them to a third-party site that my browser is blocking.

Please just paste your images into the thread directly.

Also, please update your plot format to include markers for each data point.

@TMosh Hi, ok, I made the adjustments you requested by updating my post. Thanks!

Thanks.

First I doubt that testing the learning rate with such fine increments is really necessary. It’s not that critical to have an exact optimum value. Just getting close is good enough.

Typically you’d use an approximate log range of values using ratios of 1:3:10.
So maybe use 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1.0. If the data set is normalized, that should cover all the likely values.
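Conceptually, something like this (a rough sketch, assuming a hypothetical `train_and_evaluate(lr)` helper that trains briefly and returns a validation loss):

```python
# Rough sketch: try a coarse 1:3:10 grid of learning rates and keep
# whichever gives the lowest validation loss after a short training run.
candidate_lrs = [0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1.0]

results = {lr: train_and_evaluate(lr) for lr in candidate_lrs}  # hypothetical helper
best_lr = min(results, key=results.get)
print(f"best coarse learning rate: {best_lr}")
```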

Regarding your plots and why the best learning rate doesn’t line up with the step function in the cost plot, I suspect there’s an error in your code.

Note that normalizing the dataset is a critical first step. Doing this will make it a lot easier to find the best learning rate.
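For example, standardizing the features with training-set statistics (a minimal sketch; `X_train` and `X_test` stand in for your arrays):

```python
# Minimal sketch: standardize features using statistics computed on the
# training set only, then apply the same transform to the test set.
mu = X_train.mean(axis=0)
sigma = X_train.std(axis=0) + 1e-8   # avoid division by zero
X_train_norm = (X_train - mu) / sigma
X_test_norm = (X_test - mu) / sigma
```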

And remember, you’re looking for “good enough”. There should be a big range of learning rate values that give just about identical performance.

What do you mean exactly by testing the LR with such fine increments? Could you point out what you are referring to in the code I provided?

Do you mean I should adjust these log-scale values just for viewing them on the plot? And yes, I am normalizing my data set.

Are you referring to a possible problem in the code only in relation to the red line that denotes the optimal learning rate?

Ok, I’m normalizing the data set. And yes, of course, “good enough” would be enough.

Thanks!

Your plot shows that you’re testing with learning rates that are spaced at approximately 0.00000003 units.

Interval: 15.61e-6 - 15.58e-6 = 3.0e-8

I think that is not necessary or useful.

Yes. Start by only using these learning rates:
0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1.0

Once you identify the interval where the minimum lies, you might then test again using a few points within that interval.
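For instance, if 0.01 and 0.03 bracket the best coarse result, you could do something like this (same hypothetical `train_and_evaluate(lr)` helper as above):

```python
# Sketch: refine with a few log-spaced points inside the best coarse
# interval (here assumed to be 0.01 .. 0.03).
import numpy as np

fine_lrs = np.geomspace(0.01, 0.03, num=5)
fine_results = {lr: train_and_evaluate(lr) for lr in fine_lrs}  # hypothetical helper
best_lr = min(fine_results, key=fine_results.get)
```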

Could be either.

Ok, I understand now.

Right, and how do I adjust this in the code? Could you help me, please?

Ok. Could you please help me check this problem in the code?

Thanks!

Updating your code is best done by yourself. I’m not familiar with your project code, and as a volunteer, I’m not inclined to do it for you.

Well, I thought this was a help forum, and I didn’t ask you to update my code, just to help me. I didn’t understand the comment that updating the code is best done by myself. If I don’t understand something, I should seek help, right? How am I going to solve something if I don’t know what to do?

It’s ok if you’re not familiar with the code and not willing to help, thanks!

This is a forum for questions. I’m not available to work on your code.

Perhaps someone else from the community will be able to help.

We don’t know that code either: it’s something new you just presented to us. And I guess what you are saying is that you don’t understand that code and you’d like us to do the work to understand it and then explain it to you or perhaps even debug it for you. You may get lucky and there is someone listening here who is sufficiently curious about the topic that they are willing to do that work for you. Please realize that the mentors are just volunteers here. We don’t get paid to answer questions. So when you ask a question about something that is beyond the scope of any of the courses here, we have a choice of whether we want to spend our time in that way.

You gave a link to the paper on which this work is based. Have you read the paper?

Yes, I know it’s for questions, which makes it a help forum. And again, I have no problem with you not being available to help me; I just found it strange the way you said that updating the code is better done by myself, especially because if I knew how to update the code I wouldn’t be asking for help. Thanks.

I took a quick look at the paper. I didn’t read every line, but my take is that they are not saying there is a single optimal learning rate; rather, they got better training results in fewer iterations by varying the LR within a range in a triangular pattern, meaning they raise it and lower it alternately. So it’s unclear to me how that code implements something from the paper to find an optimal learning rate.

But this was with less than 10 minutes of effort, so I’m very likely missing the point …
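Just to illustrate what I mean by the triangular pattern, here’s my own quick sketch of such a schedule (not the paper’s reference code; the parameter values are arbitrary):

```python
# Quick sketch of a triangular cyclical learning rate as I read the paper:
# the LR ramps linearly from base_lr up to max_lr and back down over each
# cycle, rather than being held fixed at one "optimal" value.
def triangular_lr(step, base_lr=1e-4, max_lr=1e-2, step_size=2000):
    """step_size = number of iterations in half a cycle (arbitrary example values)."""
    cycle = step // (2 * step_size)
    x = abs(step / step_size - 2 * cycle - 1)   # goes 1 -> 0 -> 1 within each cycle
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x)
```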

“To do the work”? Look, I’m trying to get help; I’m not saying you have to help me. So does this mean that, in your opinion, a person who wants help wants someone to do the work for them? I know that no one is paid here and no one is obligated to help. I’m not complaining that they’re not helping me; I’m just commenting on the way I’m being answered.

Of course, at some point did I say that someone is obligated to help me?

Ok, thanks. Check this link, for example, for more info about the learning rate finder: Learning Rate Finder — PyTorch Lightning 1.4.9 documentation.
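That page shows usage roughly along these lines (paraphrased from memory; the exact attribute names depend on the model and the Lightning version):

```python
# Rough usage sketch based on the Lightning docs linked above.
import pytorch_lightning as pl

trainer = pl.Trainer()
lr_finder = trainer.tuner.lr_find(model)   # model: your LightningModule
fig = lr_finder.plot(suggest=True)         # loss vs. LR curve with the suggestion marked
new_lr = lr_finder.suggestion()            # suggested learning rate
model.learning_rate = new_lr               # or model.hparams.lr, depending on the model
```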


Yes, the problem is not understanding the paper, but the code that is based on the paper.