Optional Lab: Feature scaling and Learning Rate, effect of alpha

Hi everybody, I have a question about the comment that the second solution converges more slowly than the first one because its learning rate is smaller. But when I compared the two graphs at 20 iterations, I noticed that convergence happens faster in the second curve, not the opposite as predicted. Have I missed something?

Hi @Omar_Mohamad, I am not opening the optional lab. I am just looking at your screenshot, ok?

The upper screenshot has a higher learning rate (9e-7). On the right graph, you can see that it fluctuates between the two sides of the parabola, and this is why it converges more slowly. This learning rate appears to be larger than appropriate.

The lower screenshot has a lower learning rate (1e-7), and on the right graph it does not fluctuate but goes straight towards the minimum. This is why it converges more smoothly; this learning rate is not too large.
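To make the fluctuation concrete, here is a minimal sketch (not the lab's code, and with illustrative learning rates rather than the lab's 9e-7 / 1e-7): gradient descent on a toy one-dimensional cost J(w) = w², whose gradient is 2w. A too-large alpha overshoots the minimum each step, so w keeps flipping sign across the parabola; a small alpha shrinks w towards the minimum from one side.

```python
# Toy sketch of the effect described above (assumed cost J(w) = w**2;
# alpha values are illustrative, not the ones from the lab).

def gradient_descent(w0, alpha, num_iters):
    """Return the sequence of w values visited by gradient descent."""
    w = w0
    history = [w]
    for _ in range(num_iters):
        grad = 2 * w          # dJ/dw for J(w) = w**2
        w = w - alpha * grad  # standard update rule
        history.append(w)
    return history

# Large alpha: each step overshoots the minimum at w = 0, so w flips
# sign and "fluctuates between the two sides of the parabola".
big = gradient_descent(w0=1.0, alpha=0.9, num_iters=5)

# Small alpha: w shrinks towards 0 from one side, with no fluctuation.
small = gradient_descent(w0=1.0, alpha=0.1, num_iters=5)

print(big)    # signs alternate: 1.0, -0.8, 0.64, -0.512, ...
print(small)  # monotone decrease: 1.0, 0.8, 0.64, 0.512, ...
```

Both runs converge here, but the oscillating one wastes steps jumping across the minimum, which is the behaviour visible in the upper screenshot.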

I think this is the most important message and we can focus on this. Do you agree with what I have said above?


Hello @rmwkwok, I understood from you that the second solution is faster because it goes straight ahead without fluctuating. But as mentioned in the lab's comment, the first one should be faster. However, when I tried what you suggested in the "slope value" question and increased the number of iterations from 20 to 10000, I noticed that the first solution reaches the optimum but the second doesn't. It seems I was deceived by the graph: the tiny, flat part at the bottom of the parabola takes the second solution a long time to move through.
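The iteration-count effect can be sketched with the same toy cost J(w) = w² as above (again with made-up alpha and iteration values, not the lab's): a smaller alpha avoids fluctuation, but it creeps through the flat bottom of the parabola so slowly that it may not reach the optimum within a small iteration budget.

```python
# Toy sketch (assumed cost J(w) = w**2; numbers are illustrative).

def run(alpha, num_iters, w0=1.0):
    """Final w after gradient descent; the minimum is at w = 0."""
    w = w0
    for _ in range(num_iters):
        w -= alpha * 2 * w  # gradient of w**2 is 2*w
    return w

# With only 20 iterations, the small learning rate is still far from
# the minimum, while the larger one is essentially there already.
print(run(alpha=0.4,  num_iters=20))     # ~1e-14, converged
print(run(alpha=0.01, num_iters=20))     # ~0.67, still far away

# Given many more iterations, the small learning rate converges too.
print(run(alpha=0.01, num_iters=10000))  # effectively 0
```

This is the same trap as in the thread: judged by the shape of the curve alone the small-alpha run looks well-behaved, but at a fixed iteration count it can still be far from the optimum.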

I see. I forgot about the number of iterations here too. My bad. I should have opened the optional lab :sweat_smile:

But it’s good that you figured it out :+1:

