Learning Rate - C1_W2_Lab03

Hi, I have a question regarding example mentioned in the 3rd optional lab in week 2.
For 𝛼 = 9e-7, the value of w_0 oscillates around the minimum, and the cost function didn't get below 40,000 until the 6th iteration.
For 𝛼 = 1e-7, the value of w_0 doesn't oscillate, and the cost function decays much faster.
However, the notes state something different: "On the left, you see that cost is decreasing as it should. On the right you can see that 𝑤_0 is decreasing without crossing the minimum. Note above that dj_w0 is negative throughout the run. This solution will also converge, though not quite as quickly as the previous example."
So I am confused now. Could you explain this further?

Hi @Alaa_Elshorbagy
Welcome to the community!

This assignment shows you how the best learning rate 𝛼 behaves: how the cost evolves, and how the update to each weight changes from iteration to iteration.
First, what is the learning rate? The learning rate is a tuning parameter in an optimization algorithm that determines the step size at each iteration while moving toward a minimum of a loss function.
In the image below the learning rate is 𝛼 = 9e-7:
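To make the role of the step size concrete, here is a minimal sketch of a single gradient-descent update (illustrative only, not the lab's actual code; the numbers are made up):

```python
# One gradient-descent step: move the weight opposite to the gradient,
# scaled by the learning rate alpha.
def gradient_step(w, dj_dw, alpha):
    """Update weight w using its gradient dj_dw and learning rate alpha."""
    return w - alpha * dj_dw

# Example: a positive gradient pushes the weight down (toward the minimum).
# With alpha = 9e-7 and a gradient of 5000, the step is only 0.0045.
w_new = gradient_step(w=200.0, dj_dw=5000.0, alpha=9e-7)
```

A larger 𝛼 multiplies the same gradient into a bigger step, which is exactly what causes the overshooting discussed below.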

The update value (the derivative of the cost with respect to the weight w0) oscillates between positive and negative, as I highlighted, yet it still converges, and very quickly. The learning rate here is very good because:

  • The cost converges (decreases) in every iteration
  • The rate (speed) of convergence is very good
  • It doesn't diverge after some iterations

In the image below the learning rate is 𝛼 = 1e-7, a slightly smaller value than the previous one. Although the cost still converges (decreases) in every iteration, the speed of convergence isn't as good as in the image above.

So, briefly, we search for a value of the learning rate 𝛼 that:

  • Always converges in every iteration
  • Gives the highest speed of convergence
  • Doesn't diverge after some iterations

Note that there is a useful technique called learning rate decay (or a learning rate schedule): after some iterations, the learning rate starts to decrease to match the smaller changes we need to make to the weights.
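One common form of such a schedule is inverse-time decay. This is only a sketch of the general idea, not something the lab itself uses (the constants here are arbitrary):

```python
# Inverse-time learning-rate decay: alpha shrinks as training progresses,
# so early steps are large and later steps are fine-grained.
def decayed_lr(alpha0, decay_rate, iteration):
    """Return the learning rate for a given iteration."""
    return alpha0 / (1.0 + decay_rate * iteration)

alpha_start = decayed_lr(9e-7, 0.1, 0)    # full initial rate at iteration 0
alpha_later = decayed_lr(9e-7, 0.1, 100)  # roughly 11x smaller by iteration 100
```

Other schedules (step decay, exponential decay, cosine annealing) follow the same principle of shrinking the step size over time.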


Thank you Abdelrahman for your reply. How do you know if the speed/rate of convergence is good?

As you can see below, the learning curve flattens by about the 60th iteration:

While here, the learning curve flattens by about the 10th iteration, which means it is faster! What am I getting wrong?


I am confused about this as well. Were you able to figure out the problem? Because that statement did not make sense to me when I read it. (“though not as quickly as the previous example”).


When alpha was larger, the cost decreased rapidly, but w didn't reach the minimum quickly: the giant steps kept overshooting it, so even though the cost dropped fast per step, the weight had to oscillate back and forth around the minimum many times. With the smaller alpha, the cost decreases more slowly per step, but w heads straight toward the minimum without overshooting, so it actually gets there in comparatively less time.
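You can see this overshooting-versus-monotone behavior on a toy 1-D problem. This is not the lab's cost function, just J(w) = w² (gradient 2w) with exaggerated learning rates chosen to make the effect obvious:

```python
# Gradient descent on J(w) = w**2, whose gradient is 2*w.
# A large alpha overshoots the minimum at w = 0 and oscillates in sign;
# a small alpha stays on one side and shrinks toward 0 monotonically.
def run_gd(alpha, w0=1.0, steps=5):
    w, path = w0, [w0]
    for _ in range(steps):
        w = w - alpha * 2 * w   # gradient step
        path.append(w)
    return path

big = run_gd(alpha=0.9)   # w flips sign every step: 1, -0.8, 0.64, ...
small = run_gd(alpha=0.1) # w stays positive: 1, 0.8, 0.64, ...
```

Both runs converge (the magnitude shrinks each step), but the large-alpha run zigzags across the minimum while the small-alpha run approaches it from one side, which mirrors what the two plots in the lab show.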


I was also confused by the notes on the optional lab, but this reply helped me understand what they mean, so thank you 🙂
If I got it correctly, the rate of decrease in this example is given by dj_dw0, and when it oscillates (alpha = 9e-7), that value is larger at every iteration than when it does not oscillate (alpha = 1e-7).
However, we are not only looking for the quickest per-step decrease, but also for the fastest convergence. So although alpha = 9e-7 gives a faster decrease per step, alpha = 1e-7 converges to the minimum faster.
Am I right?
If that is the case, I think the sentence "This solution will also converge, though not quite as quickly as the previous example." in the notes is quite confusing.
Thank you!

Yes, it may seem a bit confusing at the beginning; you have understood it correctly.