Hi, I have a question regarding the example mentioned in the 3rd optional lab in week 2.

For 𝛼 = 9e-7, the value of w_0 oscillates around the minimum, and the cost function didn’t get below 40000 until the 6th iteration.

For 𝛼 = 1e-7, the value of w_0 didn’t oscillate, and the cost function is decaying much faster.

However, the notes state something different: "On the left, you see that cost is decreasing as it should. On the right you can see that 𝑤_0 is decreasing without crossing the minimum. Note above that `dj_w0` is negative throughout the run. This solution will also converge, **though not quite as quickly as the previous example.**"

So I am confused now. Could you explain more for me?

A.

Hi @Alaa_Elshorbagy

Welcome to the community!

This assignment shows you how a good learning rate 𝛼 behaves: what the cost looks like, and how much the value changes at each iteration.

First, what is the learning rate? The learning rate is **a tuning parameter in an optimization algorithm that determines the step size at each iteration while moving toward a minimum of a loss function**.
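To make the "step size" idea concrete, here is a minimal sketch of a single gradient descent update. The toy cost `J(w) = (w - 3)**2` and the names `gradient` and `gradient_step` are assumptions chosen for illustration; they are not the lab's code or data.

```python
def gradient(w):
    """Derivative of the toy cost J(w) = (w - 3)**2 (assumed example, not the lab's cost)."""
    return 2.0 * (w - 3.0)


def gradient_step(w, alpha):
    """One gradient descent update: move against the gradient, scaled by alpha."""
    return w - alpha * gradient(w)


w_start = 0.0
small = gradient_step(w_start, alpha=0.1)  # short step toward the minimum at w = 3
large = gradient_step(w_start, alpha=0.9)  # long step that jumps past the minimum
print(small, large)
```

With the same gradient, the only thing that changes between the two calls is `alpha`, so it directly controls how far `w` moves in one iteration.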

In the image below the learning rate is 𝛼 = 9e-7:

The changed value (the derivative of the cost with respect to the weight w0) oscillates, alternating between positive and negative as I highlighted. It still converges, and it is very fast. The learning rate here is very good because:

- The cost decreases in every iteration
- The rate (speed) of convergence is very good
- It wouldn’t diverge after some iterations

**But**

In the image below the learning rate is 𝛼 = 1e-7, a somewhat smaller value than the previous learning rate. Although the cost decreases in every iteration, the speed of convergence isn’t as good as in the image above.

So, briefly, we are looking for the value of the learning rate 𝛼 which:

- Always converges in every iteration
- Has the highest speed of convergence
- Wouldn’t diverge after some iterations
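The three criteria above can be checked numerically. The sketch below runs gradient descent on an assumed toy cost `J(w) = 0.5 * curvature * w**2` for a few hypothetical learning rates and reports, for each one, whether the cost fell at every iteration, how many iterations it took to get below a tolerance, and whether it diverged. The function name, the candidate values, and the tolerance are all illustrative assumptions, not values from the lab.

```python
def summarize_run(alpha, w_init=10.0, curvature=2.0, max_iters=500, tol=1e-6):
    """Gradient descent on the toy cost J(w) = 0.5 * curvature * w**2 (assumed example)."""
    w = w_init
    cost = 0.5 * curvature * w ** 2
    initial_cost = cost
    always_decreasing = True
    iters_to_tol = None  # iteration at which the cost first drops below tol
    for i in range(1, max_iters + 1):
        w -= alpha * curvature * w  # update: w = w - alpha * dJ/dw
        new_cost = 0.5 * curvature * w ** 2
        if new_cost >= cost:
            always_decreasing = False
        cost = new_cost
        if cost < tol:
            iters_to_tol = i
            break
        if cost > 1e6 * initial_cost:
            break  # cost has blown up; stop and report divergence
    return {
        "alpha": alpha,
        "always_decreasing": always_decreasing,
        "iters_to_tol": iters_to_tol,
        "diverged": cost > initial_cost,
    }


candidates = [0.1, 0.4, 0.95, 1.1]  # hypothetical learning rates for the toy cost
results = [summarize_run(a) for a in candidates]
usable = [r for r in results if r["always_decreasing"] and not r["diverged"]]
best = min(usable, key=lambda r: r["iters_to_tol"])
for r in results:
    print(r)
print("fastest usable alpha:", best["alpha"])
```

In this toy setting, the largest learning rate that still converges is not the fastest one: it overshoots at every step and needs more iterations than a moderate value.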

Note that there is a great technique called learning rate decay (or a learning rate schedule), in which the learning rate starts to decrease after some iterations to fit the smaller changes we make to the weights.
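As a rough sketch of what learning rate decay can look like, the example below uses inverse time decay, `alpha_t = alpha0 / (1 + decay_rate * t)`, which is one common schedule among several. The toy cost `J(w) = (w - 3)**2`, the function names, and the values of `alpha0` and `decay_rate` are assumptions for illustration only.

```python
def decayed_learning_rate(alpha0, decay_rate, iteration):
    """Inverse time decay: the learning rate shrinks as the iteration count grows."""
    return alpha0 / (1.0 + decay_rate * iteration)


def gradient_descent_with_decay(w_init, alpha0, decay_rate, num_iters):
    """Gradient descent on the toy cost J(w) = (w - 3)**2 with a decaying learning rate."""
    w = w_init
    history = []  # (alpha used, w after the update) for each iteration
    for t in range(num_iters):
        alpha = decayed_learning_rate(alpha0, decay_rate, t)
        grad = 2.0 * (w - 3.0)  # derivative of the toy cost
        w = w - alpha * grad
        history.append((alpha, w))
    return history


history = gradient_descent_with_decay(w_init=0.0, alpha0=0.9, decay_rate=0.5, num_iters=20)
for alpha, w in history[:5]:
    print(f"alpha={alpha:.3f}  w={w:.5f}")
```

The first step is large and overshoots the minimum at `w = 3`, and the later, smaller steps settle in without continuing to bounce back and forth.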

Cheers,

Abdelrahman

Thank you Abdelrahman for your reply. How do you know if the speed/rate of convergence is good?

As you can see below, the learning curve flattens by the 60th iteration:

While here, the learning curve flattens by the 10th iteration, which means it is faster! What am I getting wrong?

Cheers

Alaa

I am confused about this as well. Were you able to figure out the problem? Because that statement did not make sense to me when I read it. (“though not as quickly as the previous example”).

When alpha was larger, the cost decreased rapidly at first, but it didn’t reach the minimum quickly: the steps were so large that they kept skipping over the minimum, so it had to oscillate around the same place many times. With a smaller alpha, it decreases more slowly but reaches the minimum without overshooting, so it took comparatively less time.
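One way to see why large steps can end up slower is to look at a one-parameter quadratic cost, where each update multiplies the distance to the minimum by `1 - alpha * curvature`. A negative factor means `w` lands on the other side of the minimum at every step, and the closer its absolute value is to 1, the slower the distance shrinks. This is a sketch under that quadratic assumption with made-up values; it is not a reproduction of the lab's multi-feature problem.

```python
def error_factor(alpha, curvature):
    """Per-step multiplier of (w - w_min) for J(w) = 0.5 * curvature * (w - w_min)**2."""
    return 1.0 - alpha * curvature


curvature = 2.0  # assumed toy value
for alpha in (0.1, 0.95):  # hypothetical small and large learning rates
    factor = error_factor(alpha, curvature)
    side = "oscillates across the minimum" if factor < 0 else "stays on one side"
    print(f"alpha={alpha}: factor={factor:.2f}, {side}, shrink per step={abs(factor):.2f}")
```

Here the larger learning rate flips sign each step and shrinks the distance by only about 10% per iteration, while the smaller one stays on one side and shrinks it by about 20%.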

Hi!

I was also confused by the notes on the optional lab, but with this reply I understood what they mean, so thank you.

If I got it correctly, the decreasing velocity is given by djdw0 for this example, and when it oscillates (alpha=9e-7), we see that this value is larger at every iteration than when it does not oscillate (alpha=1e-7).

However, we are not only looking for the quickest decreasing velocity, but also for the fastest convergence. Therefore, although alpha=9e-7 gives us a faster decreasing rate, alpha=1e-7 converges faster to the minimum.

Am I right?

If that is the case I think that the sentence “**This solution will also converge, though not quite as quickly as the previous example.**” in the notes is quite confusing.

Thank you!

Yes, it may seem a bit confusing in the beginning, but you have understood correctly.