MLS C1 W1 About the Learning rate course

yhuai_lin · June 26, 2022, 6:53pm

Screenshot 2022-06-26 at 20.51.55

Can someone help me understand why the size of each increase/decrease of w gets larger and larger in this example? (the pink line)

shanup · June 26, 2022, 11:01pm

Hello @yhuai_lin

Welcome to the community

This happens when we choose too high a value for the learning rate \alpha.

The aim of gradient descent is to keep updating the parameters w and b until we reach the minimum of the cost. Whenever we set a high value for \alpha, w takes bigger steps and overshoots the minimum instead of taking small steps and moving closer to it.

With reference to the figure shown above, the initial value of w is at the lowest point to the left of the minimum. The next update should have brought w a step closer to the minimum. However, due to the high value of \alpha, the update makes w overshoot and end up to the right of the minimum. The next update again aims to bring w closer to the minimum, but it is even larger and pushes w further away, to the left of the minimum. As w keeps getting pushed farther and farther from the minimum, the corresponding cost J(w) increases as well. This is why you see the arrows climbing up the cost curve.

With every subsequent update of w the following behaviour can be noted:

w keeps bouncing from one side of the minimum to the other
the magnitude of each update to w increases with every step

The update value of w for each step of Gradient descent depends on:

\alpha
\dfrac{dJ}{dw}
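
In symbols, these two quantities combine as follows. This is the standard gradient descent update for a single parameter, written with the symbols used above; it is not copied from the screenshot:

$$
w := w - \alpha \dfrac{dJ}{dw}
$$

so the size of one step is

$$
\left| \alpha \dfrac{dJ}{dw} \right|
$$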

The magnitude of \dfrac{dJ}{dw} is small for points on the cost curve near the minimum and keeps increasing as we move away from it. In the example above, each update of w pushes it further from the minimum, so we move higher up the cost curve and the magnitude of \dfrac{dJ}{dw} at the new value of w is larger than it was at the previous step. Consequently, the magnitude by which w gets updated in the next step increases compared to the previous step.
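
A minimal sketch of this feedback loop, assuming a toy cost J(w) = w**2 (the curve in the screenshot may well be different). The starting point, learning rate, and function names are hypothetical and chosen only to make the overshoot visible:

```python
# Assumed toy cost: J(w) = w**2, so dJ/dw = 2*w and the minimum is at w = 0.
def gradient(w):
    return 2.0 * w


def run_gradient_descent(w_start, alpha, num_steps):
    """Return every value of w visited, including the starting value."""
    history = [w_start]
    w = w_start
    for _ in range(num_steps):
        w = w - alpha * gradient(w)  # update: w := w - alpha * dJ/dw
        history.append(w)
    return history


# Hypothetical values: start left of the minimum with a learning rate that is too high.
ws = run_gradient_descent(w_start=-1.0, alpha=1.1, num_steps=5)
for step, w in enumerate(ws):
    print(f"step {step}: w = {w:+.4f}  |dJ/dw| = {abs(gradient(w)):.4f}  J = {w ** 2:.4f}")
```

In this sketch the sign of w flips on every step while |dJ/dw| and J both grow, which is the bouncing and climbing pattern described above.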

We can control this unbounded growth of the updates to w and of the cost J(w) by setting an appropriately low value for the learning rate \alpha. Prof. Andrew Ng covers this in detail in Week 2 of Course 1.
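
As a rough illustration of how the choice of \alpha matters, here is a small sweep over a few learning rates. It again assumes the toy cost J(w) = w**2; the helper name `final_cost`, the starting point, and the candidate rates are all hypothetical:

```python
def final_cost(alpha, w_start=-1.0, num_steps=20):
    """Run gradient descent on the assumed cost J(w) = w**2 and return the final cost."""
    w = w_start
    for _ in range(num_steps):
        w = w - alpha * 2.0 * w  # dJ/dw = 2*w for this assumed cost
    return w ** 2


start_cost = (-1.0) ** 2
for alpha in (0.1, 0.5, 0.9, 1.1):
    cost = final_cost(alpha)
    trend = "decreased" if cost < start_cost else "increased"
    print(f"alpha = {alpha}: final cost = {cost:.6g} ({trend})")
```

For this particular cost, the three smaller rates end with a lower cost than they started with, while the largest one ends higher. Where the boundary lies depends on the shape of the cost function, so the numbers here should not be read as general guidance.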

szgergoke · July 7, 2022, 4:43pm

The interesting thing is that we don’t know how much we need to update w to get closer to the minimum; we only know the direction of the next update.
Here, we not only overshot the minimum in the very first step, but we also landed further from it than we were when we started on the left. Just imagine a green marker on the horizontal axis, similar to the blue ones, directly below the minimum point. The distance between our imaginary green marker and the right marker (where we jumped) is greater than the distance between the left blue marker (our original position) and the green marker.
Unfortunately, this means a larger gradient of the cost function in absolute value, and thus a larger next jump. And the story goes on like this: the level of increase/decrease gets larger and larger because the absolute value of the gradient gets larger and larger.
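
To put numbers on this, assume purely for illustration that the cost is J(w) = w^2, with its minimum at w = 0 (the curve in the screenshot may be different). Then \dfrac{dJ}{dw} = 2w, and one update gives

$$
w_{\text{new}} = w - \alpha \cdot 2w = (1 - 2\alpha)\, w
$$

$$
|w_{\text{new}}| = |1 - 2\alpha| \, |w|
$$

For this assumed cost, w jumps to the other side of the minimum when \alpha > 1/2, and it lands further from the minimum than it started when |1 - 2\alpha| > 1, that is, when \alpha > 1. In that case the absolute gradient 2|w| grows by the same factor |1 - 2\alpha| on every step, so each jump is larger than the one before.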

yhuai_lin · July 15, 2022, 9:30pm

Thanks a lot for the explanation!

shanup · July 15, 2022, 9:53pm

You are most welcome @yhuai_lin
