The relation between scaling and learning rate

Hi @Mahmoud_Mohamed4

You might have watched the lecture for “Feature scaling part 1” in Course 1 Week 2, but sometimes re-watching a lecture at a different time gives a learner a new angle and helps a working understanding click. Here is a slide from that lecture that is most relevant to your question; in particular, the red arrows depict how unscaled features make it difficult for gradient descent to converge, whereas scaled features provide a much more “direct path” to the optimal solution.

As Tom has also explained, with unscaled features we need to pick the learning rate very carefully, making it small enough that the updates do not diverge along the direction of the weight with the narrow acceptable range (w_1, the weight for size in feet²). For example, if we look at the upper-right graph along the w_1 direction, each “walking” step needs to be around 0 ~ 0.2 for the updates not to diverge. The step size is controlled by the learning rate, which has to be small enough not to push the step out of that acceptable range.

Such a small learning rate, however, does not suit the w_2 direction (# bedrooms), which spans a larger range of 0 ~ 100. A reasonable step for w_2 is likely around 0 ~ 20, which is 100 times larger than the acceptable range for w_1. Therefore, under the limitation that both directions share the same learning rate, a small learning rate gives a reasonable step size along w_1 but is far too small for w_2, and so it takes “more time” (more steps) for w_2 to converge.
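To make the divergence side of this concrete, here is a minimal sketch (the house numbers and the learning rate are made up for illustration, not taken from the lecture): with unscaled features, a learning rate that would be a reasonable step for the small-magnitude feature blows up the weight of the large-magnitude one, because the gradient along each weight scales with that feature's magnitude.

```python
import numpy as np

# Hypothetical toy data: one feature on a ~1000 scale, one on a ~1 scale.
size = np.array([1000.0, 1500.0, 2000.0])  # size in feet^2
beds = np.array([2.0, 3.0, 4.0])           # number of bedrooms
y = 0.1 * size + 20 * beds                 # a linear target

w = np.zeros(2)
alpha = 1e-3  # tolerable for the beds direction, far too big for size

# A few steps of batch gradient descent on mean squared error.
for _ in range(5):
    pred = w[0] * size + w[1] * beds
    grad = 2 * np.array([(pred - y) @ size, (pred - y) @ beds]) / len(y)
    w -= alpha * grad

print(w)  # w[0] (the size weight) has exploded instead of converging
```

Shrinking `alpha` until the size direction stops exploding is exactly the “pick a small enough learning rate” constraint described above, and that same shrunken `alpha` is then what starves the other direction.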

If we then look at the bottom-right graph, where both features are scaled to the same range, both directions now accept a similar step size, so one direction does not need to walk slower to “accommodate” the other.
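The whole argument can also be seen numerically. Below is a small sketch (again with made-up data, and z-score normalization as the scaling method, as in the course) comparing how many gradient-descent steps it takes to converge on raw features at the largest stable learning rate versus scaled features at a much larger one:

```python
import numpy as np

# Hypothetical toy dataset: house size (large scale) and bedrooms (small scale).
rng = np.random.default_rng(0)
size = rng.uniform(500, 2000, 50)
beds = rng.integers(1, 5, 50).astype(float)
X_raw = np.column_stack([size, beds])
y = 0.1 * size + 20 * beds + 5

def steps_to_converge(X, y, lr, max_steps=10000, tol=1e-6):
    """Batch gradient descent on MSE; returns the number of steps until the
    gradient norm drops below tol, or max_steps if it never does."""
    Xb = np.column_stack([X, np.ones(len(X))])  # append a bias column
    w = np.zeros(Xb.shape[1])
    for i in range(max_steps):
        grad = 2 * Xb.T @ (Xb @ w - y) / len(y)
        if np.linalg.norm(grad) < tol:
            return i
        w -= lr * grad
    return max_steps

# Unscaled: the learning rate must be tiny or the size direction diverges,
# so the bedrooms/bias directions crawl.
steps_raw = steps_to_converge(X_raw, y, lr=3e-7)

# Z-score scaled: both directions tolerate the same, much larger step.
X_scaled = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)
steps_scaled = steps_to_converge(X_scaled, y, lr=0.1)

print(steps_raw, steps_scaled)  # scaled converges in far fewer steps
```

This is the “direct path” from the slide in code form: after scaling, one learning rate serves every direction, so convergence takes orders of magnitude fewer steps.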

To echo what I said at the beginning, re-watch the lecture if you have time :wink:

Cheers,
Raymond
