Why feature scaling can make the learning rate large?

rmwkwok · July 6, 2022, 10:36am

I love this slide. It is from the C1 W2 video "Feature scaling part 1", at 6:12.

So the problem with unnormalized features is that your update is more prone to overshooting. In the top-right plot, the trouble comes from w_1: it is always the horizontal component of the update arrow that bounces back and forth.
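
To see where the bouncing comes from, here is the gradient descent update written out. This assumes the linear regression model and squared-error cost from the course, and that x_1 is the feature with the much larger range:

$$
w_j := w_j - \alpha \, \frac{1}{m} \sum_{i=1}^{m} \left( f_{w,b}(x^{(i)}) - y^{(i)} \right) x_j^{(i)}
$$

Each term of the step for w_j is multiplied by x_j. So with the same α, if the values of x_1 are hundreds of times larger than those of x_2, the w_1 component of the step tends to be much larger than the w_2 component.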

That’s why you need to choose a small learning rate, so that the horizontal component won’t overshoot (it won’t pass beyond the optimal w_1 each time it gets updated). However, we have one learning rate for everyone, so a smaller learning rate also slows down the update for w_2. If you look at the top-right plot again, you can see that progress in the w_2 direction becomes very slow.

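The perfect scenario is for both weights to reach their optimal values at about the same time, so we love the bottom-right version. With scaled features the contours are closer to circles, so a larger learning rate no longer overshoots in one direction.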
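
If you want to try this yourself, below is a minimal sketch rather than the course lab code. The data is synthetic, and the feature ranges, the learning rates, and the function names `cost` and `gradient_descent` are my own assumptions for illustration. It runs plain batch gradient descent on two features with very different ranges, first unscaled and then z-score normalized.

```python
import numpy as np

def cost(X, y, w, b):
    """Squared-error cost J(w, b) = 1/(2m) * sum(err^2)."""
    err = X @ w + b - y
    return float((err ** 2).mean() / 2)

def gradient_descent(X, y, alpha, num_iters):
    """Batch gradient descent with ONE learning rate shared by all weights."""
    m, n = X.shape
    w, b = np.zeros(n), 0.0
    for _ in range(num_iters):
        err = X @ w + b - y
        w = w - alpha * (X.T @ err) / m
        b = b - alpha * err.mean()
    return w, b

# Synthetic data (assumed for illustration): x1 has a large range, x2 a small one.
rng = np.random.default_rng(0)
m = 200
x1 = rng.uniform(300, 2000, m)
x2 = rng.uniform(0, 5, m)
X_raw = np.column_stack([x1, x2])
y = 0.1 * x1 + 50 * x2 + 50  # noiseless target, so the best possible cost is 0

# z-score normalization: every feature ends up with mean 0 and std 1.
X_scaled = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)

cost_start = cost(X_raw, y, np.zeros(2), 0.0)

# 1) Unscaled features, "large" learning rate: the w_1 update overshoots.
w, b = gradient_descent(X_raw, y, alpha=0.1, num_iters=5)
cost_raw_big_alpha = cost(X_raw, y, w, b)

# 2) Unscaled features, tiny learning rate: no blow-up, but w_2 and b crawl.
w, b = gradient_descent(X_raw, y, alpha=1e-6, num_iters=1000)
cost_raw_small_alpha = cost(X_raw, y, w, b)

# 3) Scaled features, the same "large" learning rate as in case 1.
w, b = gradient_descent(X_scaled, y, alpha=0.1, num_iters=1000)
cost_scaled = cost(X_scaled, y, w, b)

print("start cost:                ", cost_start)
print("raw,    alpha=0.1,  5 its: ", cost_raw_big_alpha)
print("raw,    alpha=1e-6, 1000:  ", cost_raw_small_alpha)
print("scaled, alpha=0.1,  1000:  ", cost_scaled)
```

Case 1 should end with a cost larger than where it started, case 2 should stay far from zero after 1000 iterations, and case 3 should get very close to zero with the same learning rate that blew up in case 1.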

Raymond
