How does feature scaling make gradient descent faster?

I understand what is said in the course, namely that we take fewer steps when our contour plots look more like circles. But I want to know whether there is a mathematical explanation for this statement, or whether it is only derived from comparing the results of a scaled data set and a non-scaled one.
Thanks.

Hi @Sepehr_Razavi,

If you don’t normalize and the features have very different scales, you can still optimize the model, provided you use a sufficiently small learning rate. That is exactly what causes the extra steps: the smaller the learning rate, the more steps you need.
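To add the mathematical side you asked about (this is the standard textbook analysis for a quadratic cost, not something specific to the course slides): for a cost $J(\mathbf{w}) = \tfrac{1}{2}\mathbf{w}^\top H \mathbf{w}$ (for linear regression, $H = \tfrac{1}{m} X^\top X$), the gradient descent update is

$$\mathbf{w}_{t+1} = \mathbf{w}_t - \alpha H \mathbf{w}_t = (I - \alpha H)\,\mathbf{w}_t,$$

which is stable only when $\alpha < 2/\lambda_{\max}(H)$. With $\alpha$ capped like that, the error along the flattest direction shrinks by a factor of roughly $1 - \alpha\lambda_{\min}$ per step, so the number of steps grows with the condition number $\kappa = \lambda_{\max}/\lambda_{\min}$. Features on very different scales make $\kappa$ large (elongated elliptical contours); normalizing them pushes $\kappa$ toward 1 (circular contours), so a single, much larger $\alpha$ makes fast progress in every direction at once.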

As for why we need a smaller learning rate, I think it is easier to see visually. This discussion used a slide to explain this…
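If you want to see the effect numerically, here is a minimal NumPy sketch (my own toy example, not from the course) that counts gradient descent steps on the same regression problem with and without z-score normalization. The feature ranges, learning rates, and tolerance are all illustrative choices:

```python
import numpy as np

def gd_steps(X, y, alpha, tol=1e-6, max_iters=100_000):
    """Batch gradient descent on J(w) = (1/2m)||Xw - y||^2.
    Returns the number of steps until the gradient norm drops below tol."""
    m, n = X.shape
    w = np.zeros(n)
    for step in range(1, max_iters + 1):
        grad = X.T @ (X @ w - y) / m
        if np.linalg.norm(grad) < tol:
            return step
        w -= alpha * grad
    return max_iters  # did not converge within the budget

rng = np.random.default_rng(0)
m = 200
# Two toy features on different scales: one ranges 0-10, the other 0-1
X = np.column_stack([rng.uniform(0, 10, m), rng.uniform(0, 1, m)])
y = X @ np.array([3.0, 5.0]) + rng.normal(0, 0.1, m)

# Z-score normalization: each column gets mean 0 and standard deviation 1
X_norm = (X - X.mean(axis=0)) / X.std(axis=0)

# Unscaled: the learning rate is capped by the steepest direction,
# so progress along the shallow direction is slow (thousands of steps).
print("unscaled:  ", gd_steps(X, y, alpha=0.04))
# Normalized: near-circular contours let one larger learning rate
# work well in every direction (tens of steps).
print("normalized:", gd_steps(X_norm, y, alpha=0.5))
```

The unscaled run needs the tiny learning rate only because the steepest curvature would otherwise make it diverge; try raising its `alpha` and you should see the cost blow up, which is the picture from the slide in numbers.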

Raymond

Thank you very much :pray:

You’re welcome @Sepehr_Razavi