Hello, could someone explain to me mathematically why, when I rescale the features, gradient descent converges faster?
When the features are on very different scales, the contours of the cost surface become elongated ellipses, so the gradient points almost perpendicular to the direction of the minimum. With a fixed learning rate the descent path then zig-zags: it overshoots along the steep direction while making little progress along the shallow one. Scaling the features makes the contours closer to circles, so the fluctuations on the path down toward the optimum become smaller and the chance of overshooting and bypassing the optimum drops.
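To put a bit of math on it (my own sketch, not from the course): for a least-squares cost $J(w) = \frac{1}{2m}\lVert Xw - y\rVert^2$, the largest stable learning rate is roughly $2/\lambda_{\max}$ of the Hessian $X^\top X / m$, while the error along the flattest direction shrinks like $(1 - \alpha\,\lambda_{\min})^t$. The iteration count therefore grows with the condition number $\kappa = \lambda_{\max}/\lambda_{\min}$, and feature scaling is exactly what shrinks $\kappa$. Here is a minimal NumPy demo of that effect; all the data, scales, and learning rates below are made up for illustration:

```python
import numpy as np

def gradient_descent(X, y, lr, iters):
    """Batch gradient descent on J(w) = (1/2m) * ||Xw - y||^2."""
    m, n = X.shape
    w = np.zeros(n)
    for _ in range(iters):
        w -= lr * (X.T @ (X @ w - y)) / m   # gradient step
    return w, np.mean((X @ w - y) ** 2) / 2  # final cost

rng = np.random.default_rng(0)
m = 200
x1 = rng.uniform(0, 1, m)      # small-scale feature
x2 = rng.uniform(0, 100, m)    # feature ~100x larger
y = 3.0 * x1 + 0.05 * x2 + rng.normal(0, 0.1, m)

ones = np.ones(m)              # intercept column (left unscaled)
X_raw = np.column_stack([ones, x1, x2])
X_std = np.column_stack([ones,
                         (x1 - x1.mean()) / x1.std(),
                         (x2 - x2.mean()) / x2.std()])

# Unscaled: the big feature dominates lambda_max, which forces a tiny
# learning rate, so the shallow directions crawl toward the minimum.
_, cost_raw = gradient_descent(X_raw, y, lr=4e-4, iters=1000)
# Standardized: near-circular contours, so one large rate works
# well in every direction.
_, cost_std = gradient_descent(X_std, y, lr=0.5, iters=1000)

print(f"cost after 1000 iters, unscaled:     {cost_raw:.4f}")
print(f"cost after 1000 iters, standardized: {cost_std:.4f}")
```

With the same iteration budget, the standardized run typically lands near the noise floor while the unscaled run is still far from the minimum, which is the faster convergence you are asking about.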
Prof. Andrew Ng explains it quite nicely in the Deep Learning Specialization though, check it out.