When the loss contours are elongated ellipses, we have to use a smaller learning rate: the updates oscillate back and forth across the narrow direction, so convergence takes longer. When the contours are circular, gradient descent converges faster no matter where it starts, and we can afford a larger learning rate than in the elongated case.
In both cases, each gradient descent update is perpendicular to the contour passing through the current point, because the gradient is normal to the level set of the loss.
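As a minimal sketch of the behavior described above (the quadratic loss and all parameter values here are illustrative assumptions, not from the original text), consider gradient descent on f(x, y) = 0.5·(a·x² + b·y²): when a = b the contours are circles, and when b ≫ a they are elongated ellipses.

```python
def gradient_descent(a, b, lr, start=(4.0, 4.0), steps=100):
    """Run gradient descent on f(x, y) = 0.5*(a*x**2 + b*y**2)
    and return the final loss value."""
    x, y = start
    for _ in range(steps):
        # Gradient of f; the step direction (-grad) is perpendicular
        # to the contour line passing through (x, y).
        gx, gy = a * x, b * y
        x, y = x - lr * gx, y - lr * gy
    return 0.5 * (a * x ** 2 + b * y ** 2)

# Circular contours (a == b): a fairly large learning rate converges quickly.
print(gradient_descent(a=1.0, b=1.0, lr=0.5))

# Elongated contours (b >> a): the same learning rate overshoots along
# the steep y-direction and diverges; a smaller rate converges, but slowly.
print(gradient_descent(a=1.0, b=10.0, lr=0.5))
print(gradient_descent(a=1.0, b=10.0, lr=0.05))
```

Running this shows the trade-off: the large step size that works for the circular bowl blows up on the elongated one, and the step size small enough to be stable on the elongated bowl needs many more iterations along the shallow x-direction.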