Why normalization helps

Hello,

In the lecture, without normalization, the contour plot had an elongated shape. During gradient descent, we move in a direction perpendicular to the contour lines. However, because of the elongated shape, that direction is unlikely to point directly at the minimum of the cost function, so convergence can take much longer.

However, after normalization, the contour plot becomes circular. Gradient descent still moves in a direction perpendicular to the contour lines, but this time that direction leads directly to the minimum.

Is this understanding correct?

Regards,
Divyaman

When the contours have an elongated shape, we have to use a smaller learning rate, since the updates oscillate back and forth across the narrow valley and therefore take longer to converge.
When the contours are circular, no matter where gradient descent starts, convergence is faster, and we can use a higher learning rate than in the previous scenario.

Gradient descent weight updates are perpendicular to the contours in both cases. See this visualization as well.

For a practical example, see this.
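
Here is a minimal sketch that shows the effect numerically (this is not from the lecture; the quadratic cost functions and learning rates are illustrative assumptions). With elongated contours, the steep direction caps the largest stable learning rate, so the flat direction converges slowly; with circular contours, the gradient points straight at the minimum and a larger learning rate is safe:

```python
import numpy as np

def gradient_descent(A, w0, lr, tol=1e-6, max_iters=10000):
    """Minimize J(w) = 0.5 * w^T A w (minimum at the origin).
    The gradient A @ w is perpendicular to the contour through w."""
    w = w0.astype(float)
    for i in range(max_iters):
        grad = A @ w
        if np.linalg.norm(grad) < tol:
            return w, i
        w -= lr * grad
    return w, max_iters

w0 = np.array([1.0, 1.0])

# Elongated contours (features on very different scales):
# any learning rate above 0.02 diverges along the steep axis,
# so the flat axis is forced to converge slowly.
A_elongated = np.diag([1.0, 100.0])
_, iters_elongated = gradient_descent(A_elongated, w0, lr=0.019)

# Circular contours (normalized features): the gradient points
# straight at the minimum, and a much larger learning rate is stable.
A_circular = np.diag([1.0, 1.0])
_, iters_circular = gradient_descent(A_circular, w0, lr=0.9)

print(f"elongated contours: {iters_elongated} iterations")
print(f"circular contours:  {iters_circular} iterations")
```

With these particular numbers, the elongated case needs on the order of several hundred iterations while the circular case converges in under ten, and raising the elongated learning rate past 0.02 makes the updates diverge, which is exactly the oscillating behavior described above.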

Thank you! I understand now!

Regards,
Divyaman