DL week2: Gradient Descent with Momentum


Can anyone tell me why a larger \beta leads to less oscillation on the path to the minimum? Thanks! :slight_smile:

Welcome to the community, @Jiacheng_Cao!

With a higher \beta, previous gradients are weighted more heavily, which makes your gradient estimate smoother. Imagine you have some noise or a gradient outlier: a higher \beta helps to mitigate its impact because the previous gradients carry more weight. You can think of it as an exponentially weighted moving average, which smooths the gradient and therefore reduces oscillations.

Note that if you increase \beta too far, you might have too much momentum and overshoot the optimum you are trying to reach.
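To make this concrete, here is a minimal sketch (my own illustration, not from the course materials) that runs momentum on a simple quadratic f(w) = w² with artificially noisy gradients, and measures how much the path oscillates for different \beta values. All function names and the noise scale are my choices for illustration.

```python
import numpy as np

def momentum_path(beta, lr=0.1, steps=200, seed=0):
    """Minimize f(w) = w**2 with noisy gradients using momentum."""
    rng = np.random.default_rng(seed)
    w, v = 5.0, 0.0
    path = [w]
    for _ in range(steps):
        grad = 2 * w + rng.normal(scale=4.0)  # true gradient 2w plus noise
        v = beta * v + (1 - beta) * grad      # exponentially weighted average
        w -= lr * v                           # parameter update
        path.append(w)
    return np.array(path)

def total_variation(path):
    """Sum of step sizes along the path: larger means more oscillation."""
    return np.abs(np.diff(path)).sum()
```

With the same noise sequence, you should find that the \beta = 0.9 path has a noticeably smaller total variation than the \beta = 0.1 path, which is exactly the smoothing effect described above.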

Bear in mind that other gradient-based optimization algorithms may be more suitable depending on your data and the shape of the cost surface:
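One widely used example of such an algorithm is Adam, which combines momentum with per-parameter scaling of the step size. Below is a minimal sketch of a single Adam update step following the standard formulation (Kingma & Ba); the function name and default hyperparameters here are my own illustrative choices.

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: momentum (m) plus adaptive scaling by grad magnitude (v)."""
    m = beta1 * m + (1 - beta1) * grad        # first moment (momentum term)
    v = beta2 * v + (1 - beta2) * grad ** 2   # second moment (squared gradients)
    m_hat = m / (1 - beta1 ** t)              # bias correction for early steps
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v
```

Running this in a loop on f(w) = w² steadily drives w toward 0, and the v term keeps the effective step size roughly bounded regardless of the raw gradient magnitude.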


Best regards

@Jiacheng_Cao, I moved your question from general category to the DL section.

Please let me know if your question is answered or if anything is open.

Best regards