Momentum clarification

Hi Sir,

@paulinpaloalto @bahadir @eruzanski @Carina @neurogeek @lucapug @javier @kampamocha @nramon

Here are two statements taken from the programming assignment. Can you please help clarify what they mean?

Statement 1: Momentum usually helps, but with the given small learning rate and smaller dataset, its impact is negligible. How is the impact negligible?

Statement 2: Usually works well even with little tuning of hyperparameters (except α). Does this mean that for Adam, tuning alpha is not necessary?

@paulinpaloalto @bahadir @eruzanski @Carina @neurogeek @lucapug @javier @kampamocha @nramon Can you please help to clarify?

Hi Anbu,

I’m going to make some assumptions because I’m not super sure of the full context of these statements (i.e., which lesson in particular they come from).

From statement #1: Momentum might not be very useful, or might have a negligible impact, on small datasets or when the learning rate is very small. I believe you can look at it this way: momentum helps you converge faster by adding a boost to the gradient step. However, if the gradient step is tiny, the boost will also be tiny, hence the "negligible" in the statement.
Check this article that I find interesting: https://distill.pub/2017/momentum
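To make that concrete, here is a tiny toy sketch of a gradient descent step with momentum (my own illustrative example, not the assignment's code; the loss and hyperparameter values are just made up). Notice that the velocity still gets multiplied by the learning rate, so a very small learning rate makes the extra boost almost invisible:

```python
# Toy sketch: one scalar weight w, loss = w**2, momentum-style update.
# The velocity v accumulates past gradients, but the actual weight change
# is learning_rate * v, so a tiny learning rate keeps the boost tiny too.

def momentum_step(w, grad, v, learning_rate=0.0001, beta=0.9):
    v = beta * v + (1 - beta) * grad      # exponentially weighted average of gradients
    w = w - learning_rate * v             # the boost is still scaled by the tiny learning rate
    return w, v

w, v = 1.0, 0.0
for _ in range(5):
    grad = 2 * w                          # gradient of the toy loss w**2
    w, v = momentum_step(w, grad, v)

print(w)  # barely moved from 1.0, because learning_rate * v is tiny at every step
```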

About statement #2: This just refers to Adam optimization being very effective out of the box. Adam uses some clever tricks to choose a good step size on a per-parameter basis (i.e., per weight), based on how quickly each one is changing, instead of relying on a single static learning rate such as alpha. Alpha is called out as the exception because it usually is still worth tuning, while the other hyperparameters (beta1, beta2, epsilon) can typically be left at their defaults.

The paper is not super long and is pretty interesting: https://arxiv.org/pdf/1412.6980.pdf
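If you're curious how that plays out, here is a minimal sketch of the Adam update for one parameter array (my own illustrative example following the update rule in the paper, using the commonly cited default hyperparameter values, not the assignment's exact code):

```python
import numpy as np

# Each parameter's effective step is alpha * m_hat / (sqrt(s_hat) + eps),
# so parameters with large or fast-changing gradients automatically get
# smaller steps, which is why the defaults work well with little tuning.

def adam_step(w, grad, m, s, t, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * grad          # first moment (average of gradients)
    s = beta2 * s + (1 - beta2) * grad ** 2     # second moment (average of squared gradients)
    m_hat = m / (1 - beta1 ** t)                # bias correction for early steps
    s_hat = s / (1 - beta2 ** t)
    w = w - alpha * m_hat / (np.sqrt(s_hat) + eps)
    return w, m, s

w = np.array([1.0, -2.0])
m = np.zeros_like(w)
s = np.zeros_like(w)
for t in range(1, 4):
    grad = 2 * w                                # gradient of the toy loss sum(w**2)
    w, m, s = adam_step(w, grad, m, s, t)

print(w)  # each parameter moves by roughly alpha per step, regardless of gradient scale
```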

Hope that helps.
