Stochastic Gradient Descent vs. Adam

Hi,

I found a resource online that describes Adam as “an extended version of Stochastic Gradient Descent.” Is this true? I know we talked about stochastic GD in the mini-batch lecture and said that a mini-batch size of 1 corresponds to it.

However, I don’t think we linked that to Adam.

I think you are correct that those two ideas are independent: Adam changes how a gradient estimate is turned into a parameter update (momentum plus a per-parameter adaptive step size), while “stochastic vs. mini-batch” only describes how that gradient estimate is computed. You have to be careful when you just do a Google search for some DL concept: there are a lot of people writing Medium articles that sound sensible, but who don’t really have much expertise and are mainly trying to build up an online body of work they can link to their profiles.
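To make the separation concrete, here are the standard update rules side by side (this is just the textbook Adam formulation, not something specific to the lecture). The batch size only affects how the gradient estimate $g_t$ is computed; the update rule itself never sees it:

$$
\begin{aligned}
g_t &= \nabla_\theta\, \mathcal{L}(\theta_{t-1};\, B_t) && \text{gradient estimate on the current batch } B_t \text{ (any size, including 1)}\\
\text{SGD:}\quad \theta_t &= \theta_{t-1} - \alpha\, g_t\\
\text{Adam:}\quad m_t &= \beta_1 m_{t-1} + (1-\beta_1)\, g_t, \qquad v_t = \beta_2 v_{t-1} + (1-\beta_2)\, g_t^{2}\\
\hat m_t &= \frac{m_t}{1-\beta_1^{t}}, \qquad \hat v_t = \frac{v_t}{1-\beta_2^{t}}\\
\theta_t &= \theta_{t-1} - \alpha\, \frac{\hat m_t}{\sqrt{\hat v_t} + \epsilon}
\end{aligned}
$$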

Or maybe the author was just trying to say that Adam works well in the SGD case. Of course, that doesn’t mean it isn’t also useful with mini-batch sizes greater than 1, so it may just be a question of interpreting the statement correctly.
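Here is a minimal sketch of that independence in code (assuming a PyTorch setup with a toy linear model, none of which comes from the lecture): the batch size is chosen in the data loader, while SGD vs. Adam is a one-line choice of update rule, so either optimizer works with a batch size of 1 or larger.

```python
# Minimal sketch (assumes PyTorch; the data and model are made up for illustration).
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

X = torch.randn(256, 10)          # toy inputs
y = torch.randn(256, 1)           # toy targets
model = nn.Linear(10, 1)
loss_fn = nn.MSELoss()

# batch_size=1 is "true" stochastic GD; any larger value is mini-batch GD.
loader = DataLoader(TensorDataset(X, y), batch_size=1, shuffle=True)

# Swapping this one line changes the update rule; the sampling above is untouched.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

for xb, yb in loader:
    optimizer.zero_grad()
    loss = loss_fn(model(xb), yb)
    loss.backward()               # gradient estimate from the current (mini-)batch
    optimizer.step()              # Adam's adaptive update (or SGD's plain step)
```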