Mini-batches and GD with Momentum

Moutasem_Akkad · April 23, 2022, 4:18pm

Hi,

Do we have to implement GD with Momentum, Adam and RMSprop only with mini-batches, i.e. where the mini-batch size != m?

Or can we also implement them with the old single-batch (full-batch) approach?

paulinpaloalto · April 23, 2022, 4:31pm

The logic of these algorithms is independent of the batch size, so it is not a question of correctness. The only question is whether they do any good in the full-batch case. To the extent that you are using those more sophisticated optimizers to mitigate the higher stochasticity you get with smaller batches, they may be less useful with the full batch. But it’s also possible that you still get some benefit, because the cost surfaces are so complex in any case. The other thing to ask here is whether anyone actually does full-batch GD any more. I don’t really know the answer in terms of overall industry “practice”, but there is the famous Yann LeCun quote: “Friends don’t let friends use batch sizes greater than 32”. Or words to that effect …
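To make the "independent of the batch size" point concrete, here is a minimal NumPy sketch. It is not the course assignment's code: the function name `momentum_gd`, the synthetic least-squares problem and the hyperparameters are all assumptions chosen for illustration. The momentum update is written once, and setting `batch_size` equal to `m` turns the same loop into full-batch GD with momentum.

```python
import numpy as np

def momentum_gd(X, y, batch_size, lr=0.1, beta=0.9, epochs=300, seed=0):
    """Linear least squares fitted by gradient descent with momentum.

    Hypothetical helper for illustration. batch_size == X.shape[0] gives
    full-batch GD; anything smaller gives mini-batch GD.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    w = np.zeros(n)   # parameters
    v = np.zeros(n)   # exponentially weighted average of the gradients
    for _ in range(epochs):
        perm = rng.permutation(m)              # reshuffle every epoch
        for start in range(0, m, batch_size):
            idx = perm[start:start + batch_size]
            Xb, yb = X[idx], y[idx]
            grad = Xb.T @ (Xb @ w - yb) / len(idx)
            # The update never looks at the batch size:
            v = beta * v + (1 - beta) * grad
            w = w - lr * v
    return w

# Synthetic, noiseless data (an assumption, so both runs share one minimum).
data_rng = np.random.default_rng(1)
X = data_rng.normal(size=(256, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w

w_full = momentum_gd(X, y, batch_size=256)  # one update per epoch (batch == m)
w_mini = momentum_gd(X, y, batch_size=32)   # eight updates per epoch
print("full batch:", w_full)
print("mini-batch:", w_mini)
```

Both calls run the same update rule; only the number of examples behind each gradient changes. On this easy convex problem both should end up close to `true_w`, which says nothing about whether momentum is worth the trouble in the full-batch case on a real network.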
